CSE 5525: Speech and Language Processing

Fundamentals of natural language processing, automatic speech recognition and speech synthesis; lab projects concentrating on building systems to process written and/or spoken language.

  • Prerequisites: (CSE 3521 or CSE 5521) and (CSE 5522 or Stat 3460 or Stat 3470)
  • Math: Basic Probability and Linear Algebra
  • Programming: Python, NumPy/SciPy, Linux/Unix (for Windows users: https://www.cygwin.com/)
  • Textbook:

    Participation (5%)

    You will receive credit for asking and answering questions related to the homework on Piazza and for engaging in in-class discussion.

    Homework (55%)

    The homework will include one written assignment, three programming assignments, and one market study of a company that uses speech or language techniques. Students should work independently on homework assignments (except that the market research can be done by up to two students). Everyone has three free late days for the whole semester, which can be used without penalty. After you have used your free late days, you will lose 20% per day that your assignment is submitted late. Please check your submitted homework carefully: if the wrong files are submitted for a homework, you may lose all credit for that homework.

    Midterm (20%)

    There will be an in-class midterm before spring break.

    Final Projects (20%)

    The final project is an open-ended assignment, with the goal of gaining experience applying the techniques presented in class to real-world datasets. Students should work in groups of 3-4. The final project report should be 2-4 pages. The report should describe the problem you are solving, the data you are using, the technique you propose, and the baseline you compare against. Each group will give a project presentation in the last class on April 21st.

  • Piazza (discussion, announcements and restricted resources)
  • Carmen (homework submission and grades)
  • Academic Integrity
    Any assignment or exam that you hand in must be your own work (with the exception of group projects). Talking with others to better understand the material is strongly encouraged. However, copying a solution or letting someone copy your solution is cheating. Code you hand in must be written by you, with the exception of any code provided as part of the assignment. MOSS (Measure of Software Similarity) will be used routinely to detect plagiarism on programming assignments. Any collaboration during an exam is considered cheating. Any student who is caught cheating will be reported to the Committee on Academic Misconduct and will receive a grade of F. Do not take a chance: if you are having trouble understanding the material, please take advantage of office hours and know that we will be happy to help.
  • Market Research (10%, group, due 11:59pm, Friday, Feb 3rd, submit in Carmen)
  • Homework 0 Background Review (5%, individual, due 12:45pm, Wednesday, January 18th, hand in at the beginning of class)
  • Homework 1 Text Classification (12%, individual, due 11:59pm, Thursday, March 2nd, submit in Carmen)
  • Homework 2 Sequential Tagging (18%, individual, due 11:59pm, Thursday, April 6th, submit in Carmen)
  • Final Project (20%, group, due 11:59pm, Friday, April 21st, submit in Carmen and present in class -- no late day allowed)
    The final project is an open-ended assignment, with the goal of gaining experience applying the techniques presented in class to real-world datasets. Students should work in groups of 3-4. Please sign up for a group and a project. Each group will submit a report together with data/code, and give a 5-minute project presentation (and/or demo) in the last class on April 21st. It is acceptable to do the same work for this project and another class's project; however, you must make this clear and write down the exact portion of the project that is being counted for CSE 5525.

    For inspiration, here are some example NLP course projects done by students. There are also many NLP shared tasks with existing datasets, and sometimes baseline systems, at SemEval and WNUT (also check previous years). If you want to get some Twitter data to work with, you can follow the Twitter API tutorial written by the instructor. At a minimum, you could learn and implement the word alignment or decoding components of a statistical machine translation system (both are well-structured homework assignments in the Machine Translation course).

    Report (2-4 pages, not including references):
  • Title and Author(s)
  • Goal: What are you trying to do? Give an example of inputs/outputs or user interaction. How might it be helpful to people?
  • Method: How are you trying to solve the problem? What are existing approaches to the problem?
  • Experiments: What data do you use? What metric(s) do you use to measure success? What baseline method do you compare against? How well do your methods perform compared with the baseline, and why?
  • Conclusions: What did you learn from your experiments? Suggest future ideas.
  • Replicability: Submit (or include links to) all the code that you wrote and all the data that you used.
  • Related Work and References: This is absolutely necessary. You may use any existing code, libraries, etc. and consult any papers, books, online references, etc. for your project. However, you must cite your sources in your writeup and clearly indicate which parts of the project are your contribution and which parts were implemented by others.

  • The rubric for projects is as follows:
  • An A (90%) project shows evidence of a working implementation; the writeup comprehensively analyzes the system's strengths and weaknesses as well as answering all questions in the assignment; some optional or novel feature has been included, or there is interesting speculation about extensions to the basic algorithm.
  • A B (80%) project shows evidence of a working implementation. The writeup is adequate, but lacks a full analysis, and there are no suggestions for extensions or novel features.
  • A C (70%) project shows evidence of a mostly-working implementation, with some attempt to explain what went wrong and why.
  • A D (60%) project does not work at all, or shows a misunderstanding of basic concepts.
  • An E (<60%) project is not turned in on time.

  • In-class Exercises
  • Exercise 1 (solution to a similar example in J+M, 3rd Edition 6.3)
  • Exercise 2 (solution taught in class on 1/25)
  • Exercise 3 (solution in Daume's CIML Chapter 4)
  • Exercise 4 (solution to a similar example in J+M, 3rd Edition 4.1 and 4.4.1)
  • Exercise 5 (solution taught in class on 2/15)
  • Exercise 6 (solution in Ralph Grishman's slides, a similar example in J+M, 3rd Edition 10.4.3)
  • Exercise 7 (solution in J+M, 3rd Edition 7.1 and 7.2)
  • Exercise 8 (solution taught in class on 3/24)
  • Exercise 9 (solution taught in class on 3/29)
  • Exercise 10 (solution taught in class on 4/14)
  • Anonymous Feedback
    Schedule (subject to change; slides and readings will be updated as the term progresses)
    Date Topic Required Reading Suggested Reading
    1/11 Course Overview J+M, 2nd Edition Chapter 1
    1/13 Probability Review and Naive Bayes MacKay's book 2.1-2.3 (Probability), J+M, 3rd Edition 6.1 (Naive Bayes) Crane's notes (Bayes' Rule)
    1/13 Guest Speaker: Zhou Yu (CMU) Dreese Labs 480, 3:00pm
    1/18 Text Classification J+M, 3rd Edition 6.2-6.4 (Naive Bayes) Shimodaira's notes (Naive Bayes)
    1/20 Logistic Regression J+M, 3rd Edition 7.1-7.3 (Logistic Regression) Michael Collins' notes (Log-Linear Models)
    1/25 More Logistic Regression J+M, 3rd Edition 7.4-7.5 (Logistic Regression)
    1/27 Multi-class Logistic Regression and Perceptron J+M, 3rd Edition 6.6-6.8 (Multi-class), Daume's CIML 4.1-4.4 (Perceptron Algorithm)
    2/1 More Perceptron Daume's CIML 4.5-4.7 (Perceptron Algorithm)
    2/3 Language Modeling J+M, 3rd Edition 4.1-4.2 (Language Models)
    2/8 More Language Modeling J+M, 3rd Edition 4.3-4.4 (Language Models)
    2/10 Kneser-Ney Smoothing J+M, 3rd Edition 4.4-4.6 (Language Models) Michael Collins' notes (Language Models)
    2/15 Parts of Speech and Hidden Markov Models J+M, 3rd Edition 10.1-10.3 (Part-of-Speech Tagging) and 9.1-9.2 (Hidden Markov Models) Michael Collins' notes (Hidden Markov Models)
    2/17 The Viterbi Algorithm J+M, 3rd Edition 9.3-9.4, 10.4
    2/22 Maximum Entropy Markov Models J+M, 3rd Edition 10.5
    2/24 Log-linear Models J+M, 3rd Edition 7.1-7.5 (Logistic Regression) Michael Collins' notes (Log-Linear Models)
    3/1 More Maximum Entropy Markov Models J+M, 3rd Edition 10.5 Michael Collins' notes (MEMMs)
    3/3 Midterm Review
    3/3 Distinguished Speaker: Raymond J. Mooney (UT Austin) Dreese Labs 480, 3:00pm LSTM for Language and Vision Venugopalan et al.'s paper (Video Captioning)
    3/8 Guest Lecture: Jeniya Tabassum Probabilistic Graphical Model with Latent Variables Tabassum et al.'s EMNLP 2016 paper
    3/10 Midterm (in class, closed book)
    3/22 Conditional Random Fields and Structured Perceptron Daume's CIML 17.1-17.3 and 17.6-17.7 (Structured Prediction) Sutton and McCallum's CRF Tutorial
    3/24 Syntax and Context Free Grammars J+M, 3rd Edition Chapter 11 (Formal Grammars)
    3/24 Guest Speaker: Dan Garrette (Google) Dreese Labs 480, 3:00pm Combinatory Categorial Grammars (CCGs)
    3/29 Parsing and CKY Algorithm J+M, 3rd Edition Chapter 12 (Syntactic Parsing) Collins et al.'s paper (Global Linear Model)
    3/31 Automatic Speech Recognition J+M, 2nd Edition Chapter 9 (Automatic Speech Recognition) Juang and Rabiner's Automatic Speech Recognition – A Brief History of the Technology Development, Gales and Young review (HMM-based ASR)
    4/5 Guest Lecture: Micha Elsner Integer Linear Programming (ILP) Clarke and Lapata's paper (Integer Linear Programming)
    4/7 Deep Learning for Speech Recognition Graves et al.'s paper (Connectionist Temporal Classification)
    4/12 More Deep Learning for Speech Recognition Baidu Research's paper (Deep Speech)
    4/14 Neural Network Language Modeling Koehn's 15.1-15.2.3 (Neural Networks) Bengio et al.'s paper (Neural Language Models)
    4/19 Deep Learning for NLP Goldberg's tutorial (Neural Networks in NLP)
    4/21 Project Presentations
    Market Research (10%, due 11:59pm, Friday, Feb 3rd, submit in Carmen)
    This homework can be done by up to two students.
    Step 1: watch some example presentations of company profiling:
  • Example Presentation 1 (Iceland's Crowdsourced Constitution by Abhishek Gadiraju)
  • Example Presentation 2 (Silk Road by Shreshth Khilani)
  • Example Presentation 3 (Kiva: Loans That Change Lives by Morgan Snyder and Tiffany Lu)
    Step 2: please sign up for a company and a date for the in-class presentation (play your recorded video presentation and answer questions). Please do not pick a company that another student has already signed up for.
    Step 3: create a 5-minute video presentation about the company. To turn in your video, please upload it to Vimeo following the instructions, then submit the link to your video in Carmen.
    Step 4: if you would like to give other students feedback, you may use the peer grading function in Carmen following the instructions.

    Here is an incomplete list of companies: 3M Cogent, Apple, AI2, Amazon, App Orchid, Aylien, Baidu, BirdEye, Bloomberg, Bosch, Digital Reasoning, Duolingo, MetaMind, ETS, Expect Labs, Facebook, FiscalNote, Genee, Google, Grammarly, Huawei, IBM, Klevu, KITT.AI, Lexalytics, Lattice, Lilt, NetBase, Newsela, New York Times, Microsoft, Narrative Science, Nuance, Safaba, SwiftKey, SyTrue, SRI, Thomson Reuters, Twitter, Veritone, Verizon, Vphrase, Yahoo!, Yelp. You can find many others.

    Below are some questions you may consider addressing in your presentation (you should cite sources in your presentation, for example by listing the URLs of the online resources you used):
  • When was the company started?
  • Who were the founders?
  • What kind of organization is it? (publicly traded company, privately held company, non-profit organization, other)
  • What is the company's main business model?
  • How does the company generate revenue?
  • What is the financial situation of the company? (stock price, annual report, news)
  • Why is the company interested in speech or NLP technologies or both?
  • What are specific areas or applications of speech/NLP the company is interested in?
  • What products of the company use speech or NLP technologies?
  • Who are the main users of their speech or NLP technologies?
  • Does the company hold any patent using speech or NLP technologies?
  • Does the company publish any papers on speech or NLP technologies?
  • Has the company recently been hiring in NLP (including interns or PhDs)?
  • What specific expertise within speech or NLP is the company looking to hire?
  • How is the press coverage of the company?
  • How many employees does the company have?
  • Roughly how many speech/NLP experts currently work at the company?
  • Are there any notable speech/NLP researchers or recent hires at the company?
  • In which city is the company's speech/NLP research office located?