6.863J/9.611J Natural Language Processing
 
 
 

Staff
Prof. Robert C. Berwick
berwick@csail.mit.edu
32-D728, x3-8918
Office hours: W 4:30-5:30

Course Support
Lisa Gaumond
lisag@mit.edu
32-D724, 617-324-1543
TA: Rob Speer, 32-226
rspeer@mit.edu; office hrs Tues, 2-5

Course Time & Place
Lectures: M, W 3-4:30 PM
Room: 32-144 (map)

Level & Prerequisites
Undergrad/Graduate; 6.034 or permission of instructor

Policies
Textbooks & readings
Grading marks guide
Style guide

Course Description

A laboratory-oriented course in the theory and practice of building computer systems for human language processing, with an emphasis on how human knowledge of language can be integrated into natural language processing.

This subject qualifies as an Artificial Intelligence and Applications concentration subject.

Announcements
• Please fill out Underground Guide course evaluations here. (Link active May 7–16)
• RR #5 will only be discussed in class. There will be no official assignment for it.
• RR #4 due Weds, 4/23
• RR #4 released here. Supporting code: tar.gz format; or .zip format
• [4/9] RR #4 readings here and here.
• [3/31] RR #3 extended until Monday, April 7. Revised reading for Collins parser here.
• [3/30] Extensive final project list here. (Includes full past final projects!)
• [3/19] RR #3 released here. Readings for this are de Marcken, here; and Collins, here.
• [3/14] Notes 3 (formal learning theory & language) posted here.
• [3/14] NO reading and response due Monday, 3/17 (see revised schedule)
• [3/13] Lab 2 has been given 2 extra days for completion
• [3/3] Lab 2 posted here. Download the code.
• [3/3] Lab 1, last part, due MONDAY 3/3
• [2/27] RR #2 discussion next MONDAY 3/3
• [2/20] RR #2 posted
• [2/17] Lab 1, part 2, posted. Whole lab posted here.
• [2/16] Lab 1, part 1, posted (revised 2/17)
• [2/12] Notes 1 posted; Lecture 2 posted
• [2/6] For the first Reading & Response assignment and later labs, if you want to run programs on your own computer, please download and install the course toolkit, NLTK, here.
(Follow the directions on the website. Directions on how to run from Athena are in the assignment handout.)
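After installing the toolkit on your own machine, a quick check like the one below can confirm that Python can find it. This is only a sketch, not part of the official course directions: it assumes a current Python 3 interpreter, and the helper name `toolkit_status` is made up for illustration.

```python
import importlib
import importlib.util


def toolkit_status(module_name="nltk"):
    """Report whether a module can be imported, and its version if it exposes one.

    `toolkit_status` is a hypothetical helper, not part of NLTK or the course code.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name} is not installed"
    module = importlib.import_module(module_name)
    # Not every module defines __version__, so fall back to "unknown".
    version = getattr(module, "__version__", "unknown")
    return f"{module_name} {version} is installed"


if __name__ == "__main__":
    print(toolkit_status("nltk"))
```

If the script reports that the module is not installed, go back to the installation directions on the NLTK website before starting the assignment.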

Weeks 1 & 2: Fun NLP link of the week: Postmodernist paper generator. Try 'writing' a new paper by following this link.

Class days in blue, holidays in green, reg add/drop/final project dates in orange.

February 2008
Sun Mon Tue Wed Thu Fri Sat
                      1   2
  3   4   5   6   7   8   9
 10  11  12  13  14  15  16
 17  18  19  20  21  22  23
 24  25  26  27  28  29
March 2008
Sun Mon Tue Wed Thu Fri Sat
                          1
  2   3   4   5   6   7   8
  9  10  11  12  13  14  15
 16  17  18  19  20  21  22
 23  24  25  26  27  28  29
 30  31
April 2008
Sun Mon Tue Wed Thu Fri Sat
          1   2   3   4   5
  6   7   8   9  10  11  12
 13  14  15  16  17  18  19
 20  21  22  23  24  25  26
 27  28  29  30
May 2008
Sun Mon Tue Wed Thu Fri Sat
                  1   2   3
  4   5   6   7   8   9  10
 11  12  13  14  15  16  17
 18  19  20  21  22  23  24
 25  26  27  28  29  30  31
Course schedule at a glance
Date
Topic
Slides & Reference Readings
Laboratory/Assignments

2/6
Weds

Introduction: walking the walk, talking the talk
Lecture 1 pdf slides; pdf bw, 4-up
Jurafsky & Martin (JM) ch. 4 pp. 1-8; review ch 2 on finite-state automata/regular expressions if necessary.
NLTK docs, ch. 1-3 or, if you already know Python, just ch. 3 on words.
• Background Reading (for RR 1): Jurafsky & Martin on ngrams.
• Background Reading (for RR 1): Abney on statistics and language.
Background Reading (for RR 1): Chomsky, Extract on grammaticality, 1955.
Background chapters on NLP from Russell & Norvig, ch. 22.
Reading & response 1 out
(Ngrams; NLTK Python warmup)
NLTK installation here
2/11
Mon
Ngrams; smoothing; Word parsing & transducers
RR1 discussion

Lecture 2 pdf slides; pdf bw 4-up
JM ch. 3; ch 10, pp. 1–7;
Notes on finite-state automata and learning: Notes 1
• Angluin, Induction of k reversible automata
• Berwick & Pilato, Learning syntax by automata induction
Background Reading: Karttunen, History of two-level morphology, 1996.
Reading & response 1 due MON
2/13
Weds
Word parsing II; complexity issues
Lecture 3 pdf slides; pdf 4-up
• Background Reading (RR 2): Harris, From phoneme to morpheme, 1955.
2/19
Tues
Word parsing complexity; What do children do? Part of speech tagging

• Lecture 4 pdf slides; pdf 4-up
Background Reading (RR 2): Saffran, Statistical learning in 8-month-old infants, 1996.
Background reading: Yang, Universal grammar, statistics, or both?, 2004.

2/20
Weds
Part of speech tagging; Finding words by MDL
• Lecture 5 pdf slides; pdf 4-up
2/25
Mon
Parsing & syntax I
• Airline delay: Lecture 6 pdf slides; pdf 4-up
• JM, ch. 6
• NLTK docs, words & tagging, ch. 4
Reading & response 2 due MON
2/27
Weds
Airline parsing

• Lecture 6 pdf slides; pdf 4-up
• Russell & Norvig, ch 23.
• JM, ch. 11 & ch. 12

Lab 1 parts 1 and 2 due WEDS
3/3
Mon

RR2 discussion; Parsing II: basic dynamic programming

• Lecture 7 pdf slides; pdf 4-up
• JM ch. 12 new (cf parsing) pdf.

Lab 1 part 3 due MON
Lab 2 out
3/5
Weds
Earley parsing; Probabilistic parsing & Treebanks
• Lecture 8 pdf slides; pdf 4-up
• Lecture 8a ('animation' of probabilistic CKY) here.
• Billot & Lang on 'packed parsed forests' here. (Warning: advanced automata theory required to understand this paper.)



3/10
Mon

Learning syntax I: basic results

• Lecture 9 pdf slides; pdf 4-up
NLTK docs, ch. 8
• Background Reading: Levelt, Grammatical inference (brief); Pinker, Formal models of language learning.
• Background Reading: Gold, Language identification in the limit, 1967.

3/12
Weds
Learning syntax II; More on basic results
• Lecture 10 pdf slides; pdf 4-up
NLTK docs, ch. 9