LSA.207  FiniteState Methods in Natural Language Processing
Lauri Karttunen
MW 4:506:30
location: 32124
course web site: http://lsa.dlp.mit.edu/Class/207
Finitestate descriptions have been used very successfully to describe the phonology, orthography, and morphology of a large number of languages. Many other basic steps in language processing, ranging from tokenization to namedentity recognition and shallow parsing, can be performed efficiently by means of finitestate automata. This course will introduce the students to the finitestate tools and formalisms for expressing rules and constraints.
Much of the success of the finitestate approach to phonology and morphology is based on the observation that most of the phenomena involved in phonological alternations and morphosyntactic constraints can be described in finitestate terms. For example, classical phonological rewrite rules represent regular relations. The technology for compiling such descriptions into efficient finitestate automata is now very mature and will be demonstrated with the xfst tool that accompanies the recent book Finite State Morphology by Kenneth R. Beesley and Lauri Karttunen (CSLI Publications, 2003).
We will also discuss the formal power of some current theoretical approaches to morphology such as Optimality Theory and Realizational Morphology and the question of whether they are in the finitestate domain. This appears to be the case for Realizational Morphology but not for all versions of OT. Optimality constraints typically represent regular languages; the GEN relation is regular; constraint ranking corresponds to the ordering of rewrite rules. But finitestate models cannot keep track of an unlimited number of constraint violations. 
