LSA.207 | Finite-State Methods in Natural Language Processing

LSA.207 | Finite-State Methods in Natural Language Processing

Lauri Karttunen
MW 4:50-6:30
location: 32-124
course web site: http://lsa.dlp.mit.edu/Class/207

Finite-state descriptions have been used very successfully to describe the phonology, orthography, and morphology of a large number of languages. Many other basic steps in language processing, ranging from tokenization to named-entity recognition and shallow parsing, can be performed efficiently by means of finite-state automata. This course will introduce the students to the finite-state tools and formalisms for expressing rules and constraints.

Much of the success of the finite-state approach to phonology and morphology is based on the observation that most of the phenomena involved in phonological alternations and morpho-syntactic constraints can be described in finite-state terms. For example, classical phonological rewrite rules represent regular relations. The technology for compiling such descriptions into efficient finite-state automata is now very mature and will be demonstrated with the xfst tool that accompanies the recent book Finite State Morphology by Kenneth R. Beesley and Lauri Karttunen (CSLI Publications, 2003).

We will also discuss the formal power of some current theoretical approaches to morphology such as Optimality Theory and Realizational Morphology and the question of whether they are in the finite-state domain. This appears to be the case for Realizational Morphology but not for all versions of OT. Optimality constraints typically represent regular languages; the GEN relation is regular; constraint ranking corresponds to the ordering of rewrite rules. But finite-state models cannot keep track of an unlimited number of constraint violations.