LSA.138 | Research Methodologies in Computational Linguistics
Michael Collins and Stuart Shieber
course web site: http://lsa.dlp.mit.edu/Class/138
Computational linguistics is the study of human language using the tools and techniques of computer science, with application to the practical computational processing of linguistic data. The field has for decades had two main branches. The engineering branch is concerned with developing and improving applications that manipulate (analyze, generate, process) natural language in order to achieve some practical goal. It was instigated by Warren Weaver in 1949, in his proposal to use the then-nascent computer technology to perform machine translation via the same computational techniques that led to the decoding of the German Enigma code by Alan Turing and his colleagues. The scientific branch followed Noam Chomsky's use in the late 1950s of computational reasoning to demonstrate the inadequacy of various theories of language. In so doing, he provided evidence for his theory of transformational grammar while developing the fundamentals of formal language theory, a subfield of computer science. The two branches have different goals and hence different modes of evaluating success. Some confusion and even rancor have arisen from uncertainty about how the two enterprises relate and how particular research projects fit into them. We will explore the full range of appropriate research methodologies in computational linguistics, both in the abstract and through selected case studies presented by leading figures in the field.