MIT Speech Communication Group: History

History of the Speech Communication Group

1957: Instructor George W. Hughes (left) and Professor Morris Halle measure the power spectrum of a speech sound recorded on a loop of tape. (Photo by Benjamin Diver. Source: RLE photo archive)

The Radiation Laboratory was established at MIT in 1940, fourteen months before the United States entered World War II, with the goal of developing practical airborne radar systems. The "RadLab" designed almost half of the radar deployed in World War II, and, at the peak of its activities, employed nearly 4000 people working on several continents. The laboratory was officially closed on December 31, 1945, but several members of the Basic Research Laboratory continued their work at MIT. In July 1946, those members formed the new Research Laboratory of Electronics. (Source: RLE History page)

Professor Kenneth N. Stevens received his Master's degree from the University of Toronto before coming to work in the MIT Acoustics Laboratory under the direction of Leo Beranek. His thesis, a study of human perception of sounds produced by resonant circuits, led to further research into the perception and production of vowel sounds in syllabic context, mostly in collaboration with Arthur House. After spending a few years on problems of noise control and a study of community reactions to airport noise, in collaboration with researchers from Bolt Beranek and Newman, he was hired as a full-time assistant professor in the Acoustics Laboratory of the RLE in 1954. In his years in the Speech Communication Group, he has participated in developing the acoustic theory of speech production and, in particular, has worked to relate quantal phonological representations to the apparently continuous acoustic and articulatory patterns of speech.

In 1949 or 1950, Gunnar Fant visited MIT for the first time from the Royal Institute of Technology in Sweden. At the time, speech synthesizers were made of variable resistors, capacitors, and op-amps, hooked together in long chains to represent the transmission line characteristics of the vocal tract; such a synthesizer at the MIT Acoustics Lab was called DAVO (dynamic analog of the vocal tract). Professor Fant suggested a simpler structure called "formant synthesis," which was soon developed at MIT in the form of a circuit called POVO. In the 1970s, Dennis Klatt and Jonathan Allen developed one of the first digital formant synthesizers. The original Klatt synthesizer has been incorporated over the years into several distinct research tools and commercial products, including the DECtalk program sold by the Digital Equipment Corporation, and the HLSyn program sold by Sensimetrics.
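The formant approach described above models the vocal tract not as a full transmission line but as a small set of resonances (formants), each realized as a second-order filter. As a rough illustration only, here is a minimal sketch of cascade formant synthesis in the general spirit of a Klatt-style synthesizer; the function names, the impulse-train glottal source, and the formant values are illustrative assumptions, not details of DAVO, POVO, or the actual Klatt software:

```python
import math

def resonator_coeffs(f, bw, fs):
    # Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2],
    # with center frequency f (Hz) and bandwidth bw (Hz) at sample rate fs.
    c = -math.exp(-2 * math.pi * bw / fs)
    b = 2 * math.exp(-math.pi * bw / fs) * math.cos(2 * math.pi * f / fs)
    a = 1 - b - c  # unity gain at DC
    return a, b, c

def apply_resonator(x, f, bw, fs):
    # Run the difference equation over the input signal x.
    a, b, c = resonator_coeffs(f, bw, fs)
    y = [0.0, 0.0]
    for s in x:
        y.append(a * s + b * y[-1] + c * y[-2])
    return y[2:]

def synthesize_vowel(formants, bandwidths, f0=100, fs=8000, dur=0.3):
    # Impulse-train source at pitch f0, shaped by a cascade of formant resonators.
    n = int(fs * dur)
    period = int(fs / f0)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f, bw in zip(formants, bandwidths):
        signal = apply_resonator(signal, f, bw, fs)
    return signal

# Rough /a/-like formant targets and bandwidths (illustrative values).
samples = synthesize_vowel([700, 1200, 2600], [80, 90, 120])
```

The key simplification Fant's idea exploits is visible here: instead of dozens of transmission-line sections, a handful of resonator parameters (formant frequencies and bandwidths) suffices to specify a vowel.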

Senior Research Scientist Dr. Joseph Perkell received his D.M.D. from the Harvard School of Dental Medicine in 1967. Two years before his Harvard graduation, he came to the Speech Communication Group, where he began tracing outlines of the tongue and jaw from x-ray motion pictures. In 1974, he described a physiologically oriented, lumped-mass digital model of the tongue, which became his Ph.D. thesis at MIT. Dr. Perkell and his colleagues have developed a system for electromagnetic midsagittal articulometry, called EMMA, which they use to characterize the articulator motions under study. Dr. Perkell also collaborates with other scientists to study the influence of auditory feedback on speech production.

Research Scientist Dr. Stefanie Shattuck-Hufnagel received her Ph.D. in Psychology from MIT in 1975. Known initially for her work with tongue twisters and spoonerisms ("You have hissed all my mystery lectures, and tasted the whole worm!"), Dr. Shattuck-Hufnagel investigates the cognitive structures and processes involved in speech production planning, particularly at the level of speech sound sequencing. Her work with speech error patterns and with the acoustic analysis of prosody has implications for cognitive models of speech production and for phonological theory, as well as applications in speech recognition and synthesis.

MIT first became formally involved in a speech recognition project when Professor Dennis Klatt became one of the reviewers in the ARPA speech understanding project, from 1973 to 1976. At the time, many speech scientists did not believe that the acoustic signal contained enough information to identify individual phonemes without also identifying the words containing them. Victor Zue put such fears to rest in 1980 and 1981 by training himself to read phonemes from a spectrogram. In the late 1980s, along with other researchers, Zue developed his knowledge into a speech recognition system called SUMMIT. The recognition system, and the research projects that grew out of it, were so successful that they spawned a new laboratory at MIT, the Spoken Language Systems Group in the Laboratory for Computer Science.