Speech Group Achievements 2001
Sponsors
C.J. Lebel Fellowship
Dennis Klatt Memorial Fund
Donald North Memorial Fund
National Institutes of Health (Grants R01-DC00075, R01-DC01925, R01-DC02125, R01-DC02978, R01-DC03007, 1 R29 DC02525, T32-DC00038)
National Science Foundation (SES-9820126)
Academic and Research Staff
Professor Kenneth N. Stevens, Professor Morris Halle, Professor Samuel J. Keyser, Dr. Joseph S. Perkell, Dr. Stefanie Shattuck-Hufnagel, Dr. Marilyn Chen, Dr. Jeung-Yoon Choi, Dr. Mark Tiede, Dr. Reiner Wilhelms-Tricarico, Dr. Lisa Lavoie, Dr. Chao-Yang Lee, Jennell Vick, Majid Zandipour, Ellen Stockmann, Seth Hall.
Visiting Scientists and Research Affiliates
Dr. Takayuki Arai, Department of Electrical and Electronics Engineering, Sophia University, Tokyo, Japan.
Dr. Corine A. Bickley, Voice Services Division, Comverse, Cambridge, Massachusetts, and Fonix Corporation, Lexington, Massachusetts.
Dr. Suzanne E. Boyce, Department of Communication Disorders, University of Cincinnati, Cincinnati, Ohio.
Dr. Carol Y. Espy-Wilson, Department of Electrical Engineering, Boston University, Boston, Massachusetts, and Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland.
Dr. Krishna Govindarajan, SpeechWorks International, Boston, Massachusetts.
Dr. David Gow, Department of Psychology, Salem State College, Salem, Massachusetts, and Department of Neuropsychology, Massachusetts General Hospital, Boston, Massachusetts.
Dr. Frank Guenther, Department of Cognitive and Neural Systems, Boston University, Boston, Massachusetts.
Dr. Helen M. Hanson, Sensimetrics Corporation, Somerville, Massachusetts.
Dr. Andrew Howitt, Department of Biomedical Engineering, Boston University, Boston, Massachusetts.
Dr. Robert E. Hillman, Mass Eye and Ear Infirmary, Boston, Massachusetts.
Dr. Caroline Huang, SpeechWorks International, Boston, Massachusetts.
Aaron Im, Department of Neuropsychology, Massachusetts General Hospital, Boston, Massachusetts.
Dr. Harlan Lane, Department of Psychology, Northeastern University, Boston, Massachusetts.
Dr. Sharon Y. Manuel, Department of Speech Language Pathology & Audiology, Northeastern University, Boston, Massachusetts.
Dr. Melanie Matthies, Department of Communication Disorders, Boston University, Boston, Massachusetts.
Dr. Richard McGowan, Sensimetrics Corporation, Somerville, Massachusetts.
Dr. Rupal Patel, Department of Bio-behavioral Studies, Teachers College Columbia University, New York.
Dr. Alice Turk, Department of Linguistics, University of Edinburgh, Edinburgh, United Kingdom.
Dr. Nanette Veilleux, Department of Computer Science, Simmons College, Boston, Massachusetts.
Dr. Lorin Wilde, MIT Media Laboratory, Cambridge, Massachusetts.
Jane Wozniak, Speech and Language Pathology, Private Therapy, Massachusetts.
Graduate Students
Ying Cao, Lan Chen, Harold Cheyne, Xuemin Chi, Laura Dilley, Heather Gunter, Annika Karlsson-Imbrie, Chi-Yu Liang, Nicole Marrone, Xiaomin Mou, Ariel Salomon, Jason Smith, Atiwong Suchato, Virgilio Villacorta, Julie Yoo
Undergraduate Students
Gillian Arcand, Priya Banerjee, Eunice Chang, Rena Coen, Christina Curry, Nayeli Dault, Nicholas Fung, Allison Glinka, Emily Hanna, Mun Yuk Ko, Jonathan McEuen, Caitlin Schein, Morgan Sonderegger, Charlene St. Pierre, Wechung Wang, Kirsten Ware, Nadjia Yousif
Technical and Support Staff
Arlene E. Wint
1. Prosodic Influences on Phonetic Variation
In the year just past we have focused our efforts on understanding some of the factors which govern surface phonetic variation in connected speech, as a step toward our long-term goal of formulating a model of human speech production planning at the sound level. To this end we have investigated variations in voice quality with prosodic structure, finding that irregular periodicity in spoken utterances is more likely at the beginnings and ends of prosodic constituents and varies systematically with the size of the constituent. In studies of the effects of prosodic constituent boundaries on timing, we found that phrase-final lengthening is progressive through the phrase-final syllable, increasing from syllable onset and nucleus to reach its greatest magnitude in the consonant coda, at least in some circumstances.
In addition, we have studied the special phonological characteristics of function words, finding that such words (e.g. articles, prepositions, and conjunctions) are more likely than content words (nouns, verbs and adjectives) to be monosyllabic and to begin with a vowel or a weak consonant such as /h,w,y/. Adverbs, which are harder to characterize syntactically, show a phonological pattern intermediate between content and function words. These findings may have implications for understanding why function words are more likely to undergo severe phonetic modification in connected speech.
Finally, we have extended our acoustic-phonetic studies to perception, investigating the effect of prosodic structure on the perceived grouping of syllables into words. We showed that listeners report groupings of spoken full-vowel syllables (e.g. boy friend ship side walk) into two-syllable words in ways which reflect repeated f0 patterns in the stimulus. These results support a model in which listeners use global acoustic patterns to help form candidate linguistic constituents from an acoustic waveform.
2. Model for Lexical Access
2.1 Development of the model
We are developing a model of the process whereby human listeners extract word sequences from running speech. The model assumes that words are represented in memory in terms of sequences of segments each of which is specified as a bundle of distinctive features. In the past year, some of the details of this model have been clarified, particularly with regard to accounting for sources of variability observed in the acoustic representation of particular features.
We recognize that there are two kinds of features: articulator-free features (such as [consonant], [vowel], [continuant]) that are represented in the sound as particular types of landmarks, depending on the feature; and articulator-bound features whose acoustic representation resides in the sound in the vicinity of the landmarks set up by the implementation of the articulator-free features. Each feature has a + or a - value. There is a defining articulatory and acoustic correlate that makes the sound resulting from the + value of that feature perceptually distinct from the sound resulting from the - value of the feature. However, additional articulatory gestures are often recruited in order to "fine-tune" or enhance the perceptual saliency of this contrast. These enhancing gestures may be implemented whenever the feature appears in any context within a word in an utterance, or they may be implemented only in certain contexts (e.g., syllable position). Consequently, there may be an array of acoustic cues that a listener can use in order to uncover the value of an underlying feature in an utterance.
In running speech, the array of gestures for a feature in a segment may overlap with gestures from an adjacent segment. This overlap may weaken some of the cues for the feature or even obliterate some cues. Furthermore, in the presence of noise, some cues may be masked, and the listener must focus on those that can be detected in the noise. We have examined a number of examples of word sequences where both enhancement and overlap are present. A general observation is that, in most cases in the absence of noise, there is enough evidence among the various cues to permit a listener to uncover the pattern of distinctive features for the segments in the sequence. There are a number of sequences for which gestural overlap obliterates the defining acoustic correlate for a feature, but only when acoustic evidence remains for an enhancing gesture for the feature.
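The representation described above can be illustrated with a minimal sketch, assuming that each segment is stored as a bundle of feature values and that a feature whose cues were obliterated or masked is simply left unspecified in the input. The feature names, the toy lexicon, and the functions mismatches and best_match are hypothetical and are not part of the model as reported.

```python
from typing import Dict, List

# A segment is a bundle of distinctive features: True stands for "+", False for "-".
# Feature names and values here are illustrative assumptions only.
Segment = Dict[str, bool]

B = {"consonant": True, "continuant": False, "lips": True, "stiff_vocal_folds": False}
P = {"consonant": True, "continuant": False, "lips": True, "stiff_vocal_folds": True}
A = {"vowel": True, "low": True}
T = {"consonant": True, "continuant": False, "tongue_blade": True, "stiff_vocal_folds": True}

LEXICON: Dict[str, List[Segment]] = {"bat": [B, A, T], "pat": [P, A, T]}


def mismatches(heard: Segment, stored: Segment) -> int:
    """Count features specified in both bundles whose values disagree."""
    return sum(1 for name, value in heard.items()
               if name in stored and stored[name] != value)


def best_match(heard: List[Segment], lexicon: Dict[str, List[Segment]]) -> str:
    """Return the entry with the fewest mismatches among entries of equal length."""
    scores = {
        word: sum(mismatches(h, s) for h, s in zip(heard, segments))
        for word, segments in lexicon.items()
        if len(segments) == len(heard)
    }
    return min(scores, key=scores.get)  # raises ValueError if nothing has that length


# Features missing from a heard bundle model cues that could not be recovered.
heard = [
    {"consonant": True, "lips": True, "stiff_vocal_folds": False},
    {"vowel": True},
    {"consonant": True, "tongue_blade": True},
]
print(best_match(heard, LEXICON))
```

Leaving a feature out of the heard bundle, rather than guessing its value, keeps an obliterated cue from counting against any candidate word.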
2.2 Module for consonant place of articulation
We have been developing a place of articulation module which is a component of a knowledge-based speech recognition system. There are similarities in the placement and shaping of the articulator for consonants with a particular place of articulation, independent of whether the consonants are stops, nasals, or fricatives. Therefore, certain cues measured in the vicinity of the acoustic landmarks for the consonants may be used for place classification, independent of manner of production.
The utterances were vowel-consonant-vowel (VCV) sequences spoken in isolation by one female and two male speakers. The two vowels in a given utterance are the same type, and six vowel types were used. For the study of place, the consonant is either a stop, a nasal, a fricative, or an affricate. In addition, the aspirated consonant /h/ is compared with fricatives and affricates in order to assess the capability of differentiating frication and aspiration. The CV boundary was examined in this study; the cues are expected to be more salient in this context.
An initial set of place cues derived from the acoustic theory of speech production of stop consonants was applied to stops, nasals, and fricatives and affricates in the utterances of the present study. For stops, the overall discriminant analysis classification score is 75% correct classification for all vowel types and all speakers; for nasals, the overall correct classification score is 57%; and for fricatives and affricates, the correct classification is 85%.
Attempts were made to modify the descriptors for stops, as well as introducing additional cues based on acoustic theory of speech production of the consonant with a given manner. With modification and additional cues relevant to the manner of production added, the classification was improved significantly. For stops, with 12 cues, the overall classification score reaches 94%; for nasals, with 19 cues, the overall classification reaches 100%; and for fricatives and affricates, with 16 cues, the overall classification reaches 97%.
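As a rough illustration of cue-based place classification, the sketch below uses a nearest-class-mean classifier, which is a simplified stand-in for the discriminant analysis reported here and not the procedure actually used. The two-dimensional cue vectors are invented, assumed to be already normalized to comparable scales, and the function names are hypothetical.

```python
from math import dist
from statistics import fmean


def class_means(samples):
    """samples: list of (cue_vector, place_label). Returns {label: mean cue vector}."""
    grouped = {}
    for cues, label in samples:
        grouped.setdefault(label, []).append(cues)
    return {label: [fmean(column) for column in zip(*rows)]
            for label, rows in grouped.items()}


def classify(cues, means):
    """Assign the place label whose mean cue vector is closest (Euclidean distance)."""
    return min(means, key=lambda label: dist(cues, means[label]))


def percent_correct(samples, means):
    """Percentage of samples whose predicted label equals the given label."""
    hits = sum(1 for cues, label in samples if classify(cues, means) == label)
    return 100.0 * hits / len(samples)


# Invented, normalized cue values for illustration only.
train = [
    ([0.0, 0.2], "labial"), ([0.2, 0.0], "labial"),
    ([1.0, 1.2], "alveolar"), ([1.2, 1.0], "alveolar"),
    ([2.0, 0.1], "velar"), ([2.2, -0.1], "velar"),
]
means = class_means(train)
print(classify([1.9, 0.2], means), percent_correct(train, means))
```

Scoring on the training tokens, as in the last line, overstates accuracy; a held-out set would be needed for a fair estimate.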
With additional information such as voicing, vowel frontedness, and gender, the place classification score can be increased still further. The effect of additional information on the overall classification scores was determined for the stop consonants and the fricative and affricate consonants. With voicing and vowel context known, the score increases to close to 100% for stops. The unvoiced stops have a lower score than the voiced stops, whereas voicing has little effect on the detection of place for the fricatives and affricates. Back vowel context gives a higher score than front vowel context for both stop consonants and fricative and affricate consonants. The female utterances are better classified than the male utterances, irrespective of manner.
Since different combinations of cues give the best classification for consonants of different manner, it is often necessary to determine the manner of the consonant before determining place. The stop and nasal manners are detected by the consonantal landmark detector and the nasality module that have been developed previously. This study suggests cues that may clarify landmarks involving the feature [continuant], i.e., stop consonants on the one hand and fricatives and affricates on the other. With 12 cues, the overall classification score between stops and fricative or affricate consonants is close to 100%. The consonants with frication were separated perfectly from those with mostly aspiration, /h/, by using 13 cues. Stridency of the fricatives and affricates can also be classified very well with 8 cues. The place of articulation for non-strident consonants and strident consonants was classified perfectly with 16 cues and 3 cues, respectively. Furthermore, it was observed that place for fricatives and affricates is better classified when stridency is determined first.
We are proceeding to expand the place module to include consonants in running speech in syllable-initial, syllable-final and ambisyllabic position in the context of strong and weak syllables.
2.3 Detection of stop consonant voicing
One module in the model of lexical access determines whether an obstruent consonant is voiced or voiceless. Such a module has been developed for stop consonants. A preliminary set of acoustic cues for determining voicing is formulated from knowledge of acoustic theory. The acoustic cues include the fundamental frequency, first formant frequency, and the relative amplitudes of the first harmonic, first formant prominence and third formant prominence. The fundamental frequency in the adjacent vowel is used to gauge the stiffness of the vocal folds. Additional cues are the voice onset time (VOT) from release to the onset of voicing and the voice offset periodicity (VOP) immediately after the consonant closure. Some of the measures are used to estimate the spread of the glottis and are sampled immediately before the closure and after the onset of voicing, and others provide evidence for stiffening or slackening of the vocal folds. VOT and VOP are the most important voicing cues. VOT of unvoiced stop consonants is on average 45 ms greater than that of their voiced counterparts, and VOP of voiced stop consonants is on average significantly greater than that of their voiceless counterparts. The fundamental frequency, the change in first harmonic amplitude and the change in the difference between the amplitudes of the first and second harmonics are cues that can contribute to voicing identification. The results show that a small set of acoustic cues based on the theory of speech production may be reliable in determining voicing.
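The VOT cue alone can be sketched as follows, assuming that the release and voicing-onset times have already been located in the signal. The 30 ms decision threshold and the function names are placeholders chosen for illustration; they are not values or procedures from the module described above, which combines several cues.

```python
def voice_onset_time_ms(release_s: float, voicing_onset_s: float) -> float:
    """VOT: interval from the stop release to the onset of voicing, in milliseconds."""
    return (voicing_onset_s - release_s) * 1000.0


def is_voiceless(vot_ms: float, threshold_ms: float = 30.0) -> bool:
    """Label a stop voiceless when its VOT exceeds an assumed threshold."""
    return vot_ms > threshold_ms


# Hypothetical landmark times in seconds.
long_lag = voice_onset_time_ms(0.100, 0.165)
short_lag = voice_onset_time_ms(0.100, 0.112)
print(is_voiceless(long_lag), is_voiceless(short_lag))
```

A single fixed threshold is unlikely to hold across speakers and contexts, which is one reason to combine VOT with the other cues listed above.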
3. Constraints and Strategies in Speech Production
3.1 Development of MATLAB code implementing a movement controller for biomechanical tongue models
Various learning architectures have been implemented for learning forward and inverse models of the speech production mechanism to be used for the control of 2D and 3D biomechanical models of the vocal tract. These techniques include generalized radial basis function (GRBF) neural networks, hyperplane radial basis function (HRBF) networks, and an iterative algorithmic method for determining the pseudoinverse of the Jacobian matrix (inverse model) necessary for control of movements planned in acoustic space. Although the GRBF networks originally implemented were sufficient for control of the 2D biomechanical tongue model, the computational load makes simulation of a 3D biomechanical tongue model using GRBF networks problematic. The HRBF networks have thus been implemented in place of GRBF networks to reduce the computer memory requirements and simulation times for the controller simulations. The controller transforms the desired acoustic trajectories (change in formants) into a change in muscle activation command that drives the 2D vocal tract model, using an HRBF neural network designed to learn this "inverse model" mapping. To train and test the validity of the model, three HRBF forward models were designed that learn the mappings from activation commands to formants, from muscle lengths to formants, and from activation commands to muscle lengths, each to within 1% error. The data needed to train the three forward models were extracted from 1,905 simulations of the 2D vocal tract model covering the formant space and the muscle length space. Using the three forward models, 9,375,000 data points were simulated (13 input dimensions: formants and muscle lengths; 10 output dimensions: muscle activation commands), which are used to train the inverse model. One form of the inverse model was able to learn the mapping to within 5% error. Another form of the inverse model is being explored that is more biologically plausible and able to simulate the function of the cerebellum.
This work is also supported by NIH grant R29 DC02852 to Frank Guenther.
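The idea of mapping a desired formant change to a change in muscle activation through the pseudoinverse of the Jacobian can be sketched as below. The report does not say which iterative method was used; this sketch assumes the Ben-Israel and Cohen iteration X <- X(2I - JX), written in Python rather than MATLAB, and the small Jacobian is invented for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pseudoinverse(j, iterations=50):
    """Iteratively approximate the Moore-Penrose pseudoinverse of j."""
    # Start from X0 = J^T / (||J||_1 * ||J||_inf); this scaling keeps the
    # iteration inside its convergence region.
    norm_1 = max(sum(abs(v) for v in col) for col in zip(*j))
    norm_inf = max(sum(abs(v) for v in row) for row in j)
    x = [[v / (norm_1 * norm_inf) for v in col] for col in zip(*j)]
    for _ in range(iterations):
        jx = matmul(j, x)
        size = len(jx)
        # X <- X (2I - J X)
        correction = [[(2.0 if r == c else 0.0) - jx[r][c] for c in range(size)]
                      for r in range(size)]
        x = matmul(x, correction)
    return x


def activation_change(j, desired_formant_change):
    """Minimum-norm change in muscle activation producing the desired formant change."""
    pinv = pseudoinverse(j)
    return [sum(p * d for p, d in zip(row, desired_formant_change)) for row in pinv]


# Invented Jacobian: 2 formants (rows) by 3 muscle activations (columns).
J = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
print(activation_change(J, [10.0, 20.0]))
```

Because there are more muscles than formants, many activation changes produce the same formant change; the pseudoinverse selects the one with the smallest norm.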
3.2 Gestural timing effects in the 'perfect memory' sequence observed under three rates by electromagnetometry
In a well-known example the /ktm/ sequence in the phrase 'perfect memory' is contrasted between careful (list) and fluent production conditions. In that example, X-ray microbeam data were used to show that although in the fluent case coproduction of the /m/ can mask the acoustic releases of the /k/ and /t/, both stops are nonetheless articulated. The current work uses EMMA data to examine this sequence in greater detail: Eighteen subjects produced the phrase in a carrier context under normal, fast, and clear rate conditions. Results confirm that tongue dorsum and tip movements toward velar and apical closure occur regardless of rate and observable acoustic effect. In addition, while movement amplitudes decreased somewhat as rate increased, little variation in the durations associated with the consonant gestures was observed. Instead, changes in rate primarily affected V to V duration and the relative phasing of the velar and apical closing gestures: the tongue tip maximum (/t/) lagged that of the tongue dorsum (/k/) in clear speech, was aligned with it in normal speech, and for a robust minority of subjects anticipated it in the fast rate condition.
3.3 Relations between production and perception
Twenty-one subjects have been run in an experiment in which articulatory movements, a signal reflecting contact between the tongue tip and lower incisors, and the acoustic signal are recorded. Utterance material is designed to look for inter-speaker differences in degree of motor equivalence, amount of reduction in fast speech, and tongue tip contact with the lower teeth. Twelve data sets have been completed and analyzed. The same subjects are participating in labeling and discrimination experiments that use synthetic continua to examine cross-speaker differences in the sharpness of phonemic boundaries in acoustic space (16 subjects run). These movement, acoustic and perceptual results will be used to test the hypothesis that speakers who show a) more motor equivalence and b) higher velocity increases and less reduction in fast speech have sharper phoneme boundaries (reflected in steeper transitions in labeling functions and higher discrimination scores). The tongue contact data are being used to test the hypothesis that speakers who show less motor equivalence for the sound "sh" are more likely to use a saturation effect (contact of the tongue tip with the lower incisors for "s") in production of the s-sh distinction.
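One simple way to quantify the sharpness of a phoneme boundary from a labeling function is sketched below: the 50% crossover is located by linear interpolation between adjacent continuum steps, and the slope of that segment serves as the steepness measure. This is an assumed measure for illustration (a fitted logistic function is a common alternative), and the response proportions are invented.

```python
def boundary_and_slope(steps, proportions):
    """Return (boundary location, slope) at the first 50% crossover of a labeling function."""
    points = list(zip(steps, proportions))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        # A crossover lies in this segment if the two proportions straddle 0.5.
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            slope = (p1 - p0) / (x1 - x0)
            return x0 + (0.5 - p0) / slope, slope
    raise ValueError("labeling function never crosses 50%")


# Invented proportions of "sh" responses along a five-step s-sh continuum.
steps = [1, 2, 3, 4, 5]
sharp_boundary, sharp_slope = boundary_and_slope(steps, [0.0, 0.1, 0.3, 0.9, 1.0])
shallow_boundary, shallow_slope = boundary_and_slope(steps, [0.1, 0.3, 0.5, 0.7, 0.9])
print(sharp_slope > shallow_slope)
```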
3.4 Development of facilities and paradigms
Extensive software and hardware development has been accomplished in support of the above-described experimentation. This includes refinement of software for data acquisition and interactive and algorithmic data analysis, as well as 3-dimensional vocal-tract modeling and control modeling. Software development also includes significant progress on signal processing to perturb vowel and semivowel formants in real time for use in sensorimotor adaptation experiments. Hardware development includes the design and construction of an electronic device for generating and gating masking noise under computer control, and continued work on the development of a pressure-measuring palatograph. A number of pilot sessions have been run to develop the paradigms for bite block and sensorimotor adaptation experiments.
3.5 Theoretical developments
A model of the sensorimotor control of speech production has been presented. The model is being implemented as a set of computer simulations. It converts an input sequence of discrete phonemes into quasi-continuous motor commands and a sound output. A key feature of the model is that the goals for speech movements, at least for some kinds of sounds, are regions in auditory-temporal space. The model is designed to have properties that are as faithful as possible to data from speakers - including measures of brain function, speech motor control mechanisms, physiology, anatomy, biomechanics and acoustics. Examples of simulations and actual data from some of these domains are illustrated. The examples demonstrate properties of the model or they are consistent with hypotheses generated from it. Our long-range goal is to implement the model completely and test it exhaustively, in the belief that doing so will significantly advance our understanding of speech motor control.
4. Effects of Hearing Status on Adult Speech Perception
4.1 Sentence intelligibility in postlingually deafened adults who receive cochlear implants
This study examined the intelligibility of sentences spoken by postlingually deafened adults while deaf and after one year of experience with a cochlear implant (CI). Two hundred amplitude-normalized sentences of four, five, six, seven and eight syllables from the Johns Hopkins Lipreading Corpus spoken by 10 CI users (100 sentences from Pre sessions, 100 from Post sessions) were presented to 11 naive listeners with self-reported normal hearing. Listeners transcribed sentences as they were presented with no opportunity for replay. Results were scored as the percent of words transcribed correctly. Matched pairs t-tests were performed for each speaker, comparing Pre and Post percent words correct, aligning sentence syllable length and listener. Sentence intelligibility significantly improved for the group, with varying results among individual speakers. The effects of age at deafening on intelligibility are discussed. Intelligibility improvements were similar to those reported for masked word intelligibility in a previous study.
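The matched pairs comparison can be sketched as follows, assuming one Pre score and one Post score per matched item (same listener and sentence length). The scores are invented, and the function paired_t is a hypothetical helper that returns only the t statistic and degrees of freedom; obtaining a p-value would require the t distribution, which is outside the standard library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Matched pairs t statistic and degrees of freedom for Post minus Pre scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample standard deviation (n - 1)
    return mean(diffs) / standard_error, n - 1


# Invented percent-words-correct scores for five matched items.
pre_scores = [40.0, 55.0, 30.0, 62.0, 48.0]
post_scores = [52.0, 60.0, 45.0, 70.0, 50.0]
t_stat, dof = paired_t(pre_scores, post_scores)
print(round(t_stat, 2), dof)
```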
4.2 Development of facilities
Acoustic recording and analysis. We have completed the implementation of new data acquisition software and hardware systems for use in a recently funded set of experiments. The data acquisition system digitizes the speech signal directly in real-time, eliminating the need to make, edit and digitize DAT recordings. Once recordings are completed, the software is used for interactive sound segment boundary labeling and automated data extraction, which is facilitated by storing each utterance in a single, token-specific file. This system has significantly reduced the amount of time required for data acquisition and analysis, which will permit us to report on larger and more numerous data sets than in the past. We have implemented an identical system at the University of Miami for data acquisition in a collaborative experiment as part of the newly funded project. Data and signal files are sent between our laboratories over the Internet. An additional component of the hardware system (mentioned above) is used for feedback-modification experiments. It receives input from a microphone and mixes the input with varying levels of speech-shaped noise. Pre-programmed noise and speech feedback levels are set under computer control; the signals are output either to calibrated headphones (for normal-hearing subjects) or to our lab's cochlear implant speech processor (for cochlear-implant users).
Perceptual data acquisition. We have created two synthetic acoustic continua of stimuli that range between "a beet" and "a boot" (interpolation between vowels) and between "a said" and "a shed" (interpolation between initial consonants). Each item in the continuum is identified and rated for goodness by our subjects to measure their abilities to perform phonemic categorization.
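A vowel continuum of this kind can be sketched as a set of equally spaced steps between two endpoint parameter vectors. The sketch assumes linear interpolation of formant frequencies, an eight-step continuum, and rough endpoint values for the two vowels; none of these details are given in this report, and the function name continuum is hypothetical.

```python
def continuum(start, end, n_steps):
    """Linearly interpolate between two parameter vectors in n_steps equal steps."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    return [
        [a + (b - a) * i / (n_steps - 1) for a, b in zip(start, end)]
        for i in range(n_steps)
    ]


# Assumed (F1, F2) endpoint values in Hz, for illustration only.
beet_formants = [300.0, 2300.0]
boot_formants = [300.0, 900.0]
stimuli = continuum(beet_formants, boot_formants, 8)
for formants in stimuli:
    print(formants)
```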
Publications
Journal Articles, Published
Chen, H. and Stevens, K.N. "An Acoustical Study of the Fricative /s/ in the Speech of Individuals with Dysarthria". J. Speech, Lang., and Hear. Research 44: 1300-1314 (2001).
Gould, J., Lane, H., Perkell, J., Vick, J., Matthies, M., and Zandipour, M. "Changes in the Intelligibility of Postlingually Deaf Adults after Cochlear Implantation". Ear and Hearing 22: 453-60 (2001).
Hanson, H.M., K.N. Stevens, H.-K. Kuo, M.Y. Chen and Slifka, J. "Towards Models of Phonation". J. Phonetics 29, 451-480 (2001).
Keyser, S.J. and Stevens, K.N. "Enhancement Revisited". In M. Kenstowicz (ed.), Ken Hale: A Life in Language. Cambridge MA: MIT Press, 271-291 (2001).
Lane, H., Matthies, M., Perkell, J., Vick, J., and Zandipour, M. "The Effects of Changes in Hearing Status in Cochlear Implant Users on the Acoustic Vowel Space and Coarticulation". Journal of Speech, Language and Hearing Research, 44(3): 552-63 (2001).
Matthies, M., Perrier, P., Perkell, J. and Zandipour, M. "Variation in Coarticulation with Changes in Clarity and Rate". J. Speech, Lang. Hear. Res. 44: 340-353 (2001).
Perkell, J., Numa, W., Vick, J., Lane, H., Balkany, T. and Gould, J. "Language-specific, Hearing-related Changes in Vowel Spaces: A Preliminary Study of English- and Spanish-speaking Cochlear Implant Users". Ear and Hearing 22: 461-470 (2001).
Redi, L. and S. Shattuck-Hufnagel, "Variation in the Realization of Glottalization in Normal Speakers". J. Phonetics 29: 407-430 (2001).
Shattuck-Hufnagel, S. "Phrase-level Phonology in Speech Production Planning". In Horne, M. (Ed.), Prosody: Theory and Experiment. Stockholm: Kluwer (2001).
Stevens, K.N. "The Properties of the Vocal-Tract Walls Help to Shape Several Phonetic Distinctions in Language." In Travaux du Cercle Linguistique de Copenhague, Vol. XXXI: 285-297 (2001).
Vick, J., Lane, H., Perkell, J., Matthies, M., Gould, J., and Zandipour, M. "Speech Perception, Production and Intelligibility Improvements in Vowel-pair Contrasts in Adults who Receive Cochlear Implants". Journal of Speech, Lang. Hear. Res., 44: 1257-68 (2001).
Journal Articles, Accepted for Publication
Beckman, M., J. Hirschberg, and S. Shattuck-Hufnagel. "The Development of the ToBI Transcription System". In Sun-Ah Jun (ed.), Prosodic systems of the world's languages, Oxford University Press (In press).
Stevens, K.N. "Toward a Model for Lexical Access Based on Acoustic Landmarks and Distinctive Features". J. Acoust. Soc. Am. (April, 2002).
Stevens, K.N. and H.M. Hanson. "Voice Acoustics". In R.D. Kent (ed.), MIT Encyclopedia of Communication Disorders, Cambridge MA: MIT Press (accepted).
Turk, A.E., and S. Shattuck-Hufnagel. "Word-Boundary-Related Duration Patterns in English." J. Phonetics, Forthcoming.
Journal Articles, Submitted for Publication
Matthies, M., J. Vick, J. Perkell, H. Lane, M. Zandipour, and J. Gould. "Effects of Cochlear Implants on the Speech Production, Perception, and Intelligibility of the Liquids /r/ and /l/", submitted to J. Speech, Lang. Hear. Res.
Kwong, K.W. and K.N. Stevens. "On the Voiced-Voiceless Distinction for Writer/Rider," submitted to J. Phonetics.
Lane, H., M.L. Matthies, J.S. Perkell, J. Vick, and M. Zandipour. "The Effects of Changes in Hearing Status in Cochlear Implant Users on the Acoustic Vowel Space and Coarticulation," submitted to J. Speech. Lang. Hear. Res.
Matthies, M., P. Perrier, J. Perkell, and M. Zandipour. "Variation in Speech Movement Kinematics and Temporal Patterns of Coarticulation with Changes in Clarity and Rate," submitted to J. Speech, Lang. Hear. Res.
Perkell, J., M. Zandipour, J. Vick, M. Matthies, H. Lane, and J. Gould. "Rapid Changes in Speech Production Parameters in Response to a Change in Hearing," submitted to J. Phonetics.
Perkell, J.S., and F.H. Guenther. "Speech Motor Control: Acoustic Goals, Saturation Effects, Auditory Feedback and Internal Models," submitted to J. Phonetics.
Books/Chapters in Books
Keyser, S.J. and K.N. Stevens. "Enhancement Revisited." In Ken Hale: A Life in Language. Ed. M. Kenstowicz. Cambridge MA: MIT Press, 271-291, 2001.
Meeting Papers, Presented
Vick, J., H. Lane, J. Perkell, M. Zandipour, and M. Matthies. "Sentence Intelligibility in Postlingually Deafened Adults who Receive Cochlear Implants." Paper presented at the 2001 Conference on Implantable Auditory Prostheses, Asilomar, California, August 19-24, 2001.
Meeting Papers, Published
Perkell, J., F. Guenther, H. Lane, M. Matthies, Y. Payan, P. Perrier, J. Vick, R. Wilhelms-Tricarico, and M. Zandipour. "The Sensorimotor Control of Speech Production". Proceedings of the First International Symposium on Measurement, Analysis and Modeling of Human Functions, pp. 359-365, Hokkaido University, Sapporo, Japan, Sept. 21-23, 2001.
Perkell, J., F. Guenther, H. Lane, M. Matthies, J. Vick, and M. Zandipour. "Planning and Auditory Feedback in Speech Production". Proceedings of the 4th International Nijmegen Speech Motor Conference, Nijmegen, The Netherlands, June 13-16, 2001.
Tiede, M.K., J. Perkell, M. Zandipour, and M. Matthies. "Gestural Timing Effects in the 'Perfect Memory' Sequence Observed under Three Rates by Electromagnetometry". J. Acoust. Soc. Am., 110, No. 5, Pt. 2, 2657, (A), 2001.
Theses
Cheyne II, H. A. Estimating Voicing Source Characteristics by Measuring and Modeling the Acceleration of the Skin on the Neck. PhD thesis, Department of Massachusetts Institute of Technology, 2002.
Lee, J.A. An Acoustical Study of the Fricative /sh/ in the Speech of Dysarthric Speakers. S.B. thesis, Massachusetts Institute of Technology, Cambridge MA 2001.
Mou, Xiaomin. Detection of Stop Consonant Voicing: Toward a Speaker Independent Model. M.Eng. thesis, Department of Electrical Engineering and Computer Science, MIT, 2001.