Salaam! This section describes research that I have done in speech and language processing.

Speech Recognition for Language Learning.
My MIT Master's thesis [1], supervised by Dr. Stephanie Seneff, described the design and implementation of an intuitive online system that lets native Chinese speakers annotate non-native Mandarin Chinese speech. This system has the potential to benefit Chinese teachers and speech recognition researchers simultaneously. Chinese classes usually assign spoken reading exercises, which the teachers then grade and annotate for correctness. My web-based interface for Chinese reading assignments is a simple, integrated solution for completing and correcting these spoken assignments. The recorded speech and transcriptions can also serve as a corpus of labeled non-native speech for use in future research.

I tested the system by having several native Chinese speakers annotate a sample bank of 250 Chinese utterances and observed fair to moderate inter-rater agreement scores. In addition to providing a benchmark for inter-rater agreement, this test demonstrates the feasibility of having remote graders annotate sets of utterances.
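As an illustration of how agreement between two annotators can be scored, here is a minimal Python sketch of Cohen's kappa, which corrects raw agreement for the agreement expected by chance. This is an assumption for illustration only: the text above does not say which agreement statistic the thesis used, and the function name and the "ok"/"bad" labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    labels_a and labels_b are equal-length sequences of category labels,
    one label per utterance (hypothetical labels, e.g. "ok" / "bad").
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Two annotators, four utterances, one disagreement.
grader_1 = ["ok", "ok", "bad", "bad"]
grader_2 = ["ok", "bad", "bad", "bad"]
print(cohen_kappa(grader_1, grader_2))  # 0.5
```

On the commonly cited Landis and Koch scale, kappa values of 0.21 to 0.40 are described as "fair" and 0.41 to 0.60 as "moderate", so the toy example above would fall in the "moderate" band.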

“Heard Chinese” to Pinyin Translation.
A local giving directions to a traveler in a foreign country may be unable to convey street and landmark names, even if they know English, because the traveler is likely to mis-hear unfamiliar phonemes. Moreover, many languages, like Chinese, use unfamiliar writing systems. For my undergraduate thesis [2], I therefore sought to develop a program that could translate English speakers' transcriptions of “heard” Chinese back into the original Chinese phrase.

For this project, supervised by Dr. Tony Eng, I first created a corpus of transcriptions of Chinese speech written by native English speakers with no Chinese experience. I then applied natural language processing techniques to these transcriptions, taking into account the frequencies of different phonemes in Chinese as well as their frequencies in conjunction with adjacent phonemes, to develop an algorithm for translating the transcriptions back into the original Chinese pinyin phrases.
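To make the idea of using unit frequencies together with adjacent-unit frequencies concrete, here is a minimal Python sketch of a smoothed bigram model that ranks candidate pinyin sequences. It is not the thesis algorithm: the function names, the toy training phrases, and the use of whole syllables as units are all assumptions, and the step that proposes candidate pinyin readings for an English transcription is left out.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"


def train_bigram_model(phrases):
    """Count how often each unit follows each other unit.

    phrases is a list of unit sequences, e.g. [["ni", "hao"], ...].
    The units are toy pinyin syllables here; phonemes would work the same way.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {END}
    for phrase in phrases:
        padded = [START] + list(phrase) + [END]
        vocab.update(phrase)
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded, padded[1:]))
    return context_counts, bigram_counts, vocab


def log_prob(phrase, model):
    """Log-probability of a unit sequence under the bigram model.

    Add-one smoothing keeps unseen pairs from getting zero probability.
    """
    context_counts, bigram_counts, vocab = model
    padded = [START] + list(phrase) + [END]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


def best_candidate(candidates, model):
    """Return the candidate sequence the model finds most likely."""
    return max(candidates, key=lambda phrase: log_prob(phrase, model))


# Toy training data (hypothetical, not the thesis corpus).
model = train_bigram_model([
    ["ni", "hao"],
    ["ni", "hao", "ma"],
    ["xie", "xie"],
])

# Two hypothetical readings of the same heard phrase.
print(best_candidate([["hao", "ni"], ["ni", "hao"]], model))  # ['ni', 'hao']
```

A fuller system would also need to weigh how likely each English spelling is given a candidate Chinese sound; the sketch only covers the sequence-frequency side described above.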

References
[1] Andrea Hawksley, “An Online System for Entering and Annotating non-Native Mandarin Chinese Speech for Language Teaching,” MIT Masters Thesis, August, 2008.

[2] Andrea Hawksley, “Progress Towards a ‘Heard Chinese’ to Pinyin Translator,” MIT Undergraduate Thesis, June, 2007.