"Babble" script

Written by Adam Albright, last modified Feb 16, 2003

This is a very simple, quick-and-dirty script to generate random "words", given an inventory of possible onsets, nuclei, and codas.
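
A rough Python sketch of the idea follows. It is not the code of Babble.pl (which is written in Perl). It assumes that each syllable is one onset, one nucleus and one coda chosen uniformly at random from the inventory lists, that an empty list means "no onset" or "no coda", and, for simplicity, that every word has the same number of syllables. The function names and the toy inventory are hypothetical.

```python
import random


def make_syllable(onsets, nuclei, codas, rng=random):
    """Return one syllable: a random onset + nucleus + coda.

    An empty onset or coda list is treated as [""], i.e. no onset/coda.
    """
    return (rng.choice(onsets or [""])
            + rng.choice(nuclei)
            + rng.choice(codas or [""]))


def make_word(n_syllables, onsets, nuclei, codas, rng=random):
    """Concatenate n_syllables independently generated syllables."""
    return "".join(make_syllable(onsets, nuclei, codas, rng)
                   for _ in range(n_syllables))


def babble(n_words, n_syllables, onsets, nuclei, codas, rng=random):
    """Keep generating until n_words distinct words have been collected.

    This loop never returns if n_words exceeds the number of distinct
    words the inventory can produce.
    """
    words = set()
    while len(words) < n_words:
        words.add(make_word(n_syllables, onsets, nuclei, codas, rng))
    return sorted(words)


# Hypothetical toy inventory: 3 onsets, 2 nuclei, optional coda "n".
print(babble(5, 2, ["p", "t", "k"], ["a", "i"], ["", "n"]))
```

Because the loop only stops once enough distinct words have been found, it also shows why asking for too many words can keep such a generator running forever.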

It requires two files to run:

The actual script: Babble.pl
- Note: this script uses the Perl module Math::Round; if running the script produces an error about it, you may need to install the module from CPAN.
An inventory file, listing possible onsets, nuclei and codas
- Here is one example: Sample.inv
- Here is one based on Hawaiian: Hawaiian.inv
In addition, you will need a Perl distribution.

Instructions for use:

Run the script in whatever way you ordinarily run Perl scripts (e.g., by typing 'perl Babble.pl' at a command prompt in Unix, Windows, or Mac OS X, or by using the "run" command in MacPerl)
When the script asks for your file of onsets, nuclei, and codas, enter the name of your inventory file (e.g., 'Sample.inv')
Enter the number of words that you want (an integer)
Enter the maximum number of syllables you want your words to have

By default, the program is biased to produce shorter words more often than longer ones: it will attempt to generate monosyllabic words 2/3 of the time, disyllabic words 2/3 of the remaining time, and so on. To modify this, you need to edit the script.

Enter the name of the file you want to save the output under
- Danger: if you enter the name of a file that already exists, this program will write over the existing file.
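
The default length bias can be written out exactly. If a word stops at each length with probability 2/3, then P(1) = 2/3, P(2) = (1/3)(2/3) = 2/9, P(3) = 2/27, and so on. The Python sketch below (hypothetical function names, not taken from Babble.pl) computes these probabilities and samples a length with the same rule. What the actual script does with the probability left over at the maximum length is not documented here, so the sketch assumes it all goes to the maximum length.

```python
import random
from fractions import Fraction


def length_probabilities(max_syllables, p_stop=Fraction(2, 3)):
    """Exact probability of each word length under the "2/3 rule".

    At each length the word stops with probability p_stop; otherwise it
    grows by one syllable. ASSUMPTION: whatever probability is left at
    max_syllables is assigned to max_syllables.
    """
    probs = {}
    remaining = Fraction(1)
    for k in range(1, max_syllables):
        probs[k] = remaining * p_stop
        remaining -= probs[k]
    probs[max_syllables] = remaining
    return probs


def choose_length(max_syllables, p_stop=2 / 3, rng=random):
    """Sample a word length with the same stopping rule."""
    k = 1
    while k < max_syllables and rng.random() >= p_stop:
        k += 1
    return k


print(length_probabilities(3))
# {1: Fraction(2, 3), 2: Fraction(2, 9), 3: Fraction(1, 9)}
```

Changing p_stop is this sketch's counterpart to editing the script: a smaller value shifts the distribution toward longer words.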

Also note: if you request more words than your inventory can possibly produce, the program will never terminate. With S distinct onset + nucleus + coda combinations and a maximum of n syllables, there are at most S + S^2 + ... + S^n distinct words (fewer if different syllable sequences happen to spell the same string; repeating a segment in the inventory file adds no new combinations). If the program never says "Done generating novel words", check to make sure that you have requested a reasonable number of words.
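
To check a request against this limit before running the script, the count can be computed directly. The Python sketch below uses hypothetical helper names and assumes the inventory is given as lists of strings, with "" for an empty onset or coda (which may not be how .inv files mark them). It returns the upper bound S + S^2 + ... + S^n and, for small inventories, the exact number of distinct strings by brute force.

```python
from itertools import product


def distinct(entries):
    """Distinct inventory entries; an empty list means a single empty string."""
    return set(entries) or {""}


def word_upper_bound(onsets, nuclei, codas, max_syllables):
    """S + S**2 + ... + S**n, where S = distinct onset*nucleus*coda combinations."""
    s = len(distinct(onsets)) * len(distinct(nuclei)) * len(distinct(codas))
    return sum(s ** k for k in range(1, max_syllables + 1))


def count_distinct_words(onsets, nuclei, codas, max_syllables):
    """Exact number of distinct strings, by brute force (small inventories only)."""
    syllables = {o + n + c
                 for o in distinct(onsets)
                 for n in distinct(nuclei)
                 for c in distinct(codas)}
    words = set()
    for k in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=k):
            words.add("".join(combo))
    return len(words)


print(word_upper_bound(["p", "t"], ["a"], ["", "n"], 2))      # 20
print(count_distinct_words(["p", "t"], ["a"], ["", "n"], 2))  # 20
# "a"+"na" and "an"+"a" both spell "ana", so fewer strings than the bound:
print(count_distinct_words(["", "n"], ["a"], ["", "n"], 2))   # 16
```

The brute-force count grows as S^n, so it is only practical for toy inventories; the upper bound is cheap for any size.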

Tips:

If you want some segments to appear in your words more often than others, simply enter those segments multiple times in the inventory file. Part of the reason this program was written in such a crude way is that I wanted to create sample languages whose codas occur in the same proportions as in the lexicons of various languages. To do this, I constructed .inv files by pasting in the codas of all of the existing words in a language (e.g., 1000 lines of codas for a 1000-word lexicon).
If there are tight restrictions on possible rhymes, it may be more efficient to list each rhyme separately under the "Nuclei" section and leave the "Codas" section blank (e.g., for Chinese).
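
Both tips can be illustrated with a short Python sketch. It assumes the generator picks one inventory line uniformly at random, so each distinct entry's chance equals its share of the lines, and that a blank Codas section means "no coda". The counts and rhymes below are hypothetical, not taken from Sample.inv or Hawaiian.inv.

```python
from collections import Counter
from fractions import Fraction


def draw_probabilities(entries):
    """Exact chance of each distinct entry when one list item is chosen uniformly."""
    counts = Counter(entries)
    return {seg: Fraction(n, len(entries)) for seg, n in counts.items()}


def rhymes(nuclei, codas):
    """All rhymes the generator can produce; a blank coda list means 'no coda'."""
    return {n + c for n in nuclei for c in (codas or [""])}


# Tip 1 (hypothetical counts): "n" listed 3 times, "ng" once, no coda 4 times.
codas = ["n"] * 3 + ["ng"] + [""] * 4
print(draw_probabilities(codas))  # "n": 3/8, "ng": 1/8, "": 1/2

# Tip 2 (hypothetical rhymes): free combination yields every nucleus+coda pair...
print(sorted(rhymes(["a", "i"], ["", "n", "ng"])))
# ...while listing whole rhymes as nuclei yields only the listed ones.
print(sorted(rhymes(["an", "ing", "i"], [])))
```

In the second case only "an", "i" and "ing" can occur, whereas the free combination also produces "ang", "a" and "in".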

Problems or questions? Email me: albright@mit.edu
