In this problem set, you will work with slightly preprocessed data from Amazon’s Mechanical Turk, collected in an experiment run in the lab in 2012 and since published in Mahowald, Fedorenko, Piantadosi, & Gibson (2013). (It is a good idea to save this code so you can reuse it for your own Turk project.)

In this project, Mahowald et al. looked at word pairs like chimp/chimpanzee and math/mathematics. Information theory predicts that words that are predictable in context should be shorter. If the listener already knows what the next word will be (as in “To be or not to…”), there is no reason to spend time and effort saying a long word. If the word is entirely unpredictable (“The next word I am going to say is…”), on the other hand, it should be longer and thus more robust to noise: if one syllable gets garbled or misheard, the meaning can still be recovered. To test whether English is efficient in this way, the authors gave Turk participants sentence frames such as “Susan loves the apes at the zoo, and she even has a favorite…” (supportive context) or “During a game of charades, Susan was too embarrassed to act like a…” (neutral context) and asked them to complete each sentence with either chimp or chimpanzee.

Here is a sample trial from the supportive condition:

Susan loves the apes at the zoo, and she even has a favorite...
1. chimp      2. chimpanzee

There were 40 long/short word pairs, presented in random order so that each subject saw the sentences in a different order. The experimenters also varied which form (short or long) was listed first. They predicted that, across the whole list of word pairs, the short form would be chosen more often in the supportive (predictable) context than in the neutral (unpredictable) one. Your job is to check whether this prediction holds.
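To make the design concrete, here is a minimal sketch of how one subject's trial list could be generated: the sentences are shuffled with a per-subject seed, and the order of the two answer options is varied. This is an illustration, not the original experiment code. The names `Trial` and `make_trial_list`, the tuple layout of `items`, and the use of a per-trial coin flip to decide which form is listed first are all hypothetical assumptions; the handout only says that the order of forms was varied.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    item: str        # hypothetical item label, e.g. "chimp/chimpanzee"
    condition: str   # "supportive" or "neutral"
    sentence: str    # the sentence frame to be completed
    options: tuple   # (option listed first, option listed second)


def make_trial_list(items, seed):
    """Return one subject's randomized trial list (illustrative sketch).

    items: list of (short_form, long_form, condition, sentence) tuples.
    seed:  per-subject seed, so each subject gets a different but
           reproducible order.
    """
    rng = random.Random(seed)
    trials = []
    for short_form, long_form, condition, sentence in items:
        # Assumption: a coin flip per trial decides which form is listed first.
        if rng.random() < 0.5:
            options = (short_form, long_form)
        else:
            options = (long_form, short_form)
        trials.append(Trial(f"{short_form}/{long_form}", condition, sentence, options))
    # Shuffle so each subject sees the sentences in a different order.
    rng.shuffle(trials)
    return trials
```

Seeding a separate `random.Random` instance per subject keeps each subject's order independent of the others while still letting you regenerate exactly what a given subject saw.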


Now that the data are tidy, you can start analyzing them and looking for patterns. The variables of interest are whether the participant picked the long or the short form and which condition the trial was in. Give a short quantitative discussion of the results, making sure to answer the following questions. There is some flexibility in how you answer them, but be sure to report numbers (no inferential statistics are necessary) and be clear about which numbers you are reporting.
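One way to get the numbers is to compute the proportion of short-form choices in each condition, and then the same difference item by item. The sketch below assumes each tidy row is a dict with the hypothetical keys `item`, `condition` (`"supportive"` or `"neutral"`), and `choice` (`"short"` or `"long"`); rename these to match your own columns. It uses only the standard library; with pandas you could get the same counts from a `groupby`. The prediction corresponds to a higher short-form proportion in the supportive condition, i.e. a positive per-item difference.

```python
from collections import defaultdict


def short_form_rates(rows):
    """Count short-form choices per condition.

    rows: iterable of dicts with (hypothetical) keys "condition" and "choice".
    Returns {condition: (n_short, n_total, proportion_short)}.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [n_short, n_total]
    for row in rows:
        c = counts[row["condition"]]
        c[0] += row["choice"] == "short"  # True counts as 1
        c[1] += 1
    return {cond: (s, n, s / n) for cond, (s, n) in counts.items()}


def per_item_difference(rows):
    """For each item: proportion short (supportive) minus proportion short (neutral).

    Items missing either condition are skipped.
    """
    by_item = defaultdict(list)
    for row in rows:
        by_item[row["item"]].append(row)
    diffs = {}
    for item, item_rows in by_item.items():
        rates = short_form_rates(item_rows)
        if "supportive" in rates and "neutral" in rates:
            diffs[item] = rates["supportive"][2] - rates["neutral"][2]
    return diffs
```

For example, on four invented toy rows for chimp/chimpanzee (two supportive trials, both "short"; two neutral trials, one "short" and one "long"), `short_form_rates` gives `(2, 2, 1.0)` for supportive and `(1, 2, 0.5)` for neutral, and `per_item_difference` gives `0.5` for that item. These toy rows are for illustration only and are not the experiment's results.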