Problem set 4

In this assignment, you will analyze data from an experiment that was run on Amazon Mechanical Turk. The experiment tested whether and how the placement of a verb particle (i.e., the position of the particle relative to the verb, for particle verbs like “put up”, “throw out”, …) depends on the length (in number of words) of the verb’s direct object. Participants saw four kinds of sentences, in which the particle comes either early or late and the direct object is either long or short. This is called a 2x2 design. The independent variables are particle position (early or late) and object length (long or short).

Example Sentences:

Joe threw the documents out. (late-short)
Joe threw the very important documents that he brought home out. (late-long)
Joe threw out the documents. (early-short)
Joe threw out the very important documents that he brought home. (early-long)

1. Load particle_data.csv into R. Exclude from the analysis all participants (a) whose home country is not USA (Answer.country), (b) whose native language is not English (Answer.English), or (c) who did not answer at least 90% of the item comprehension questions correctly (Correct). Also, throw out any individual data points (not the whole participants!) that have NA for Answer.Rating. Report the number of rows remaining in the data frame after excluding these participants and data points.
2. The column Condition simultaneously encodes the two independent variables. It would be better to have one independent variable per column. Use separate() to split the column Condition into two columns based on the position of the - character.
3. Transform the grammaticality ratings (Answer.Rating) into z-scores, with means and standard deviations estimated within subjects.
4. Make a plot with the means for each condition and their 95% confidence intervals. Map one independent variable to the x axis and to color, and split the data by the other independent variable using facet_wrap(). What visual impression do you get from this plot about the differences among the conditions in this experiment?
5. Define two dummy-coded predictors based on the independent variables (early vs. late, long vs. short). If you fit a linear regression with these predictors and their interaction, what will the intercept coefficient fit? What will each variable’s coefficient fit? What will the interaction term’s coefficient fit?
6. Fit a linear regression (using lm()) to the data, predicting z-scored judgments from the dummy-coded predictors and their interaction. Use the summary() function to get the model output. Briefly describe the result.
7. Compare the coefficient fits in the model to the predictions about them that you made in question 5 to verify that you’re interpreting the model correctly.
8. Extra Credit: Change the independent variable predictors to be effects coded rather than dummy coded. Fit the same model as before to this dataset and briefly describe the result. Do any terms change from being significant to being non-significant, or vice versa?
9. Extra Credit: In this new model, what does each of the terms (intercept, slopes, interaction) fit? Compare each term to the corresponding quantities in the data to verify your interpretation.
10. Use the coefficients from the model (either of the two you’ve fit) to calculate the predicted group means for the four cells of the design. How far off are they from the actual group means?
11. Using the analyses you’ve done, write a few sentences summarizing the results of the experiment.
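
The mechanics behind several of the steps above (splitting Condition with separate(), z-scoring within subjects, dummy coding, and turning lm() coefficients back into cell means) can be sketched on a toy data frame. This is only an illustration under stated assumptions, not a solution: the ratings are invented, and the column names workerid and Rating are hypothetical stand-ins (the real file uses Answer.Rating, and its participant identifier column may be named differently). The Condition labels are assumed to follow the position-length pattern shown in the example sentences.

```r
library(dplyr)
library(tidyr)

# Toy data, invented for illustration only (NOT the particle_data.csv ratings).
# `workerid` and `Rating` are hypothetical column names.
toy <- data.frame(
  workerid  = rep(c("s1", "s2"), each = 4),
  Condition = rep(c("early-short", "early-long", "late-short", "late-long"),
                  times = 2),
  Rating    = c(6, 5, 6, 2, 7, 6, 6, 3)
)

toy <- toy %>%
  # Split "early-short" into position = "early", length = "short".
  separate(Condition, into = c("position", "length"), sep = "-") %>%
  # z-score within each subject: that subject's own mean and sd.
  group_by(workerid) %>%
  mutate(z = (Rating - mean(Rating)) / sd(Rating)) %>%
  ungroup() %>%
  # Dummy (0/1) coding; the reference cell here is early + short.
  mutate(late = as.numeric(position == "late"),
         long = as.numeric(length == "long"))

# Main effects plus interaction.
fit <- lm(z ~ late * long, data = toy)
b <- coef(fit)

# Cell means implied by the dummy-coded coefficients.
pred <- c(
  early_short = b[["(Intercept)"]],
  late_short  = b[["(Intercept)"]] + b[["late"]],
  early_long  = b[["(Intercept)"]] + b[["long"]],
  late_long   = sum(b)  # intercept + late + long + late:long
)

# Observed cell means, for comparison with `pred`.
obs <- toy %>%
  group_by(position, length) %>%
  summarise(mean_z = mean(z), .groups = "drop")

print(pred)
print(obs)
```

Which level serves as the reference (coded 0) is a free choice; changing it changes what the intercept and the simple-effect coefficients refer to, so state your coding when you interpret the model.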