Reviewing the Reviews

In this part, we will think about the social consequences of deploying machine learning models. Recall the food reviews dataset from the Week 3 Homework. The code for this question loads the weights of a pretrained logistic regression model that classifies reviews as positive or negative from a bag-of-words representation of each review. It then prints out the weights associated with certain words: ["yummy", "Indian", "Mexican", "Chinese", "European", "gross"].

Using this Colab Notebook, investigate the weights of the words printed out. What do you notice? Try out other words by changing the list of words passed in. Consider trying words from this list: ["disgusting", "favourite", "caffeine", "stinks"]. Do the weights match your expectations of what they should be? You can also download the code here.
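If you cannot open the notebook, the following minimal sketch shows the inspection pattern it implements. The vocabulary indices and weight values here are invented for illustration; the actual notebook loads pretrained weights.

    import numpy as np

    # Hypothetical stand-ins for the notebook's pretrained artifacts: a
    # vocabulary mapping each word to its bag-of-words column, and the
    # logistic regression weight vector (values invented for illustration).
    vocab = {"yummy": 0, "Indian": 1, "Mexican": 2, "Chinese": 3,
             "European": 4, "gross": 5}
    weights = np.array([1.8, -0.4, 0.1, -0.3, -0.9, -2.1])

    def print_weights(words):
        """Print the learned weight for each word, if it is in the vocabulary."""
        for word in words:
            if word in vocab:
                print(f"{word:>10s}: {weights[vocab[word]]:+.2f}")
            else:
                print(f"{word:>10s}: not in vocabulary")

    print_weights(["yummy", "Indian", "Mexican", "Chinese", "European", "gross"])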
3A) How will a classifier with these weights handle "This is Indian food" relative to "This is European food"? Why does this happen? Consider the following statistics on the number of positive (+) and negative (-) reviews containing each word:

    Word        +    -
    yummy      96   28
    Indian      2    1
    Mexican     1    1
    Chinese     1    2
    European    0    1
    gross      20   69
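To see why a single review can flip a word's sign, here is a minimal sketch (ours, not part of the assignment code) that computes add-one-smoothed log-odds from the counts above. This is not the logistic regression weight itself, but it points in the same direction: rare words like "European" get their sign from just one or two reviews.

    import math

    # Review counts from the table above: word -> (positive, negative).
    counts = {"yummy": (96, 28), "Indian": (2, 1), "Mexican": (1, 1),
              "Chinese": (1, 2), "European": (0, 1), "gross": (20, 69)}

    # Add-one smoothing avoids log(0) for words such as "European",
    # which never appears in a positive review.
    for word, (pos, neg) in counts.items():
        log_odds = math.log((pos + 1) / (neg + 1))
        print(f"{word:>10s}: {log_odds:+.2f}  ({pos} positive, {neg} negative)")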
3B) When is this desirable behavior? When is it not? What if we're helping Yelp build a restaurant recommendation system? What if we are doing a research project in which we are trying to understand how different cuisines are perceived?

3C) What changes could we make to achieve something different from what is happening in (3A)? (One concrete possibility is sketched after these questions.)

3D) If we made the changes you came up with in (3C), how would this affect the classifier's performance?

3E) Is it important that we used logistic regression in this problem? Or would the lessons we learned apply to other linear classifiers?
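As one concrete illustration for 3C (one possibility among many, not the intended answer), we could remove the cuisine-identity features so the classifier cannot reward or penalize a review for naming a cuisine. The sketch below reuses the hypothetical weights from the earlier sketch and zeroes them out; retraining without those features, or rebalancing the data, are alternatives worth discussing.

    import numpy as np

    # Reusing the hypothetical vocab/weights from the earlier sketch.
    vocab = {"yummy": 0, "Indian": 1, "Mexican": 2, "Chinese": 3,
             "European": 4, "gross": 5}
    weights = np.array([1.8, -0.4, 0.1, -0.3, -0.9, -2.1])

    def score(sentence):
        """Sum the weights of in-vocabulary words (a bag-of-words score)."""
        return sum(weights[vocab[w]] for w in sentence.split() if w in vocab)

    print(score("This is Indian food"), score("This is European food"))

    # One candidate change for 3C: zero out cuisine-identity weights so the
    # classifier treats all cuisines alike.
    for word in ["Indian", "Mexican", "Chinese", "European"]:
        weights[vocab[word]] = 0.0

    print(score("This is Indian food"), score("This is European food"))  # now equal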
Food for Thought

In considering the questions above, you can see that machine learning doesn't end with training a model and showing that it has high accuracy. We also need to think about whether the behavior of our models is fair. Here we were looking at food reviews, and we already started to see evidence of bias. Now what if we were building a classifier to look at a resume and decide whether or not to interview someone? Some companies tried this: Hiring Algorithms. "After an audit of the algorithm, the resume screening company found that the algorithm found two factors to be most indicative of job performance: their name was Jared, and whether they played high school lacrosse."

Discussion Guide