Like GFS, this paper uses the term "master", which is outdated language that the community is moving away from (see here and here for examples of alternate terms). We use the word "controller" below in place of "master", just as we did for GFS.

After reading through Section 3, you should be able to understand and explain Figure 1 (the "Execution overview") in detail. Explaining that figure is a great test of your MapReduce knowledge as you prepare for a future exam. After reading Sections 5 and 6, you should understand the real-world performance of MapReduce. An example question that you should be able to answer: How do stragglers affect performance?

As you read, think about the following:

  • What happens when a task in MapReduce is taking a long time to complete on a single machine?
  • Suppose a machine M completes a map task, and then M fails. What happens? What is the impact on other machines that need data from the map task?
  • With MapReduce and GFS, Google has made simplifying assumptions that make sense for their work and their workloads. What are the downsides of using these systems as models for reliability?

As always, there are multiple correct answers for each of these questions.

Submit your answers to these questions on Canvas by 12:00pm on Friday 4/10 (note that we're back to a Friday deadline now). Write a few sentences in response to each question: we don't need an essay for each one, but we do expect more than one-word answers. Your responses should be in your own words, not direct quotations from the paper.