6.033 - Computer System Engineering MapReduce Assignment

Hands-on 4: MapReduce

Complete the following hands-on assignment: do the activities described and submit your solutions via Gradescope by 11:59 p.m.

This assignment asks you to write a simple parallel program with the MapReduce library, using a single-machine Python implementation.

I. Warmup

Download the following files: mapreduce.py, reverseindex.py, and kjv12.txt. Then, run mapreduce:

  > python mapreduce.py kjv12.txt

After running for a little while, the output should be as follows:

  and 12846
  i 8854
  god 4114
  israel 2574
  the 1843
  for 1743
  but 1558
  then 1374
  lord 1071
  o 1066
  david 1064
  jesus 978
  moses 847
  judah 816
  jerusalem 814
  he 754
  now 643
  so 622
  egypt 611
  behold 596

The output has two columns. The first column is the lower-case version of a title-cased word that appears in the ASCII Bible (kjv12.txt), and the second column is the number of times that word appears in the text. The output is trimmed to display only the top 20 results, sorted by descending word count.
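To make the output format concrete, here is a minimal single-process sketch of the same kind of count. It is not the code in mapreduce.py: the function name top_title_cased_words and the regular expression used to recognize a title-cased word are assumptions made for illustration, and the real program may tokenize the text differently.

```python
import re
from collections import Counter

def top_title_cased_words(text, limit=20):
    """Return up to `limit` (word, count) pairs, most frequent first.

    Assumption: a "title-cased" word is one uppercase letter followed by
    zero or more lowercase letters (so "God" and "I" count, "AM" does not).
    """
    words = re.findall(r"\b[A-Z][a-z]*\b", text)
    # Lower-case each word before counting, as in the first output column.
    counts = Counter(word.lower() for word in words)
    return counts.most_common(limit)

if __name__ == "__main__":
    sample = "And God said, Let there be light. And there was light. I AM."
    for word, count in top_title_cased_words(sample):
        print(word, count)
```

Running the same function on the contents of kjv12.txt would give a table in the same two-column shape, though the exact counts depend on how title-cased words are defined.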

II. Studying mapreduce.py

We will now study mapreduce.py. The program begins execution after the following statement:

  if __name__ == '__main__':

We then create an instance of the WordCount class using a few parameters. The last parameter comes from the command line (kjv12.txt in our example) and controls which file MapReduce will run on. Immediately after initialization, we call run on the WordCount instance, at which point the Python MapReduce library runs the MapReduce algorithm using the map and reduce methods defined in the WordCount class.
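The flow described above (split the input, map each piece, group the intermediate pairs by key, then reduce each key) can be sketched as follows. This is a simplified illustration under stated assumptions, not the structure of mapreduce.py: all function names here (split_into_chunks, map_chunk, group_by_key, reduce_key, run_word_count) are hypothetical, and the sketch runs sequentially with the built-in map unless a different map_fn is supplied.

```python
from collections import defaultdict

def split_into_chunks(lines, n_chunks):
    """Divide the input lines into roughly equal chunks, one per map task."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_chunk(chunk):
    """Map phase: emit a (word, 1) pair for each title-cased word."""
    pairs = []
    for line in chunk:
        for word in line.split():
            word = word.strip(".,;:!?()")
            if word.istitle():
                pairs.append((word.lower(), 1))
    return pairs

def group_by_key(mapped):
    """Collect the values emitted for each key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_key(item):
    """Reduce phase: sum the counts collected for one word."""
    key, values = item
    return key, sum(values)

def run_word_count(lines, n_chunks=2, map_fn=map):
    """Run map, group, and reduce; map_fn applies a function to each task."""
    chunks = split_into_chunks(lines, n_chunks)
    mapped = list(map_fn(map_chunk, chunks))
    groups = group_by_key(mapped)
    reduced = list(map_fn(reduce_key, list(groups.items())))
    # Sort by descending count, as in the program's output.
    return sorted(reduced, key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    lines = ["And God said, Let there be light.", "And there was light."]
    print(run_word_count(lines))
```

Because map_fn takes a function and an iterable, the map method of a multiprocessing.Pool could be passed in its place to run the map and reduce tasks in worker processes. That is one reason the task functions are defined at module level: a process pool needs to be able to pickle them.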

You may find the Python Reference useful in answering the following questions. In particular, the sections on Multiprocessing and Process Pools may be useful.

III. Questions

Now you're ready for this week's questions.

As before, the questions are in a read-only Google Doc. Make sure to enter your answers on the page indicated (please do not erase the question text) and upload them as a PDF to Gradescope. See the more detailed instructions at the end of the first week's hands-on. If you are having Gradescope problems, please post a question on Piazza!
