M.I.T. DEPARTMENT OF EECS
6.033 - Computer System Engineering | MapReduce Assignment
Complete the following hands-on assignment. Do the activities described, and submit your solutions via Gradescope by 11:59 p.m.
This assignment asks you to write a simple parallel program with the MapReduce library using a single-machine Python implementation.
Download the following files: mapreduce.py, reverseindex.py, and kjv12.txt. Then, run mapreduce:
> python mapreduce.py kjv12.txt
After running for a little while, the output should be as follows:
and 12846
i 8854
god 4114
israel 2574
the 1843
for 1743
but 1558
then 1374
lord 1071
o 1066
david 1064
jesus 978
moses 847
judah 816
jerusalem 814
he 754
now 643
so 622
egypt 611
behold 596
The output has two columns: the first column has a lower-case version of a title-cased word that appears in the ASCII bible and the second column has a count of the number of times that word appears in the bible. The output is trimmed to only display the top 20 results sorted by descending word count.
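The selection rule described above can be sketched in a few lines of Python 3. This is an illustrative sketch, not the code in mapreduce.py: the function name top_title_words is hypothetical, and it assumes that a word is a run of ASCII letters and that "title-cased" means whatever str.istitle() reports.

```python
import re
from collections import Counter

def top_title_words(text, n=20):
    """Count title-cased words (lower-cased) and return the n most common."""
    # Assumption: a word is a maximal run of ASCII letters.
    words = re.findall(r"[A-Za-z]+", text)
    # Assumption: "title-cased" means str.istitle() is true,
    # e.g. "God" qualifies but "GOD" and "god" do not.
    counts = Counter(w.lower() for w in words if w.istitle())
    # most_common returns (word, count) pairs in descending count order.
    return counts.most_common(n)

sample = "And God said, Let there be light. And God saw the light."
for word, count in top_title_words(sample):
    print(word, count)
```

On the real input, a rule like this would count "The" at the start of a sentence but not "the" in the middle of one, which may be why "the" ranks below "israel" in the output above.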
We will now study mapreduce.py. The program begins execution after the following statement:
if __name__ == '__main__':
We then create an instance of the WordCount class using a few parameters. The last parameter comes from the command line. In our example, it is kjv12.txt. This parameter controls which file we will be executing MapReduce on. Immediately after initialization, we call run on the WordCount instance. When we call run on our WordCount instance, the Python MapReduce library runs the MapReduce algorithm using the map and reduce methods defined in the WordCount class.
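To make the division of labor between map and reduce concrete, here is a minimal single-process sketch of the map, group, and reduce phases for word counting, written for Python 3. It is not the library's code: the class name TinyWordCount, the method signatures, and the way the input is split into chunks are all assumptions made for illustration, and the real mapreduce.py may differ in each of these.

```python
from collections import defaultdict

class TinyWordCount:
    """Hypothetical stand-in for a word-count job; not the class in mapreduce.py."""

    def map(self, chunk):
        # Emit a (word, 1) pair for every title-cased word in the chunk.
        return [(w.lower(), 1) for w in chunk.split()
                if w.isalpha() and w.istitle()]

    def reduce(self, key, values):
        # Combine all values emitted for one key into a single count.
        return (key, sum(values))

    def run(self, chunks):
        # Map phase: each chunk is processed independently of the others.
        groups = defaultdict(list)
        for chunk in chunks:
            for key, value in self.map(chunk):
                groups[key].append(value)  # group intermediate pairs by key
        # Reduce phase: one reduce call per distinct key.
        results = [self.reduce(k, v) for k, v in groups.items()]
        # Sort by descending count, as in the sample output.
        return sorted(results, key=lambda kv: kv[1], reverse=True)

job = TinyWordCount()
print(job.run(["In the beginning God created", "And God said"]))
```

Because map only looks at its own chunk and reduce only looks at the values for one key, both phases can be run on many chunks or keys at once, which is what makes the pattern suitable for parallel execution.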
You may find the Python Reference useful in answering the following questions. In particular, the sections on Multiprocessing and Process Pools may be useful.
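As a quick orientation to process pools, the following Python 3 sketch uses the standard library's multiprocessing.Pool to count words in several chunks in parallel and then merges the partial counts in the parent process. The helper names count_chunk and merge and the sample chunks are hypothetical, and this is not how mapreduce.py is necessarily organized.

```python
from collections import Counter
from multiprocessing import Pool

def count_chunk(chunk):
    # Runs in a worker process: count title-cased words in one chunk.
    return Counter(w.lower() for w in chunk.split()
                   if w.isalpha() and w.istitle())

def merge(counters):
    # Runs in the parent process: add the per-chunk counts together.
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == "__main__":
    chunks = ["And God said", "And Moses said", "God spake"]
    # Pool(2) starts two worker processes; map() calls count_chunk once
    # per chunk and returns the results in input order.
    with Pool(2) as pool:
        partial = pool.map(count_chunk, chunks)
    print(merge(partial).most_common(3))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes, and the pool is created under the __main__ guard so that workers do not re-run the pool setup when they import the module.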
Now you're ready for this week's questions.
As before, the questions are in a read-only Google Doc. Make sure to enter your answers on the page indicated (please do not erase the question text) and upload them as a PDF to Gradescope. See more detailed instructions at the end of the first week's hands-on. If you are having Gradescope problems, please post a question on Piazza!