Hands-on 3: MapReduce

Warmup

This assignment asks you to write a simple parallel program with the MapReduce library, using a single-machine Python implementation. Download the following files: mapreduce.py and kjv12.txt. If you'd prefer to use Python 3 rather than Python 2, download this version of mapreduce.py instead (and rename the file to mapreduce.py).
Then, run mapreduce:

python mapreduce.py kjv12.txt

After running for a little while, the output should be as follows:

and 12846
i 8854
god 4114
israel 2574
the 1842
for 1743
but 1558
then 1374
lord 1070
o 1065
david 1064
jesus 977
moses 847
judah 816
jerusalem 814
he 754
now 643
so 622
egypt 611
behold 596

The output has two columns: the first column has a lower-case version of a title-cased word that appears in the ASCII bible, and the second column has a count of the number of times that word appears in the bible. The output is trimmed to display only the top 20 results, sorted by descending word count.

Studying mapreduce.py
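As a point of reference while you read the code, the output format described above (lower-cased title-cased words, their counts, top 20 by descending count) can be sketched as a map step followed by a reduce step. This is a minimal illustration, not the code in mapreduce.py: the names map_chunk, reduce_pairs, and top_words are hypothetical, the word-splitting regex and the alphabetical tie-break are assumptions, and the input is split on line boundaries purely for simplicity.

```python
import re
from collections import Counter


def map_chunk(chunk):
    """Map step: emit (lower-cased word, 1) for every title-cased word in a chunk."""
    # Assumption: a "word" is a run of ASCII letters; str.istitle() picks out
    # words like "God" or "I" and skips "the" and "LORD".
    return [(w.lower(), 1) for w in re.findall(r"[A-Za-z]+", chunk) if w.istitle()]


def reduce_pairs(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return totals


def top_words(text, n_chunks=4, limit=20, map_fn=map):
    """Split text into chunks, map each chunk, reduce, and return the top results."""
    lines = text.splitlines()
    # Ceiling division, so that at most n_chunks chunks are produced.
    step = max(1, -(-len(lines) // n_chunks))
    chunks = ["\n".join(lines[i:i + step]) for i in range(0, len(lines), step)]
    mapped = map_fn(map_chunk, chunks)
    totals = reduce_pairs(pair for part in mapped for pair in part)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

The map_fn parameter defaults to the built-in map, so the sketch runs serially. Passing the map method of a multiprocessing.Pool instead would distribute the chunks across worker processes, which is roughly the idea behind a single-machine MapReduce. How mapreduce.py actually partitions and schedules its work is something to determine by reading the file.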
The program in if __name__ == '__main__':
It creates an instance of the

You may find the Python Reference useful in answering the following questions. In particular, the sections on Multiprocessing and Process Pools may be useful.

Questions

Now you're ready for this week's questions. Like before, the questions are in a read-only Google Doc. Make sure to enter your answers in the page indicated (please do not erase the question text) and upload them as a PDF to Gradescope. See more detailed instructions at the end of the first week's hands-on. If you are having Gradescope problems, please post a question on Piazza!