6.00: Introduction to Computer Science
and Programming
Problem Set 2
Handed out:
Due:
Introduction
This problem set
will introduce you to using strings, dictionaries, and functions. You will
design and write several functions, test them, and hand them in. Be
sure to read this problem set thoroughly, especially the sections of
Collaboration and the Handin Procedure.
Encryption is the process of obscuring information to make it
unreadable without special knowledge (kind of like Professor Guttag's
handwriting). For centuries, people have devised schemes to encrypt messages --
some better than others -- but the advent of the computer and the Internet
revolutionized the field. These days, it's hard not to encounter some sort of
encryption, whether you are buying something online or logging into Athena.
A cipher is an algorithm for performing encryption (and the reverse,
decryption). The original information is called plaintext. After it is
encrypted, it is called ciphertext. The ciphertext message contains all the
information of the plaintext message, but it's
not in a format readable by a human or computer without the proper mechanism to
decrypt it; it should resemble random gibberish to those not intended to read
it.
A cipher usually depends on a piece of auxiliary information, called a
key. The key is incorporated into the encryption process; the same plaintext
encrypted with two different keys should have two different ciphertexts.
Without the key, it should be difficult to decrypt the resulting ciphertext
into readable plaintext.
This assignment will deal with a well-known (though not very secure)
encryption method called the Caesar cipher.
Collaboration
Please take note of the policy in the course information handout.
Caesar Cipher
In this problem set, we will examine the
Caesar cipher. The basic idea in this cipher is that you pick an integer for a
key, and shift every letter of your message by the key. For example, if your
message was "hello" and your key was 2, "h" becomes
"j", "e" becomes "g", and so on. If you overshoot
the end of your alphabet, you just wrap back around, e.g. "z" would
become "b". If you're interested in learning more about the Caesar
cipher, check out the Wikipedia article.
Getting
Started
Download and save
·
ps2.py: the skeleton
you'll fill in
·
words.txt: a list of
English words
·
ps2test.py: some test
cases
Run the code without making any modifications
to it, in order to ensure that everything is set up correctly. The code that we
have given you loads a list of words from a file. If everything is okay, after
a small delay, you should see the following printed out:
Loading word list from file...
55902 words
loaded.
If you see an IOError instead (e.g., No
such file or directory), you should change the value of the
WORDLIST_FILENAME constant (defined near the top of the file) to the complete
pathname for the file words.txt (this will vary based on where you saved the
file).
The file ps2.py has a few already implemented
functions you can use while writing up your solution. You can ignore the code
between the following comments, though you should read and understand
everything else:
#
-----------------------------------
# Helper code
# (you don't need to
understand this helper code)
. . .
# (end of helper code)
#
-----------------------------------
Problem 1. Encryption
Write a program
to encrypt plaintext into ciphertext using the Caesar cipher. It should have
the following two functions:
build_coder(shift)
This function takes an int shift and returns a dict. shift acts as the cipher's key. The
dictionary acts as the coder; if you give the dict a letter as the key, it will
return the encoded letter as the value. So for a Caesar cipher with shift key
3, the dictionary looks like {'a': 'd', 'b': 'e', 'c':
'f' ... 'A': 'D', 'B': 'E' ...}. Note that the cipher preserves case: 'a' becomes 'd' and 'A' becomes 'D'.
apply_coder(text, coder)
This function takes a string text and a dict coder. It uses coder to
encode the plaintext text and returns
the ciphertext as a string. If there are characters in text that are not in the alphabet (such as punctuation or numbers),
those characters should not be encoded, but they should still be in the output.
Once you've
written those two functions, you should be able to use them to encode strings.
Problem 2. Decryption
Your friend, who
is also taking 6.00, is really excited about the program she wrote for Problem
1 of this problem set. She sends you emails, but they're all encrypted with the
Caesar cipher!
The problem is,
you don't know which shift key she is using. The good news is, you know your
friend only speaks and writes English words. So if you can write a program to
find the decoding that produces the maximum number of words, you can find the
right decoding.
We have provided
you with two functions that you may find helpful:
get_words(text)
This function takes in the string text and returns a tuple of the words in
it. It does so by assuming that the words in text are separated by whitespace. This function does not check if
the words are valid English words. For example, get_words("abc def ghi dog
cat fox bunny") returns ('abc', 'def', 'ghi', 'dog', 'cat', 'fox',
'bunny').
is_a_word(word)
This function takes in the string word and returns True
if the word is a valid English word, False otherwise. It can handle adjacent
punctuation (e.g. "Hello!" is recognized as a word), but it does not
recognize contractions such as "don't."
Write the
following functions:
build_decoder(shift)
This function takes in an int shift
and returns a dict. This dictionary acts as the decoder; if you give it a
ciphertext letter as the key, it will return the decoded plaintext letter as
the value. Hint: This function is very easy to write if you use build_coder().
decode(cipherText)
This function takes in the ciphertext
string cipherText and returns the
decoded plaintext as a string. Hint: use build_decoder(),
apply_coder(), and exhaustive enumeration. If your apply_coder()
was written properly, you should be able to use it to decode ciphertext using a
decoder dict generated by build_decoder().
Once you've
written decode(), you can decode your friend's emails!
Test Cases:
We have given you some
test cases in the file ps2test.py. Save this file to
the same directory as your ps2.py file. Then, you can run
ps2test.py. It will run a series of tests and print the number of passed tests
and failed tests. Using our test cases is entirely optional, but it is highly
recommended that you use these tests to help you determine if your program is
running correctly.
Please note that these
tests are not comprehensive. We will
be running other tests when grading your problem set. Do not assume that your
code is correct just because your code passes these test cases.
Hand-In Procedure
1. Save
You should be
using the ps2.py skeleton given to you in this problem set. Fill in the code
for the four functions build_coder(), apply_coder(),
build_decoder(), and decode(). Any other code is not necessary. Save your
solution as ps2.py. Do not ignore this step or save your file with a
different name. You do not have to
turn in any other files, such as ps2test.py.
2. Time and Collaboration Info
At the start of each file, in a comment, write down the number of hours (roughly) you spent on the problems in that part, and the names of the people you collaborated with. For example:
# Problem Set 2
# Name: Jane Lee
# Collaborators (Discussion): John Doe
# Collaborators (Identical Solution): Jane Smith
# Time:
#
... your code goes here ...
3. Submit
To submit a file, upload it to your workspace. If there is some error uploading to your workspace, email the file to 6.00-staff [at] mit.edu.
You may upload (or email) new versions of each file until the 11:59pm deadline.