6.00: Introduction to Computer Science and Programming

Problem Set 2

Handed out: Tuesday, September 22, 2009.
Due:
11:59pm, Tuesday, September 29, 2009.

Introduction

This problem set will introduce you to using strings, dictionaries, and functions. You will design and write several functions, test them, and hand them in. Be sure to read this problem set thoroughly, especially the sections of Collaboration and the Handin Procedure.

Encryption is the process of obscuring information to make it unreadable without special knowledge (kind of like Professor Guttag's handwriting). For centuries, people have devised schemes to encrypt messages -- some better than others -- but the advent of the computer and the Internet revolutionized the field. These days, it's hard not to encounter some sort of encryption, whether you are buying something online or logging into Athena.

A cipher is an algorithm for performing encryption (and the reverse, decryption). The original information is called plaintext. After it is encrypted, it is called ciphertext. The ciphertext message contains all the information of the plaintext message, but it's not in a format readable by a human or computer without the proper mechanism to decrypt it; it should resemble random gibberish to those not intended to read it.

A cipher usually depends on a piece of auxiliary information, called a key. The key is incorporated into the encryption process; the same plaintext encrypted with two different keys should have two different ciphertexts. Without the key, it should be difficult to decrypt the resulting ciphertext into readable plaintext.

This assignment will deal with a well-known (though not very secure) encryption method called the Caesar cipher.

Collaboration

Please take note of the policy in the course information handout.

Caesar Cipher

In this problem set, we will examine the Caesar cipher. The basic idea in this cipher is that you pick an integer for a key, and shift every letter of your message by the key. For example, if your message was "hello" and your key was 2, "h" becomes "j", "e" becomes "g", and so on. If you overshoot the end of your alphabet, you just wrap back around, e.g. "z" would become "b". If you're interested in learning more about the Caesar cipher, check out the Wikipedia article.

Getting Started

Download and save

·        ps2.py: the skeleton you'll fill in

·        words.txt: a list of English words

·        ps2test.py: some test cases

Run the code without making any modifications to it, in order to ensure that everything is set up correctly. The code that we have given you loads a list of words from a file. If everything is okay, after a small delay, you should see the following printed out:

Loading word list from file...

   55902 words loaded.

If you see an IOError instead (e.g., No such file or directory), you should change the value of the WORDLIST_FILENAME constant (defined near the top of the file) to the complete pathname for the file words.txt (this will vary based on where you saved the file).

The file ps2.py has a few already implemented functions you can use while writing up your solution. You can ignore the code between the following comments, though you should read and understand everything else:

# -----------------------------------

# Helper code

# (you don't need to understand this helper code)

. . .

# (end of helper code)

# -----------------------------------

Problem 1. Encryption

Write a program to encrypt plaintext into ciphertext using the Caesar cipher. It should have the following two functions:

build_coder(shift)

        This function takes an int shift and returns a dict. shift acts as the cipher's key. The dictionary acts as the coder; if you give the dict a letter as the key, it will return the encoded letter as the value. So for a Caesar cipher with shift key 3, the dictionary looks like {'a': 'd', 'b': 'e', 'c': 'f' ... 'A': 'D', 'B': 'E' ...}. Note that the cipher preserves case: 'a' becomes 'd' and 'A' becomes 'D'.

apply_coder(text, coder)

        This function takes a string text and a dict coder. It uses coder to encode the plaintext text and returns the ciphertext as a string. If there are characters in text that are not in the alphabet (such as punctuation or numbers), those characters should not be encoded, but they should still be in the output.

Once you've written those two functions, you should be able to use them to encode strings.

Problem 2. Decryption

Your friend, who is also taking 6.00, is really excited about the program she wrote for Problem 1 of this problem set. She sends you emails, but they're all encrypted with the Caesar cipher!

The problem is, you don't know which shift key she is using. The good news is, you know your friend only speaks and writes English words. So if you can write a program to find the decoding that produces the maximum number of words, you can find the right decoding.

We have provided you with two functions that you may find helpful:

get_words(text)

        This function takes in the string text and returns a tuple of the words in it. It does so by assuming that the words in text are separated by whitespace. This function does not check if the words are valid English words. For example, get_words("abc def ghi dog cat fox bunny") returns ('abc', 'def', 'ghi', 'dog', 'cat', 'fox', 'bunny').

is_a_word(word)

        This function takes in the string word and returns True if the word is a valid English word, False otherwise. It can handle adjacent punctuation (e.g. "Hello!" is recognized as a word), but it does not recognize contractions such as "don't."

Write the following functions:

build_decoder(shift)

        This function takes in an int shift and returns a dict. This dictionary acts as the decoder; if you give it a ciphertext letter as the key, it will return the decoded plaintext letter as the value. Hint: This function is very easy to write if you use build_coder().

decode(cipherText)

        This function takes in the ciphertext string cipherText and returns the decoded plaintext as a string. Hint: use build_decoder(), apply_coder(), and exhaustive enumeration. If your apply_coder() was written properly, you should be able to use it to decode ciphertext using a decoder dict generated by build_decoder().

Once you've written decode(), you can decode your friend's emails!

Test Cases:

We have given you some test cases in the file ps2test.py. Save this file to the same directory as your ps2.py file. Then, you can run ps2test.py. It will run a series of tests and print the number of passed tests and failed tests. Using our test cases is entirely optional, but it is highly recommended that you use these tests to help you determine if your program is running correctly.

Please note that these tests are not comprehensive. We will be running other tests when grading your problem set. Do not assume that your code is correct just because your code passes these test cases.

Hand-In Procedure

1. Save

You should be using the ps2.py skeleton given to you in this problem set. Fill in the code for the four functions build_coder(), apply_coder(), build_decoder(), and decode(). Any other code is not necessary. Save your solution as ps2.py. Do not ignore this step or save your file with a different name. You do not have to turn in any other files, such as ps2test.py.

2. Time and Collaboration Info

At the start of each file, in a comment, write down the number of hours (roughly) you spent on the problems in that part, and the names of the people you collaborated with. For example:

 # Problem Set 2

 # Name: Jane Lee

 # Collaborators (Discussion): John Doe

 # Collaborators (Identical Solution): Jane Smith

 # Time: 1:30

 # 

... your code goes here ... 

3. Submit

To submit a file, upload it to your workspace. If there is some error uploading to your workspace, email the file to 6.00-staff [at] mit.edu.

You may upload (or email) new versions of each file until the 11:59pm deadline.