M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering (Spring 2006)

April 7, 2006


Please find additional clarifications and details in the DP2 FAQ.

Design Project 2: Delay-Tolerant Wireless Networking with Cheap Laptops (v2)

(Note: this is version 2 of the design project. If you printed out a version that does not have this note, please print the current version.)

0. Assignment

You will do design project 2 in teams of three students from your recitation section.

There are four deliverables for Design Project 2:

  1. A list of team members emailed to your recitation instructor by April 11, 2006.

  2. One copy of a design proposal not exceeding 800 words, due on Thursday, April 27, 2006.
  3. One copy of a design report not exceeding 5,000 words, due on Thursday, May 11, 2006.
  4. An in-class presentation on May 16, 2006. Details of the presentations will be posted closer to the due date.

6.033 design reports are different from quizzes and problem sets. These projects, like those in real life, are under-specified, and it is your job to complete the specification sensibly, given the project requirements. As with real-world designs, those requirements often need some adjustment as you flesh out your design. We strongly recommend that you start early so that you can iterate your design. A good design will likely take more than a few days.

1. Introduction

Now that portable computers and wireless networking are commodity technologies, those who want a low-overhead way to deploy networks are building small, local wireless networks. Unlike traditional networks, these networks cannot depend on the Internet for connectivity, and their membership changes as users move around. For example, in the One Laptop Per Child Project (OLPC), researchers affiliated with MIT are planning to give low-cost laptops to millions of students in developing countries. These laptops have displays, wireless interfaces, and a moderate amount of storage.

The low-cost nature of these laptops makes them failure-prone. Specifically, their mean time to failure is several months. The type of failure is fail-stop failures: when the laptop fails, it stops working altogether. Recovery from these failures requires either replacing or repairing the laptop, and in either case, the contents of the laptop's disk will be lost.

In many deployment scenarios, these laptops, though not connected to the Internet, must still support cooperative applications that exchange data. Students in the OLPC project, in particular, may want:

  • File sharing. Students need to send each other lecture notes, music, photographs, and applications even when they are not within wireless communication range. Hence, laptops need to be able to send messages through one or more intermediate laptops and to relay messages for other laptops.

  • Backup. Because the laptops are failure prone, students will need to back up their important files on other laptops (and retrieve them after a failure).

In this project, you will design three components for the laptops:

  1. A delay-tolerant network-layer protocol that allows applications running on the laptop to exchange messages with other laptops without relying on the availability of the Internet or even continuous connectivity between pairs of laptops. By delay-tolerant we mean that that intermediate laptops sometimes store messages for long periods of time (e.g., minutes, hours, or days) before forwarding them. This network protocol must work even between two laptops that never come into communication range of each other and must not depend on there being a maximum number of intermediate laptops between any sender or receiver.
  2. The file sharing application described above. This application will make use of your delay tolerant network protocol.
  3. The backup application described above, which must also make use of your network protocol.

Interestingly, the delay-tolerance requirement is a real one faced in many parts of the developing world. There, disconnected networks may send their email to the Internet once per week. In those cases, the "first hop" is an intermittently connected computer. And sometimes this first hop is a "mobile server"; in other words, the email server travels by truck!

2. Requirements

Your system logically consists of two layers: (1) the lower-level messaging layer, and (2) the higher-level applications (i.e., backup and file sharing) that use this layer.

2.1 A Network Protocol that Tolerates Disconnection

Your protocol should provide the following API:

result send_msg(msg, len, dest, app_port);

result register_handler(app_port, handler_function);

Applications call send_msg to deliver a message, msg, of arbitrary length, len, to a remote laptop, dest. Note that the laptop cannot necessarily communicate directly with dest when the application calls send_msg. However, your network protocol should provide the following best-effort delivery guarantee: it will eventually deliver msg to the laptop named dest unless a failure occurs during that transmission. (Note that the two applications we have asked you to design have specific reliability requirements; you will need to decide to what extent those requirements are satisfied by your network protocol versus separately by each application.)

You should assume that every laptop has a unique name (e.g., a MAC address) and that applications have a way to discover the names of the laptops with which they wish to communicate. You do not need to design this naming service; assume, for example, that users in person tell each other their laptops' names before their applications communicate. dest is such a name.

app_port is the port on the destination laptop that will receive the message.

In response to send_msg, your network protocol may return a result that corresponds to an error code. This error code might mean that your network protocol was unable to accept the message for delivery (e.g., because its internal queues are full or because dest is unknown). However, in the absence of an error condition at the time that the application called send_msg, your network protocol must give the best-effort guarantee described above.

An application uses register_handler to register a procedure, handler_function, that your network protocol calls when a message arrives on a particular port, app_port. Specifically, when this event happens, your network protocol calls handler_function(recvd_msg, len), where recvd_msg is the message that just arrived and len is its length. register_handler returns an error if another application on the same computer has already registered for app_port.

In addition to providing this API, your implementation of this API should provide the following features:

  • Disconnected delivery: Some laptops rarely come into communication range of each other. These computers may still wish to exchange data; you will need to design a system that allows them to do so by relaying data through one or more intermediate laptops. Laptops also contain software that keeps track of other computers with which they have recently communicated (see Section 3 below). You may assume that recent communication history is a good predictor of future contacts.

  • Tolerance to disconnection: Your protocol must tolerate events like two laptops moving out of communication range of each other before one is finished transmitting a message to another. The numbers in Section 3 below suggest that for large messages, the likelihood of this event is high.

You may need to extend or modify the API given above to satisfy all of these requirements.

Your protocol will need to use the link-layer software that is installed on laptops and described below in Section 3.

The security and delivery acknowledgment are not required features of the network layer, but the backup and file transfer applications need both of these features (so you may wish to consider end-to-end arguments here).

2.2. File Transfer Application

The users of the file transfer application are either users or programs on the laptop that want to send a particular file from the local laptop to a particular remote laptop. The file transfer application provides the following API:
send(file, dest, callback);

In response to a call to send, the file transfer application should attempt to deliver the specified file to the specified dest. This application has several additional requirements:

  • Reliability: Your application should ensure that, with high probability, the file is delivered successfully. In designing this mechanism, you will want to use techniques like retransmissions, timers, and duplicate elimination that we have studied in the context of reliable network protocols. Your scheme may not be able to reliably deliver a message in the event of a network failure, or because it cannot confirm delivery after some long period of time.

  • Acknowledgments: Once the file has arrived (which might be hours or days after the user calls send), the application should invoke the user-supplied callback to inform the user that its request is complete and that the file has been delivered successfully. Alternatively, if your application cannot deliver the file, it is okay for your application to invoke callback with an error code indicating that the message could not be delivered.

  • Authenticity: The file transfer application on the receiving laptop needs a way to verify that the file it received is in fact identical to the file that the sender sent, and that the claimed sender is in fact the actual sender. In particular, it is possible that an intermediate laptop that forwarded the file may have tampered with the file contents or tried to forge the identity of the sender. Your design should describe how the receiver detects this situation.

  • Privacy: The files may contain sensitive information. Thus, the file transfer application must guarantee that intermediate laptops cannot read data that they relay.

2.3. Backup Application

The users of the backup application are either users or programs on the laptop that wish to protect a given file from loss. They invoke the backup application by giving it the name of a file. In response to this input, the backup application sends the file "into the network" by storing the file at other laptops. The backup application expects to be able to retrieve the file later, at the user's request. Users do not specify where (i.e., on which remote laptops) the file should be stored -- it is your application that makes this decision. Your backup application must expose the following interface:
  • backup(localname, netname, callback): Store the file called localname in the network and give it the name netname. When your backup application has stored the file successfully (we define success below), it invokes the user-supplied function callback. Many laptops may use the same netname and your design will need to be able to differentiate which laptop a given file belongs to.

  • restore(netname, callback): Retrieve the file with network name netname from the network. You can assume that users remember the names of the files (or the name of a directory containing the names of their files). When your application completes the restoration or determines that the file could not be found in the network, it invokes callback with an appropriate result code. Note that restore may take some time (hours or days) while it searches for the laptops that the files were stored on, though your application should be able to guarantee that it calls callback eventually. You may assume that remote laptops will not maliciously refuse to return a stored copy of a backed up file. You may likewise assume that remote laptops will not maliciously delete files backed up on them. However, remote laptops can certainly fail, just like your laptop can!

This application has several additional requirements:

  • Recoverability: After your application backs up a file, it should be able to restore the file unless all of the replicas it chose as backup sites simultaneously fail. In particular, the application needs to guarantee that the file is replicated at more than one location so that a single failed laptop does not cause the file to be lost. Your application also needs to ensure that after one replica fails, some replica eventually detects this failure and adds a new replica of the file (like all communication in your system, this failure detection process may take hours or days!). Finally, your application needs to ensure that it can actually find files it has previously stored (so that it can respond successfully to restore).

  • Acknowledgments: Users of your application need to be notified (via callback) when their file has been backed up successfully. Here backed up means successfully stored on more than one remote laptop. As with the file transfer application, a user may have to wait a long time (hours or days) for this acknowledgment.

  • Authenticity: The backup application needs a way to verify that a file it recovers via restore is in fact the same file that it previously stored in the network. In particular, it is possible that a remote laptop it chose to backup files on may have tampered with those files. Your design should describe how the recover function can detect this event.

  • Privacy: The backup application needs a way to guarantee that the remote laptops on which it performs backup are not able to read its files, since those files may contain sensitive information.

  • File lookup failure: The restore function must be able to indicate when it has determined that a file does not exist in the network.
Since both the file transfer and backup applications have similar security and message delivery guarantees, you will need to decide if these features should be implemented by the applications or by the network layer.

3. Existing Components and Parameters

As in real-world designs, you are neither starting from scratch nor working in an unconstrained environment.

Existing components

The laptop already has some useful software installed:
  • A link-layer protocol that allows a laptop to discover nearby laptops and send them messages. Specifically, the link-layer protocol provides the following functions:
    • get_neighbors(): returns a list of names of the laptops in communication range
    • result link_send(msg, len, dest): attempts to send the specified message to dest. If link_send() succeeds, it returns a result of 0. If it fails (e.g., because the two laptops were not in communication range of each other, or because they moved out of range during the transmission), result is 1. The maximum message size is 10 kilobytes. (Note: you can assume that the framing added by the link layer adds no bytes.)
    • link_register_handler(handler_function): Your network protocol will need to use link_register_handler to register a procedure, handler_function, that the link layer will call when messages intended for your network protocol arrive. handler_function must be a function that takes as input a message and a length. (Note: do not worry about concurrency issues. If a message arrives while handler_function is executing, the link layer will buffer the newly-arrived message and will call handler_function for the new message only after the currently-executing call to handler_function finishes.)
  • A service that keeps track of the last time a computer communicated with every other laptop. This service provides the following API:
      struct Neighbor {
        String name;
        Timestamp[100] last_heard;
      }
    
      Neighbor[] get_history();
    

    This function returns a list of all the neighbors a laptop has had contact with and the last 100 times when that contact occurred.
  • A cryptography library that exports functions for signing and sealing messages. This library is already installed: you should not need to specify how the encryption and hashing algorithms in this software work. You may assume that this library contains any of the security primitives described in Chapter 11 of the Course Notes.
In addition, you may, depending on your design, need to assume the existence of a Certificate Authority (CA) that is trusted by all entities (users, laptops, applications, etc.) If you make this assumption, then further assume (1) that all laptops already have the public key of the CA and (2) that the CA has given each laptop a certificate that attests the validity of the laptop's public key.

Quantities and Parameters

As with real-life designs, you should keep the particulars of the scenario in mind. We gave a qualitative description of the scenario above and now quantify some of that description. You may need to take these quantities into account when developing your design:

Parameter
Value
Number of laptops 1000
Average time elapsed between when a particular pair of senders come into communication range of each other, assuming those senders ever meet. (This means that any laptop will, on average, come into contact with every other laptop once every 48 hours.) Some pairs of laptops may never meet, and some laptops may meet with all other laptops with relatively high frequency. 48 hours
Average number of bytes that can be transmitted when two laptops come into contact 1 megabyte
Maximum size message that can be sent in link_send 10 kilobytes
Average file size 100 kilobytes
Maximum file size 10 megabytes
Average number of files per laptop 10,000
Average number of files that any given laptop wishes to backup elsewhere 1000
Storage capacity of each computer 10 gigabytes

4. The Design Report

Your report should demonstrate how your system meets the requirements described in Section 2. We encourage you to make judicious use of figures, pseudo-code, and protocol diagrams to explain your design. Make sure that, in the process of explaining your design, you include the following elements:
  • A protocol for routing and forwarding. Forwarding is how your network layer encapsulates messages and sends them over the link layer from one laptop to the next. Routing is the process by which your laptop discovers a path (i.e., a sequence of laptops) through the network to a destination laptop. Keep in mind that the intermediaries in this path may need to hold the message for a long time (as mentioned in Sections 1 and 2). Be sure to include pseudocode, a flowchart, or other procedural description of how send_msg() is implemented. Provide a qualitative argument that your implementation will usually result in successful message delivery. Be sure you show how laptops discover each other and begin exchanging data via the link_send function.
  • A design for how your system provides reliability and acknowledgments of delivery to the file sharing and backup applications. You may need to think about retry timers, duplicate elimination, and queuing.
  • A design for the implementation of send(), from the file sharing application. Include pseudocode, a flow chart, or other procedural description. Provide a qualitative argument that your approach results in the delivery of files and acknowledgments in most cases.
  • A design for the implementation of backup() and restore(), including where files are placed in the network, how files are replicated and named, and how restore() finds one of those replicas. Include pseudocode, a flow chart, or other procedural description. Provide a qualitative argument that your approach leads to correct backup and recovery in the absence of multiple simultaneous failures.
  • A description of how the backup and file sharing applications determine the authenticity of their messages.
  • A description of how the backup and file sharing applications protect their data so that unauthorized users cannot read it.
  • A brief rationale for how you decided which functions (reliability, privacy, etc.) to place in which layers. As mentioned several times in Section 2, there are different possibilities for these decisions. Indeed, there is no single correct answer, and the important thing is that you justify your approach with logical arguments.
  • A discussion of the limitations of your design, and of situations in which your solution fails to meet the best effort backup and delivery guarantees of the backup and file sharing applications.
It's okay, and sometimes desirable, to say, "no, my design doesn't do that" as long as you can identify what your design can and can't do, and explain why you made the trade-offs you did. Remember: a simple, correct design that meets the requirements laid out above is more important than a complex one that is hard to understand and does a flaky job. When in doubt, leave it out!

5. Your Written Work

The following are some tips on writing your design proposal and report.

Audience: You may assume that the reader has read the DP2 assignment but has not thought carefully about a solution. Give enough detail that your project can be turned over successfully to an implementation team.

5.1 Design Proposal

The design proposal should be a concise summary of your design choices and of the overall system design. It should include a description of your basic network protocol, including how it routes and forwards messages, Include a diagram or pseudocode illustrating the basic operation of these two processes. Describe how the file sharing and backup applications are implemented on top of your networking protocol. Be sure to describe how the applications meet their reliability, recoverability, and security goals. Provide rationale for your major design decisions, and specify any limitations (such as cases where your system fails to deliver messages) your design imposes. You should have a complete design but do not have to present detailed protocols or pseudocode for all of the APIs. Also, if any of your design decisions is "unusual" (particularly creative, experimental, risky, or specifically warned against in the assignment), it would be wise to describe them here.

5.2 Design Report

Report Outline (use this organization for your report):

  • Title Page: Give your design report a title that reflects the subject and scope of your project. Include your name, recitation time and section, and the date on the title page.
  • No Table of Contents.
  • Introduction: Summarize the salient aspects of your design, the trade-offs you made and give a brief rationale for the design you have chosen.
  • Design Description: Explain and elaborate your solution. Show how your solution satisfies the design constraints and solves the design problem (or how it fails to do so). Be sure to identify trade-offs you made, and justify them. You will want to sub-divide this section into subsections. Use figures and pseudocode or other procedural descriptions of the network and application APIs where needed, to support your design choices. Be sure to describe any limitations of your design.
  • Conclusion: Provide a short conclusion that provides recommendations for further actions and a list of issues that must be resolved before the design can be implemented.
  • Acknowledgments and References: Give credit to individuals whom you consulted in developing your design. Reference any sources cited in your report. The style of your references should be in the format described in the Mayfield Handbook's explanation of the IEEE style.

6. How do we evaluate your report?

When evaluating your report, your recitation instructor will assign a letter grade based both on content and writing. (Note that for DP2, the writing center will not grade the proposal or report, but they are happy to help you with your writing. You should visit them.)

Some content considerations:

  • Does your solution actually address the stated problem?
  • Do you explain your decisions and the trade-offs?
  • How complex is your solution? Simple is better, yet sometimes simple will not do the job. On the other hand, unnecessary complexity is bad.
  • Is your design correct and clear?
  • Do you analyze and explain the performance of your protocols?
  • Are your assumptions and decisions reasonable?

Some writing considerations:

  • Is the report clear?
  • Is the report well-organized? Does it follow standard organizational conventions for technical reports? Are the grammar and language sound?
  • Do you use diagrams and/or figures appropriately? Are diagrams or figures appropriately labeled, referenced, and discussed in the text?
  • Does the report use the concepts, models, and terminology introduced in 6.033? If not, does it have a good reason for using different vocabulary?
  • Does the report address the intended audience?
  • Are references cited and used correctly?

7. Collaboration

This project is a team effort. You will work in teams of 3 students from your recitation. You may not work with students in other recitations.

8. Tasks and Due Dates

  • List of team members Send your list of team members to your recitation instructor and TA by April 11, 2006.
  • One copy of design proposal (800 words or fewer) Due: April 27, 2006.
  • One copy of detailed design report (5,000 words or fewer) Due: May 11, 2006.

  • In class presentation on May 16, 2006.
Please state the length (in words) of your writeup on the last page of your design report.

Please choose a font size and line spacing that will make it easy for us to write comments on your report (e.g., 11 pt font or larger; 1.5-spaced or greater). Please use 1-inch margins.