6.033 2011 Design Project 2

I. Due Dates and Deliverables

You will be working on the second design project in teams of three students who share the same recitation instructor. There are four deliverables for this design project:

A list of team members emailed to your TA by April 7, 2011.
One copy of a design proposal (not exceeding 1,200 words), due at 5pm on April 21, 2011.
One copy of a design report (not exceeding 5,000 words), due at 5pm on May 5, 2011.
A five-minute in-recitation presentation, on May 10, 2011. In consultation with the chair of the faculty we have determined that the assignment follows the spirit of the end-of-term rules.

All deliverables should be submitted via the online submission site.

As with real life system designs, 6.033 design projects are under-specified, and it is your job to complete the specification in a sensible way given the overall requirements of the project. As with designs in practice, the specifications often need some adjustment as the design is fleshed out. We recommend that you start early so that you can evolve your design over time. A good design is likely to take more than just a few days to put together.

II. The Problem

Ben Bitdiddle watched the ``Twitter revolutions'' in several countries over the past few months, and was disappointed to see how government was able to prevent communications by shutting down ISPs in these countries. Even though many people in these countries had computers, laptops, and cell phones with wireless communication capability, they weren't able to exploit it to organize themselves. Ben wants to come up with a distributed messaging system that does not require any central server or network infrastructure, and allows users to send messages and subscribe to other users' messages, similar to a publish-subscribe system.

Consider the scenario of a densely populated urban area such as Cairo, which might have about 10 million people with WiFi-enabled devices (Metropolitan Cairo has about 20 million people), where each device is often (but not always) within radio range of other devices. Many users will want to follow (subscribe to) messages from a small number of popular topics; for instance, you can think of one topic having upwards of 1M subscribers. On the other hand, less popular topics may only have a handful of subscribers. Publishers send messages to a topic (identified by a string), and users subscribe only to topics, and not to individual publishers. There's no access control: any publisher can send a message to a topic, and any user can subscribe to a topic. (Topics are kind-of like ``hash-tags'' in Twitter, except that users subscribe to a topic instead of manually searching for a hash-tag on Twitter. The reason for the difference is that there is no single server that's always online and can always perform a hash-tag search in this design.) Our system should allow all publishers to post messages to topics, and have the messages delivered to subscribers of those topics in a timely manner (e.g., in a few hours at most). The goal is for a publisher to be able to announce a gathering at Tahrir square the next morning, and for most subscribers to receive this message in time.

A wireless radio works by allowing one node to broadcast a message to any other node within radio range; those nodes will in turn receive this message, and process it accordingly. You can assume that the wireless layer provides you with four functions:

scan() returns the list of nodes.
broadcast(msg) broadcasts msg to all nodes within radio range, i.e., likely every node returned by scan().
send(node, msg) broadcasts msg to all nodes within radio range, with a header indicating that only the specified node should look at it.
receive() returns any messages received from nearby nodes.

If several nodes are within radio range of each other, only one of them can transmit at a time -- multiple transmissions would collide with each other, making it impossible for receivers to receive any of the messages. You can assume that the send function takes care of ensuring this, similar to how Ethernet's collision detection works. You can think of this as a fixed amount of bandwidth available for communication between nodes in the same physical area.

You can assume a densely populated area of 10M users, where each node is often within radio range of 5 other nodes. Assume that 10% of the nodes are offline at any given time, although some nodes (e.g., an office or a home desktop system) are much more reliable than others (e.g., cell phones or laptops). You can assume that there are at least 1% highly reliable nodes, and that they are evenly distributed over the area in question. The average wireless rate is 1 Mbit/sec.

III. Requirements

The challenges you should address in your design project are as follows:

How does a user subscribe to a topic? How does a message from some publisher sent to this topic find out about all of the subscribers? What if the user is offline when the message is sent, or if the publisher is offline when the user subscribes?
How does your system support popular topics that may have 10K-100K subscribers without requiring any single computer with sufficient bandwidth to handle this traffic, and without requiring any single computer to be always online? How do publishers send messages to such popular topics?
How does your system deal with individual users or groups of users that turn off their computers when a message was published, or whose wireless radios were out of range of the publisher at the time the message was sent?
What happens if some user's computer breaks permanently, or moves to a different location in the network?
How does a user move to a different computer, if their original computer broke? Can they preserve their subscriptions?

You do not need to address security, anonymity, or authentication in your design project (although it is an optional challenge problem).

After you have designed your system, you should evaluate it under the following scenario:

Ten popular topics have 1M subscribers each, and get a message every hour. These topics each get a new subscriber every minute.
There are also 1M other topics, each with 5 subscribers, and each topic gets a new message every 5 minutes. However, the 5 subscribers are often physically near the publisher, meaning that these messages are of physically local interest only.

You should specify how much traffic will be caused in this scenario, how much storage will be required, and when the subscribers will receive the messages.

Optional challenge problems:

Provide message authentication, meaning that a subscriber can be reasonably confident that any message he or she sees was written by the author the message claims to be from. Signing messages is a good first step towards solving this problem, but how does the subscriber know the publisher's key?
How can your system deal with malicious nodes that may be run by the government or some other adversary? There are many threats you might consider from a malicious node, such as a node that simply drops subscription requests from users, or a node that actively sends messages to other nodes in an attempt to prevent message delivery.
Can you provide some form of anonymity or plausible deniability, so that a government cannot tell what messages a publisher might be writing, or what messages a user might be subscribing to?

IV. Design proposal

The design proposal should summarize your design in 1,200 words or fewer. It should outline the protocol and algorithms the nodes use to publish messages, to subscribe to topics, and to route messages from publishers to subscribers.

You do not have to present a detailed rationale or analysis in your proposal. However, if any of your design decisions are unusual (particularly creative, experimental, or risky) or if you deviate from the requirements, you should explain and justify those decisions in your proposal.

You will receive feedback from your TA in time to adjust your final report.

V. Design report

Your report should explain your design. It should discuss the major design decisions and tradeoffs you made, and justify your choices. It should discuss any limitations of which you are aware. You should assume that your report is being read by someone who has read this assignment, but has not thought carefully about this particular design problem. Give enough detail that your project can be turned over successfully to an implementation team. Your report should convince the reader that your design satisfies the requirements in Section III.

Use this organization for your report:

Title page: Give your report a title that reflects the subject and scope of your project. Include your names, email address, recitation instructor, section time(s), and the date on the title page.
No table of contents.
Introduction: Summarize what your design is intended to achieve, outline the design, explain the major trade-offs and design decisions you have made, and justify those trade-offs and decisions.
Design: Explain your design. Identify your design's main components, state, algorithms, and protocols. You should sub-divide the design, with corresponding subsections in the text, so that the reader can focus on and understand one piece at a time. Explain why your design makes sense as well as explaining how it works. Use diagrams, pseudo-code, and worked examples as appropriate.
Analysis: Explain how you expect your design to behave in different scenarios. What scenarios might pose problems for performance or correctness? What do you expect to be the scalability limits of your design?
Conclusion: Briefly summarize your design and provide recommendations for further actions and a list of any problems that must be resolved before the design can be implemented.
Acknowledgments and references: Give credit to individuals whom you consulted in developing your design. Provide a list of references at the end using the IEEE citation-sequence system ("IEEE style") described in the Mayfield Handbook.
Word count. Please indicate the word count of your report at the end of the document.

Here are a few tips:

Use ideas and terms from the course notes when appropriate; this will save you space (you can refer the reader to the relevant section of the notes) and will save the reader some effort.
Before you explain the solution to any given problem, say what the problem is.
Before presenting the details of any given design component, ensure that the purpose and requirements of that component are well described.
It's often valuable to illustrate an idea using an example, but an example is no substitute for a full explanation of the idea.
You may want to separate the explanation of a component's data structures (or packet formats) from its algorithms.
Explain all figures, tables, and pseudo-code; explain what is being presented, and what conclusions the reader should draw.

While the Writing Program will not be grading DP2, you should feel free to ask them for help.

VI. Presentation

You will have only about five minutes for your presentation. The audience will be very familiar with the problem, so you can get right to the guts of your solution.

VII. How we evaluate your work

Your recitation instructor will assign your report a grade that reflects both the design itself and how well your report presents the design. These are the main high-level grading criteria:

Clarity. Is the design described well enough to be understood, evaluated, and implemented?
Correctness. Does the design achieve the requirements laid out in Section III? Does your report give a convincing analysis that your design works under the described scenario? Will the design be stable if the workload is slightly different from the described scenario?
Simplicity. Is the level of complexity in the design justified?

VII. Clarifications

``Physically close'' does not necessarily mean within wireless radio broadcast range.
Even for unpopular topics, your design must support an occasional user from far away deciding to subscribe to this topic.
If a node is powered off, or is out of radio range, it must reasonably catch up on messages to topics it subscribed to, which it may have missed in the meantime, once it powers up and comes within radio range of other nodes. You can define what ``reasonable'' means in your system---for example, it seems OK for a laptop to miss messages if it was offline for a month---but make sure your design works for the application mentioned above (announcing a meeting in Tahrir square).
You cannot assume that devices have GPS units. As an optional challenge problem, you can explain how your design would change if some devices had access to GPS, but your base design has to work without GPS data.
You can assume that the send() function is reliable: if it indicates that transmission was successful, then the other node received the message. Keep in mind that the other node can crash shortly after receiving your message, so this assumption may not be that important or useful. The broadcast() function is not reliable: it does not request (or wait for) acknowledgments.