6.826 Project Suggestions, Spring 2002

Ad-Hoc Mobile Networking

Creating a wireless network infrastructure is costly and requires the involvement of large organizations to look after the installed base. And all of this seems so unnecessary, since there are always lots of people wandering around with wireless devices that are perfectly capable of routing messages from a sender to a desired recipient. Why not just build a network out of whatever devices happen to be available at the time?

Potential issues you might want to consider:
a) How do you deal with the fact that devices are always moving around?
b) How does the system route a message to a device if the sender has no idea where the receiver is located or what route it might be able to use?
c) How much power are you going to consume from the devices of unsuspecting passers-by?

Relaxed Memory Consistency

It turns out that in a multiprocessor with caching, computer architects believe they can get better performance by building machines that are not guaranteed to preserve the observed order of writes. More specifically, if one thread performs a sequence of writes such as x = 1; y = 2; z = 3; starting from a state in which x, y, and z are all 0, it may be possible for other threads to observe states in which x = 0, y = 0, and z = 3, and so on. To make it possible to write correct programs anyway, these machines typically guarantee that a program that uses synchronization correctly will not observe any of these reorderings. Many papers have been published on this topic, but there is still a feeling in the community that we don't have a complete handle on the phenomenon. It would be nice to develop enough of a formal specification of what is going on to understand the problem better.

Striped File Systems

Traditional file systems have been implemented on top of a single disk. But this has drawbacks: large files must be read sequentially, and if the disk fails, you lose everything. Another way to implement a file system is to distribute the data across many disks. This way a system can read the data in parallel. And if you put some redundancy into the writes, the system can even tolerate the failure of one or more disks before it loses data.

Some questions to think about:
a) How would you build such a system using a collection of machines, each with a single disk?
b) How would you make the data secure against eavesdroppers attempting to determine the contents of a file?
c) What kinds of system issues would affect whether a machine observes any performance increase from reading or writing data in parallel?
d) What interface to the file system would you provide?

Package tracking system

Write a spec and a high-level implementation for the Fedex package tracking system. You might start by going to the Fedex web site and examining how the system works. The tricky part of this project is separating tracking from other functions, notably package routing. You should treat these independently: tracking takes as input the information that the bar code scanners pick up about the package identity at various points in the Fedex system, as well as the package source and intended destination. Routing is an entirely separate function, driven by information about current locations and destinations from the tracking system, as well as knowledge of available transport. Work on the tracking system first, and then go on to routing if you have time.
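To make the separation concrete, here is a minimal sketch, in Python and purely for illustration, of what a tracking spec might look like. The names (ScanEvent, TrackingSpec, register, scan, current_location) are invented for this sketch and are not part of any real Fedex interface. The point is only that the tracking state is an append-only history of scan events per package, plus the declared source and destination; routing would be a separate function that consults current_location and the destination.

# Hypothetical tracking spec sketch; all names are made up for illustration.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ScanEvent:
    package_id: str   # identity read from the bar code
    location: str     # where the scanner is
    timestamp: float  # when the scan happened


class TrackingSpec:
    """Tracking state: an append-only history of scans per package,
    plus the declared source and intended destination."""

    def __init__(self) -> None:
        self._routes: Dict[str, tuple] = {}            # package_id -> (source, destination)
        self._history: Dict[str, List[ScanEvent]] = {}

    def register(self, package_id: str, source: str, destination: str) -> None:
        # A package enters the system with a source and intended destination.
        self._routes[package_id] = (source, destination)
        self._history[package_id] = []

    def scan(self, event: ScanEvent) -> None:
        # Each bar-code scan simply appends to the package's history.
        self._history.setdefault(event.package_id, []).append(event)

    def current_location(self, package_id: str) -> Optional[str]:
        # The externally visible query: where was the package last seen?
        events = self._history.get(package_id, [])
        return events[-1].location if events else None

    def history(self, package_id: str) -> List[ScanEvent]:
        return list(self._history.get(package_id, []))


if __name__ == "__main__":
    spec = TrackingSpec()
    spec.register("PKG-1", source="Boston", destination="Seattle")
    spec.scan(ScanEvent("PKG-1", "Boston hub", 1.0))
    spec.scan(ScanEvent("PKG-1", "Memphis hub", 2.0))
    print(spec.current_location("PKG-1"))   # -> "Memphis hub"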
Garbage collection

Write one or more specs for garbage collectors, and one or more high-level implementations. The specs should first deal with reachability, assuming an infinite pool of memory to draw from. They should then take a finite pool into account and say something about when a request won't be satisfied. You might look at mark-and-sweep, copying, and generational collectors. Note that collectors are used in log-structured file systems and object-oriented databases as well as in main-memory allocators.

Server farms

Write specs and high-level implementations for a collection of machines used to implement a web server farm. Such farms are usually organized into
* Stateful machines that store persistent data
* Stateless machines that do front-end processing
* Switches that send requests to the right places
Things to think about are scaling, initialization, partitioning of data, and replication both for availability and for load balancing. A rough sketch of this organization appears below.
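As a rough illustration of that three-layer organization, here is a small Python sketch. The class names (Switch, FrontEnd, StatefulServer) and the hash-based partitioning with a fixed number of replicas are assumptions chosen only for concreteness; a real farm would also have to deal with failures, reconfiguration, and keeping replicas consistent, which this sketch ignores.

# Hypothetical server-farm sketch; names and partitioning scheme are assumptions.
import hashlib
from typing import Dict, List


class StatefulServer:
    """Stores one partition of the persistent data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.store[key] = value

    def get(self, key: str) -> str:
        return self.store.get(key, "<missing>")


class FrontEnd:
    """Stateless: processes a request, then talks to the right back ends."""

    def __init__(self, backends: List[StatefulServer], replicas: int = 2) -> None:
        self.backends = backends
        self.replicas = min(replicas, len(backends))

    def _owners(self, key: str) -> List[StatefulServer]:
        # Partition by hash of the key; keep a few replicas for availability.
        start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(self.backends)
        return [self.backends[(start + i) % len(self.backends)]
                for i in range(self.replicas)]

    def handle_put(self, key: str, value: str) -> None:
        for server in self._owners(key):
            server.put(key, value)

    def handle_get(self, key: str) -> str:
        # Any replica could serve the read; pick the first for simplicity.
        return self._owners(key)[0].get(key)


class Switch:
    """Sends each request to some front end, round-robin for load balancing."""

    def __init__(self, front_ends: List[FrontEnd]) -> None:
        self.front_ends = front_ends
        self._next = 0

    def dispatch(self) -> FrontEnd:
        fe = self.front_ends[self._next]
        self._next = (self._next + 1) % len(self.front_ends)
        return fe


if __name__ == "__main__":
    backends = [StatefulServer(f"db{i}") for i in range(3)]
    front_ends = [FrontEnd(backends) for _ in range(2)]
    switch = Switch(front_ends)

    switch.dispatch().handle_put("user:42", "alice")
    print(switch.dispatch().handle_get("user:42"))   # -> "alice"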