M.I.T. DEPARTMENT OF EECS
6.033 - Computer System Engineering (Spring 2001)
Handout 18 - April 12, 2001
Design Project 1 Comments
Writing Style & Structure
The quality of the writing was generally high, and most of you seem to
have a good sense of how to organize a report. To help you avoid some
common mistakes in your next report, here are some suggestions:
- Write a helpful abstract. The abstract must contain information
about your solution, not just a description of the problem and an
assertion that it has been solved with perfect performance,
scalability, etc. A few sentences explaining the solution should be
enough. Avoid using technical jargon, but do say "using a variant
of scheme X..." if scheme X is well known and applies. Remember
that the purpose of an abstract is not only to entice the reader to
read the whole report, but also to give the reader a chance to
guess how the solution relates to others proposed.
Specifically:
The abstract should not be a prose form of your table of
contents, the first paragraph of your intro, the last paragraph of
your conclusion, or a combination thereof.
- Stage the description. Start out with a general introduction.
Remember your intended audience; don't expect they know all the
details of a Cyberbeanie. You should explain your design at
several levels of abstraction. First explain the key intuition
about how it works and what properties it relies on. Then explain
the protocol in an abstract way, assuming the link layer as a
service. Finally, you can explain details of procedure calls if you
think it's necessary. If you miss the high level explanations, it's
often impossible to infer them from the low-level ones; bear in
mind that it's easier for an educated reader to infer the other
way.
Specifically:
Do not start out by describing packet and memory formats. First
motivate what you need to send/store, then show how you did it.
- Separate abstraction levels. Don't mix levels of abstraction into a
single, woven narrative. Don't discuss several layers at once. You
should not discuss code details while you're explaining the scheme
in outline. While small snippets of pseudocode may occasionally be
useful (assuming the code itself is clear), code is generally best
relegated to an appendix. It is often convenient to separate the
abstract contents of the packets and data structures from their
concrete representations. Certainly, you should avoid using
representation-specific details in your explanation (e.g., talking
about packet types in terms of their binary encodings, or talking
about lookups in a table in terms of offsets).
- Focus on explanation, not code. The code should be viewed as
something to look at to find low-level details omitted from the
text. In many cases, the text was good enough that we didn't read
the code at all. If the text was incomprehensible, we may not have
even tried to reverse engineer the ideas from the code.
- Separate rationale, design and analysis. Several of you included
comments about the rationale for your design, and analysis of its
performance, within the narrative explaining the design
itself. This makes the design much harder to understand, especially
since it's sometimes hard to tell which features are part of the
design and which are part of a rejected alternative. Don't do
it. Separate these three things into clearly delineated sections.
- Use symbolic names. A report is like a program. It's easier to
read, and easier to modify, if you use symbolic names (for packet
types, network dimensions, radio transmission rate, etc). Even if
the value is fixed, it still makes sense to give it a name. An
argument of the form "the time between packets is obviously bounded
by 49 * 132 / 12" is not easy to follow. Even with symbolic names,
bounds are often not as obvious as one might think; explain how
formulas are derived. Also, if you use symbols alone, you can do a
quick dimensional analysis as a sanity check.
- Be brief. Occasionally, some of you lapsed into verbosity, most
often during the analysis of scalability. When the going gets
tough, be straightforward and brief, and resist the urge to pad.
Similarly, when the content is self-explanatory (like many of your
fully-worked examples), don't belabor the point. Like most
readers, our eyes glaze over and we start skipping when things get
repetitive and vague.
- Say it once. Try not to repeat the same idea throughout the
paper. When you organize the report before you write it, think
about what the key ideas are and where they will be
explained. Consider dependences too. In several reports, a feature
was introduced and only explained several pages later. At the very
least, note that the feature will be explained, and include a
forward reference; it's disconcerting otherwise. For example, one
of you included the source id in the temperature packet structure,
but it wasn't until a parenthetical comment 5 pages later that it
became clear that this id was used to detect topology changes.
- Some nitpicks. Don't break figures across pages; check pagination
before you print and add some page breaks if necessary. Similarly,
indent and format pseudocode as you would actual source; don't let
the word processor wrap it like prose. Don't repeat irrelevant
details or constraints from the project assignment just because
they were specified. Number your pages; if the pages of your
report are not numbered, it makes it hard for the reader to
reassemble it in correct order after dropping it on the floor. And
the reader can't easily make comments such as "back on page 5 you
contradicted this claim." The most popular spelling errors were
"psuedocode" (for "pseudocode"), and "it's" for the possessive
adjective.
Design Issues
Most of you reviewed the requirements and prioritized them well. But
bear in mind that it rarely makes sense to use terms like "as fast as
possible"; for example, there is surely no point in providing
temperature updates more often than 10 times a second.
Protocol Styles
You came up with an imaginative collection of schemes. These included:
a broadcast scheme with no routing; a standard routing table
constructed with advertisements; source-routing in which the packet
carries its route, and each node holds only the route to its friend;
and a hybrid, loose-source routing scheme in which nodes on a route
contain forwarding links. There were also some rather unusual schemes:
a hierarchical address space with an elaborate setup protocol to
determine higher-level routers; schemes based on Cartesian
coordinates; schemes in which a leader is elected and packets are sent
radially into the leader and out (often by forming a tree-like
structure); and a particularly remarkable baton-passing algorithm: The
holder of the baton floods the network (with back-propagation
prevented by a serial number in the packet) and then passes the baton
to someone else in a repeated depth-first tree-walk. Each successive
flood (in the absence of lost packets) is contained inside the
previous flood. The serial number thus can be passed along in the
baton, so the cache of previously seen packets needs to remember only
that one serial number. This scheme uses memory that grows with
log(N)--the size of the serial number.
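The serial-number trick above can be sketched in a few lines. This is an illustrative model (the node structure, function names, and the 4-node ring topology are all made up for the example), but it shows how remembering a single serial number per node stops back-propagation during a flood:

```python
# Sketch of flood-with-serial-number duplicate suppression.
# Names and topology are illustrative, not from the project handout.

def make_node(node_id):
    # Each node remembers only the most recent serial number it has seen.
    return {"id": node_id, "last_serial": -1, "received": []}

def flood(nodes, links, origin, serial, payload):
    """Flood payload from origin; a node accepts and forwards a packet
    only the first time it sees the current serial number."""
    queue = [(origin, serial, payload)]
    while queue:
        nid, ser, data = queue.pop(0)
        node = nodes[nid]
        if ser <= node["last_serial"]:
            continue  # already seen this flood; drop to stop back-propagation
        node["last_serial"] = ser
        node["received"].append(data)
        for neighbor in links[nid]:
            queue.append((neighbor, ser, data))

# Tiny 4-node ring: 0-1-2-3-0 (a topology where cycles would otherwise hurt).
links = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
nodes = {i: make_node(i) for i in links}
flood(nodes, links, origin=0, serial=1, payload="temp=20")
# Every node receives the payload exactly once, despite the cycle.
```

The key property the baton scheme exploits is that the cache is a single number: once a node has seen serial k, any packet with serial <= k can be dropped.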
A few of you designed schemes that worked only on a grid. The FAQ
stated clearly that your design, while possibly optimized for a grid,
should work in a more general sense. Some of you did manage to
leverage this to your advantage, however. Unfortunately, others of
you designed schemes that attempted to deduce whether nodes were on an
edge or corner by listening for neighbors. Unless this is done with
care, lost packets can cause internal nodes to mistakenly identify
themselves as edges. Grid-based designs weren't the only ones to
dissolve into chaos in the face of lost packets, however!
A lot of you decided to avoid the lost packet issue and many others by
simply flooding temperature messages, which, in this constrained
system, turned out not to be such a bad idea, as long as you had a way
to ensure messages were removed from the network. A bad idea was just
to use TTLs, as packets will ping-pong between each pair of beanies,
duplicating themselves exponentially. A similarly bad idea was to
only prevent cycling of packets: there are an exponential number of
non-cyclic paths a packet could take before hitting its destination.
The best way to avoid exponential buildup is to keep a cache of
recently seen packets, and refuse to forward those you've already
seen. But how big does the cache need to be? Everybody needs to see
every packet, so a safe guess is N. Some of you tried to claim a
smaller one is workable. Whether or not this is so depends on whether
you can bound the amount of time (in packets) between the possible
arrival of the same packet. This depends on your queue lengths (see
below). In addition, transmissions that are heard by only some
neighbors can lead to those packets returning much later than usual.
Odd topologies (e.g., N beanies in a circle) can make this effect even
worse.
The correctness of a caching scheme also depends on what you record
about the packet, however. Schemes that forward only one packet per
Temp, Src/Dst ID, or both change the semantics of the service provided
to the e2e layer. The e2e layer might want to transmit the same
temperature twice, and it's not your place to forbid it. A better
idea was to select some per-packet nonce, or use timestamps.
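A minimal version of such a cache might look like the following sketch. The class name, cache size, and FIFO eviction policy are illustrative choices, not part of any submitted design; the point is that keying on a per-packet (source, nonce) pair lets the e2e layer resend the same temperature without the forwarding layer eating it:

```python
# Sketch: duplicate suppression keyed on a per-packet nonce rather than
# on (Temp, Src/Dst) fields, so the e2e layer may legally resend the
# same temperature. Names and the cache size are illustrative.

SEEN_CACHE_SIZE = 100  # safe bound: every node may see every packet (N)

class DedupFilter:
    def __init__(self, size=SEEN_CACHE_SIZE):
        self.size = size
        self.seen = []  # FIFO of (src, nonce) keys

    def should_forward(self, src, nonce):
        key = (src, nonce)
        if key in self.seen:
            return False  # duplicate: refuse to forward
        self.seen.append(key)
        if len(self.seen) > self.size:
            self.seen.pop(0)  # evict the oldest entry
        return True

f = DedupFilter()
assert f.should_forward(src=7, nonce=42)      # first copy: forward
assert not f.should_forward(src=7, nonce=42)  # duplicate: drop
assert f.should_forward(src=7, nonce=43)      # same src, new nonce: forward
```

Note that correctness hinges on the eviction policy: if a duplicate can arrive after its key has been evicted, the packet comes back to life, which is exactly the bounded-delay question raised above.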
Many of you who used the standard routing table noticed that a
representation that keys on destination IDs is bad, because it
replicates next-hop neighbor IDs many times. Several of you added a
table that allowed shorter IDs to be used for neighbors. A better
solution is to organize the table around neighbor IDs, especially
since computation is free in this system. This needs to be done with
care, however, as several such schemes caused problems with duplicate
packet detection.
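The two representations can be contrasted with a small sketch (the IDs here are invented for illustration). Keying on destinations replicates each next-hop neighbor ID once per destination; keying on neighbors stores each neighbor ID once, and the free CPU pays for the lookup scan:

```python
# Two representations of the same routing table (IDs are illustrative).

# Keyed on destination: the handful of neighbor IDs appear many times.
by_dest = {"D1": "N1", "D2": "N1", "D3": "N2", "D4": "N1", "D5": "N2"}

# Keyed on neighbor: each neighbor ID is stored once, followed by the
# destinations reached through it.
by_neighbor = {"N1": {"D1", "D2", "D4"}, "N2": {"D3", "D5"}}

def next_hop(table_by_neighbor, dest):
    # Computation is free in this system, so a linear scan is fine.
    for neighbor, dests in table_by_neighbor.items():
        if dest in dests:
            return neighbor
    return None

assert next_hop(by_neighbor, "D4") == "N1"
# Both tables encode the same forwarding function.
assert all(next_hop(by_neighbor, d) == n for d, n in by_dest.items())
```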
For those of you using a routing table scheme with advertisements,
it's better to send the table entry by entry than in one huge
message. The link layer does not guarantee delivery, so sending the
whole table at once will amplify the consequences of occasional packet
loss. (See below for a discussion on queue lengths with large
packets).
Source-based routing schemes need some way to prevent congestion from
broadcast route requests. It's not good enough to just avoid loops;
you need to keep some state at each node to reject recently seen
messages. Many of you suggested you would examine ALL the non-cyclic
paths, and choose the best one. Let's consider for a moment how many
such paths exist.
As most of you pointed out, the nodes with the longest paths between
them exist at the opposite corners of an n x n grid, with a shortest
path of 2(n-1) hops. Neglecting longer paths (Exercise for the
reader: How many non-cyclic paths are there?), there are still a large
number of shortest paths. Consider moving from top left to bottom
right on a grid. At each of the 2(n-1) hops, you can either move down
or right, but you must make an equal number of both. Hence there are
(2n-2) choose (n-1) different paths, or (2n-2)!/((n-1)!(n-1)!), which,
for n=10, works out to 48620.
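The count is easy to verify directly (a quick check, using Python's standard library binomial coefficient):

```python
from math import comb, factorial

def shortest_paths(n):
    """Number of shortest corner-to-corner paths on an n x n grid:
    choose which (n-1) of the 2(n-1) hops go down (the rest go right)."""
    return comb(2 * n - 2, n - 1)

# The n=10 case from the text.
assert shortest_paths(10) == 48620
# Same value from the factorial form (2n-2)!/((n-1)!(n-1)!).
assert shortest_paths(10) == factorial(18) // (factorial(9) * factorial(9))
```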
So ONE seemingly harmless path discovery message sent by the corner
node will result in its partner receiving almost 50,000 SHORTEST
paths, not even considering the longer ones. Clearly this is not
going to work. The best way to avoid this is to allow each node to
forward only one (or two or three) path discovery messages per source.
This prevents the exponential explosion, and, in a grid topology,
results in at most 4 (or 8 or 12) path choices at the destination.
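The forwarding cap can be sketched as a few lines of per-node state (the class and field names, and the cap of 1, are illustrative):

```python
# Sketch: cap the number of path-discovery messages a node forwards per
# source. The limit and names are illustrative.

MAX_FORWARDS_PER_SOURCE = 1

class DiscoveryLimiter:
    def __init__(self, limit=MAX_FORWARDS_PER_SOURCE):
        self.limit = limit
        self.forwarded = {}  # source id -> count forwarded so far

    def should_forward(self, src):
        count = self.forwarded.get(src, 0)
        if count >= self.limit:
            return False  # this node has done its share for src; drop
        self.forwarded[src] = count + 1
        return True

lim = DiscoveryLimiter()
assert lim.should_forward("corner-beanie")
assert not lim.should_forward("corner-beanie")  # further copies dropped
```

With a limit of k, each node contributes at most k transmissions per discovery, so the total work is linear in the number of nodes rather than exponential in the number of paths.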
Optimizations
Many of you invented clever optimizations. Some of these were:
- Realizing that because this network doesn't have to support
any-to-any communication, the *only* forwarding info that any one
beanie needs to remember is for paths that actually go through it.
This realization can cut down the size of the forwarding tables.
- Using a universal/perfect hash or a distributed agreement
algorithm to reduce the size of node IDs. These are good ideas, but
unless you first collect the entire set of IDs, you cannot construct a
perfect hash a priori with high probability unless the ID space is
considerably larger than the number of beanies. (See below.)
Some of you constructed schemes in which collisions only negatively
impacted performance, but not correctness.
- Breaking symmetry by comparing node IDs within pairs and having
them play different roles. This may reduce network load during
route discovery by up to 50%.
- Only including the source OR destination in a packet, but not both.
- Using short, node-local IDs for neighbors in source route packets,
with each node selecting different identifiers for its neighbors.
Some of your optimizations were not so good:
- Some of you had initialization phases to discover path lengths, but
then didn't show how you used this to improve performance of later
phases.
- Forwarding packets in a non-directional fashion (i.e., couldn't
tell/didn't care which beanie of the pair a packet was going
to/coming from). Packets will propagate in both directions at
once, possibly even cycling (and never leaving the network) without
a TTL.
- Some of you were so concerned about packing memory that you forgot
to save enough memory to build/process your packets, or for
temporary variables. You can't sort a list without SOME scratch
space.
- Instead of hashing, some of you just grabbed a small number of bits
from the ID. Unless you make a stated assumption about the uniformity
of the low- or high-order bits across the geographic distribution of
beanies, this can get you into big trouble.
- Similarly, some of you picked a random number from a space on the
order of the number of beanies. The birthday paradox says you need
a space AT LEAST N^2, and then you still have a 50/50 chance of a
collision. And you don't know a priori how many beanies there will
be.
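The birthday bound is easy to check numerically with the standard approximation P(collision) ~ 1 - exp(-n(n-1)/2M) for n IDs drawn from a space of M values:

```python
from math import exp

def collision_probability(n, space):
    """Approximate probability that n random IDs drawn uniformly from
    `space` values contain at least one collision (birthday bound)."""
    return 1.0 - exp(-n * (n - 1) / (2.0 * space))

n = 100
# Space on the order of n: a collision is essentially certain.
assert collision_probability(n, n) > 0.99
# Space of n^2: still roughly a coin flip (about 0.39 here).
assert 0.3 < collision_probability(n, n ** 2) < 0.6
```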
One optimization that has both pros and cons is using packet length
to distinguish packet types. While the unlimited CPU and severely
constrained bandwidth may make this a good idea in this particular
case, the savings do not, in general, merit the added complexity and
maintainability problems; one of you even padded packets with 16 bits
to get a different length! Using a uniform packet size and putting
dummy values in id fields for some packet types is also a bad idea.
It makes the code more complicated, and it consumes valuable
bandwidth.
Timing
Finally, many of you struggled with timing, particularly how to move
between initialization and normal forwarding modes. Many of you
avoided this problem by making the strong assumption that somebody
would restart the beanies at the beginning of the hour. While a
severe restriction, this was better than the assumption made by some
of you that the Beanies had wall-clock time and could do it
themselves. (Even if you
synced them all at the factory, there WILL be drift). The best
schemes adaptively moved into and out of discovery phases, which also
nicely handles beanie mobility in many cases.
A very clever use of the Beanie's timer was to leverage it to remove
link-layer queuing from the mix entirely by simply storing all packets
in your own memory (the infinitely fast CPU can process incoming
packets as they arrive), and proceeding through them in some order,
marking them as sent as you dispatched them, and timing the dispatch
based on packet length.
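A sketch of that dispatch idea follows. The rate, packet representation, and function name are illustrative assumptions; the point is that each send is timed so the previous packet has fully drained from the radio before the next begins, so the link-layer queue never fills:

```python
# Sketch of timer-driven dispatch: the node buffers every packet in its
# own memory and meters transmissions by packet length. The rate and
# packet format are illustrative.

RATE_BITS_PER_SEC = 1000

def dispatch_schedule(packets, start_time=0.0, rate=RATE_BITS_PER_SEC):
    """Return (send_time, packet) pairs; each send waits until the
    previous packet has fully left the radio."""
    schedule = []
    t = start_time
    for pkt in packets:
        schedule.append((t, pkt))
        t += pkt["bits"] / rate  # next send only after this one drains
    return schedule

sched = dispatch_schedule([{"bits": 50}, {"bits": 100}, {"bits": 50}])
times = [t for t, _ in sched]
# Expected send times: 0.0, 0.05, 0.15 seconds (to floating-point tolerance).
assert all(abs(a - b) < 1e-9 for a, b in zip(times, [0.0, 0.05, 0.15]))
```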
Analysis Issues
The most common mistakes in the presentation of the analysis were
pulling numbers out of thin air, and presenting formulas full of
numbers with minimal explanation. Use symbolic names as much as
possible. That doesn't mean you need to solve the resulting equation;
some of you showed nice tables of possible parameter settings
determined by trial and error.
Most of you could do much better explaining the assumptions that
underlie your reasoning. The analysis is very tricky, because of
concurrency, queuing, unpredictability, etc. So you have to make some
reasonable assumptions.
Many of you had a difficult time reasoning about queue lengths in your
network, and blindly assumed they wouldn't be a problem (often without
even stating you were ignoring them in your analysis), without any
feel for how big queues can get. The most common flaw was to assume
that each node could be analyzed in isolation, ignoring the costs of
forwarding, for example, for other nodes. Many of you also ignored the
trickiest phases or aspects of your scheme (and the ones most likely
to cause performance problems!).
Let's consider a flooding protocol for the moment, since almost all of
you did some flooding, either to forward packets or to discover
routes, and see just how big a queue might become. Consider the
portion of a 10x10 grid shown below, and assume every node has one
packet to send at time 0. Let's focus on the central node, 0, and
consider how many packets it has to forward. It receives 4 packets at
each timestep, and sends out only one. So, at the end of k timesteps,
the node has 3k+1 packets in its queue (counting its own initial
packet).
That's not so bad... but how many are still coming? There are
(\sum_{n=1}^{k} 4n) - 4k still on their way! After 4 rounds, that's
37 packets that are queued up to be sent out by the central node
(either at the central node itself, or on their way in from
neighbors). Doh! So we still have to wait for those packets to drain
(and make their way through the remainder of the grid) before we can
consider the packets sent at time 0 to be fully dispersed.
4
4 3 4
4 3 2 3 4
4 3 2 1 2 3 4
4 3 2 1 O 1 2 3 4
4 3 2 1 2 3 4
4 3 2 3 4
4 3 4
4
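The bookkeeping can be checked in a few lines, under the same simplified model (one packet per node at time 0; the center receives 4 and sends 1 per timestep; ring n of the diagram holds 4n nodes):

```python
# Check of the central node's queue arithmetic under the simplified
# flooding model used above.

def remaining_after(k):
    # All packets originating in rings 0..k must be sent by the center;
    # after k timesteps it has sent k of them.
    total_to_send = 1 + sum(4 * n for n in range(1, k + 1))
    return total_to_send - k

queue = 1  # the center's own packet at time 0
for _ in range(4):
    queue += 4 - 1  # each timestep: receive 4, send 1

assert queue == 13               # in the queue after 4 timesteps
assert remaining_after(4) == 37  # queued or still on their way
```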
As you can see, even in this simple case, arguments involving queue
length are very subtle and error prone, so it's much better to select
parameters that you can argue will cause little or no queuing on
average. The easiest way to do the analysis, and to avoid having to
reason about congestion, is to calculate the aggregate load of the
entire network, and then split it across nodes.
For example, suppose you want to estimate for a broadcast scheme how
often to transmit data, and you're using a scheme in which nodes drop
packets they've already seen. Then a single packet broadcast results
in the transmission of the packet by each node. If all n nodes send a
packet, there will be fewer than n^2 packets sent. With a packet size
of p and transmission rate of r, these packets can be sent by n nodes
in p*n^2 / (n*r) = p*n/r seconds. Setting r=1000 bits/sec, p=50 bits, n=100, we
get 5s. So if we introduce packets no more often than once every 5s,
queues will remain short and the network will not get congested.
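The same back-of-the-envelope calculation, spelled out with the values from the text:

```python
# Aggregate-load estimate for the flooding broadcast scheme, using the
# parameter values given in the text.

r = 1000  # bits/sec transmission rate
p = 50    # bits per packet
n = 100   # number of nodes

transmissions = n * n               # each of n packets is sent once by each node
total_bits = p * transmissions      # total bits the network must carry
drain_time = total_bits / (n * r)   # n nodes transmit in parallel: p*n/r

assert drain_time == 5.0  # seconds needed to drain one round of packets
```

This is why introducing new packets no more often than once every 5 seconds keeps queues short on average.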
(You may be concerned that the last packets to leave the longest queue
will need additional time to traverse the remainder of the network.
This isn't a problem in a grid topology. Since we're flooding, and
every packet reaches every node, these packets will have already been
seen by adjacent nodes, by virtue of the fact that they are the last
packets
in the longest queue.)
While most of you need only be concerned about queue length to
determine packet inter-arrival times, those of you with very long
packets (say route discovery packets with a train of 4 byte IDs) also
need be concerned about overflowing the queues. In a grid formation,
a node can only receive 4 new packets at a time; assuming it sends one
at the same time, this results in a queue growth of 3 new packets.
With a path length of 18, say (the SHORTEST non-cyclic path for nodes
at the corners of a 10x10 grid), your packets may approach 80 bytes.
While the link layer had a few hundred bytes of queuing, that only
holds about a dozen of these packets. As you can see above, the
central node's queue may begin to overflow after 4 rounds, causing
packet loss. This is likely not a problem since packets are
duplicated, but worth mentioning.
Many people ignored some aspect of their system in the analysis, and
attempted to compensate for this omission by adjusting the result with
a constant factor. For instance, some people calculated the worst-case
message delay ignoring queuing, and then multiplied the resulting
delay by k, where k was typically a small integer greater than 1,
stating that the effect of queuing couldn't possibly affect the delay
by more than a factor k. It should be clear that in the absence of
adequate flood control, the effect of queuing could result in
virtually unbounded delay.