6.033 Discussion Suggestions (Hints example)

6.033--Computer System Engineering

Suggestions for classroom discussion of:

Hints versus caches

by J. H. Saltzer, May 15, 1996.

One of Lampson's hints is to use hints. This observation seems to make the paper recursive, but more on that later. The issue addressed here is: just what is the distinction between a hint and a cache? Both are examples of a general principle of remembering the result of expensive analysis, with the prospect of not redoing it if the same question arises later.

The essence of a hint is that it may be wrong, but there is a quick way of checking its validity. A cache, in contrast, has the burden of being authoritative. So it is generally harder to implement a cache than a hint.

Here is an example: A few years ago, Raj Jain analyzed the stream of packets going through a router and noticed that about half the packets that arrive at a router port are destined for the same target address as the previous packet that arrived at that port. (He called sequences of packets intended for the same destination "packet trains" and measured their length distributions. But that is a different story.)

Since one of the expensive operations in a router is to look up the destination address in the routing table to find which output port should be used to send the packet on its way, Jain's measurement suggests that it could be worthwhile to remember the result--there is a 50% chance that the next incoming packet on that port will involve the same lookup.

We can implement this idea either as a cache or as a hint.

As a cache, we place next to the incoming port a copy of the table entry we found:

(destination address -> outgoing port)

And we should be careful to remember that whenever the routing table is updated we must invalidate or update this port cache entry. When it receives the next packet on that port, the routing software extracts its destination address, and the first step is to check the cache for that port. If the destination address is the one in the cache, the outgoing port number is taken as authoritative. If the destination address is not the one in the cache, the router does a regular table search.

As a hint, we place next to the incoming port a pointer to the table entry, which we can then forget about. When it receives the next packet on that port, the routing software extracts its destination address, and the first step is to follow the pointer to the routing table entry. If that table entry contains this destination address, the router uses the outgoing port number (which may have changed by now.) If this packet is going to a different address (perhaps because the table has since been rearranged), the entry at this location won't match, so the router does a regular table search.

The difference is that the cache needs management to keep it valid; the hint is allowed to go stale, but it can be quickly validated when you try to use it. There is an analogy between this distinction and the distinction between pessimistic and optimistic concurrency control. A cache is always valid, just as pessimistic concurrency control always prevents interference. A hint may be outdated, but we can always detect staleness and recover, just as an optimistic concurrency control system may encounter interference, but it can always detect and recover.

Finally, are Lampson's hints really hints? They certainly seem to be an example of remembering the result of an expensive analysis in the hope of not having to redo it when the same question arises later. But the proposition is debatable. Although it is certainly the case that in any given situation one of Lampson's hints may or may not be just the right thing to do, it is not usually the case that there is a quick way to decide. Thus the hints of the paper can be wrong but they lack a quick validity check, so technically they are not hints. (Though I must emphasize that the paper is still worth reading and re-reading.)

Comments and suggestions: Saltzer@mit.edu