Design Project 1: A CACHE FOR THE WORLD-WIDE WEB

Introduction. One of the features of 6.033 is that we discuss real systems, both successful and unsuccessful. To get beyond discussion and give you some direct experience with designing systems, we also assign two design projects. As in the real world, these projects have (we hope) a simple high-level problem statement, but when you get into the system design you will find that there are many hard choices to make. And, as in the real world, it is your job to explore, understand, and explain what the best choices are, how to reconcile sometimes conflicting goals, and how to keep the complexity of your design under control. This term, the first of these design projects relates to caching, networks, and the World-Wide Web.

The problem. MITnet connects to the rest of the Internet via NEARNET (New England Academic and Research Network), a consortium that provides links among the various universities and research laboratories in New England, as well as gateways to the Internet backbone providers. NEARNET charges the Institute a fixed monthly fee for this service.

This design project is based on a problem that has not actually come up, but it easily could, so from here on the problem statement is fantasy: The people who run NEARNET are pondering the idea of changing the method of calculating the fee: they want to start basing the fee on the number of bytes transferred in and out of M. I. T. Since M. I. T. is one of the busiest network sites in New England, they suspect that this will be a big revenue generator for NEARNET. They can easily install counters in M. I. T.'s NEARNET port, so from the point of view of NEARNET administration, this proposal looks like a big win.

The Provost, who is responsible for the Institute budget, is worried about this proposal, because increased revenue for NEARNET means a bigger budget deficit for M. I. T. But since the Provost is a faculty member in Computer Science, he believes that the problem is solvable with technology. He knows that a large fraction of the traffic between M. I. T. and the rest of the Internet is for World-Wide Web pages, and that often the same Web page is requested by many different users at M. I. T. So he has asked you to design a cache that can be located somewhere inside M. I. T. and can hold frequently accessed Web pages from outside M. I. T.
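The Provost's reasoning can be made concrete with a little arithmetic. The Python sketch below is purely illustrative and is not part of the assignment. It counts the bytes that would cross M. I. T.'s NEARNET port for a hypothetical request trace, once without a cache and once with an idealized one. The idealized cache has unlimited size, holds pages that never change, and spends no traffic checking freshness. The function name `bytes_across_gateway`, the URLs, and the page sizes are all made up for the example.

```python
from typing import Dict, List, Tuple


def bytes_across_gateway(requests: List[Tuple[str, int]], use_cache: bool) -> int:
    """Count bytes crossing the (hypothetical) MIT-NEARNET port for a
    sequence of (url, page_size_in_bytes) requests.

    Assumes an idealized cache: unlimited capacity, pages never change,
    and no bytes are spent validating cached copies."""
    cached: Dict[str, int] = {}
    total = 0
    for url, size in requests:
        if use_cache and url in cached:
            continue  # served from inside MIT; nothing crosses the port
        total += size  # fetched from outside MIT across the NEARNET port
        if use_cache:
            cached[url] = size  # keep a copy for later requesters
    return total


# Hypothetical trace: one page requested five times, another twice.
trace = [("http://a.example/", 10_000)] * 5 + [("http://b.example/", 4_000)] * 2
print(bytes_across_gateway(trace, use_cache=False))  # 58000
print(bytes_across_gateway(trace, use_cache=True))   # 14000
```

The idealization is deliberate. A real design must also deal with stale copies and with finite cache space, and both of these eat into the savings shown here.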

Your job is to analyze the situation based on what you have learned in 6.033 and its prerequisites about the World-Wide Web, caching, and networks, and then to develop a complete design for such a cache.

One potential complication is that the Laboratory for Computer Science has been encountering bottlenecks in moving traffic across the campus from Technology Square to the NEARNET gateway, and that laboratory is discussing a proposal to install a second NEARNET gateway in Tech Square.

The design project. Propose a design of something (you get to decide what: a server, a gateway feature, a protocol, a network topology, a routing strategy, a modified network browser, or some combination of these and other things) that introduces a cache to reduce the traffic generated by repeatedly requested WWW pages from outside M. I. T., while introducing as few new problems as possible. You do NOT have to implement the design.

Before jumping to the design stage, you should do some more reading:

If you wish, you can invoke Alta Vista and look for Web pages that have the word "cache" in them. Unfortunately, there are about 90,000 such Web pages. But maybe you can come up with a more helpful query.

Your solution should attempt to achieve the following (presumably) desirable properties:

  1. It should reduce Web-page traffic between M. I. T. and NEARNET.

  2. It should be transparent. That is, when you install it, no one should notice that anything has changed. (Except that some Web pages may be delivered more quickly than before.)

  3. When the original of a Web page outside M. I. T. changes, people at M. I. T. who request that page should not receive an old copy.

  4. It should not be necessary to update every Web client at M. I. T. in order for the cache to be fully effective.

It may be very difficult to achieve all of those desirable properties completely at the same time. Most system designs require trade-offs and compromises, so you have to decide how important each desirable property is in relation to the others.

Your report. Your paper should be 5 to 10 pages in length. You should start by explaining to your intended audience the background of the problem in terms that audience can understand. Next, describe your solution and explain how well it achieves (or fails to achieve) the desirable properties, and any other properties that you notice are worth providing. Throughout the paper you should justify each of your design decisions, especially in relation to alternative decisions that you could have made. You will be more convincing if you say not only why your idea is good, but why it is better than the alternatives. (For example, if another approach would be perfect except that the cost would be infinite, you should point that out.) If you can find any statistics about network traffic to support your proposal (or to support a recommendation that a cache is a bad idea) that would be a useful addition to your report.

Write for an audience consisting of colleagues who took 6.033 five years ago. That is, they understand the underlying concepts and have a fair amount of experience applying them in various situations, but they are not familiar with the particular problem you are dealing with. Assume also that your paper will be used to persuade whoever decides which design to adopt to choose yours. Finally, give enough detail that you could turn the project over to an implementor with some confidence that you won't be surprised by the result.

When evaluating your report, your instructor will be looking at both content and writing...

Content considerations:

Writing considerations:

Phase II writing considerations: If you are enrolled in the 6.033 writing practicum, you don't need to do anything special; your practicum instructor will explain how the report will get you credit for the Phase II writing requirement. If you are not enrolled in the practicum AND you want us to forward your design project report to the writing program as your Phase II writing project, please say so on the cover page, and make sure that your report is at least 8 pages long. Note also that the writing program accepts only reports that earn a B or better from the class in which they originate. Finally, be aware that the second design project will probably be a team project, and thus much more difficult to tailor to the needs of the writing program than this one.

Schedule: Your report is due in recitation on Thursday, March 21, 1996.

------------

6.033 Handout 11, issued 3/7/96