6.033 Spring 2005: Design Project 2

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

6.033 COMPUTER SYSTEMS ENGINEERING, SPRING 2005


Design Project 2: AllTime, Inc.



For groups of exactly 3 collaborating students, enrolled in recitations taught by the same instructor.

Last updated: $Date: 2005/05/01 22:55:37 $. See the DP2 FAQ, DP2 Errata, DP2 Discussion Forum.


Due Dates

The deliverables for DP2 are as follows: More information about the format of the design report is given below. The format of the oral presentations will be covered in class closer to the due date.


1 Introduction

Your company, AllTime, has decided there is commercial value in providing an archive for old versions of web pages. Your boss has assigned your team the job of designing this archive. You got this job because when you were hired you bragged about taking 6.033 and learning all about networking, fault-tolerance, replication, and transactions.

The goal of the archive is to allow users to ``browse in the past.'' A user gets to select a particular date and time to do this browsing. Then he can attempt to fetch a page from the archive as of that date and time. If the archive has that page, it must provide the version that existed then. Of course that page may contain links to other pages, and the user can continue to browse in the past by accessing them. Thus the user can get a snapshot of what existed at that date and time across a set of web pages.1

Not every web server has pages in the archive; only registered servers do. Registered servers upload versions of pages to the archive as modifications are made.

The AllTime service is only responsible for providing historical versions of pages. It does not provide current pages; in fact it is not required to provide service when users request very recent information (e.g., a date and time within five minutes of the current time). Therefore when users want to view the current version of a page, they will connect to the original server and download the page just as they do now.

Your plan is to implement the archive on a very large collection (about $10^5$) of archive machines. Figure 1 shows the overall system architecture. It shows a collection of archivers that store the historical archive content. The archivers are used by web servers to store their pages, and by web clients that browse pages. Clients don't access the archive directly; instead they make use of frontend web servers (FWSs) that access the archive on their behalf.

Your job is to design the archive service that runs on this collection of machines. This design includes:

Figure 1: The architecture of the AllTime web archive service. There are three entities in the archival system: web servers and the archive publishing utility (APU) used to publish content; Front-end web servers (FWS); and archivers.
\includegraphics[width=6.0in]{arch.eps}


2 Service Requirements

In this section, we describe the features required of the AllTime service in more detail. In your design report, be sure you describe clearly and completely how your system provides all of these features.


2.1 Basic service

The primary goal of the archive service is to store old pages. Though it may store recent versions of pages, it is not required to have the most up-to-date versions of current pages. Hence, users interested in the most recent version of a page should go directly to the web server.

The archive is responsible only for static content; you need not worry about how to store dynamically generated content (e.g., coming from databases via CGI scripts or servlets.)

The archive provides a service for both web servers and clients. For web servers, it stores old versions of pages. Web sites thus need not store old versions locally. For web clients, the archive allows them to view pages as of some time in the past. This browsing is supported even from pages of web sites that go out of business; any archived versions of pages from such a web site will still be available in the archive.

The archive service guarantees that it will store the old pages and make them available on demand. However, it doesn't store pages for every site on the Web. Rather it does this only for ``registered sites''. Note that this implies that clients' browsing requests may sometimes fail, because the page of interest comes from an unregistered web site.

When a server from a registered site creates a new version of a page, it will send a copy of the page to the archive. This copy may reflect the state of the page either just before or just after the changes were made for the new version--part of your design is deciding when changes are published to the archive. Along with every page sent to the archive, web servers include an associated date and time (the ``timestamp''), that indicates the time range during which this copy is valid (the exact interpretation of the timestamp will also depend on when versions are sent to the archive.) The archive will accept time values for timestamps that are ``slightly'' earlier or later than its current time (i.e., the clocks at the server and archive do not have to be synchronized). Web page authors decide when to ``publish'' a collection of changes to pages as a new version.

The archive provides functions to allow a web server to register with the archive; web servers must do this before submitting any pages to the archive. Web servers may also indicate that a page was created as of a particular time, or that it has been deleted as of a particular time. Earlier versions of a deleted page should still be accessible from the archive, but requests for the page after the deletion time should return a page not found error.

A client can read pages from the archive by giving a URL and a date and time. (Exactly how pages in the archive are named is up to you). The archive should return the version of this page that was valid as of the user-specified time, or an error if no such version exists.

An archived page may contain URLs, such as links to pictures, movies, and other pages. When the user loads these resources, they should also be displayed as of the user-specified date and time. Note that this time is not necessarily the same as the timestamp of the page referred to by the URL originally specified by the user. For example, suppose that a user would like to see http://abcdef.com/index.html as of 12:25 PM on 2/14/04. Further suppose that this page refers to two images, http://abcdef.com/img1.jpg and http://abcdef.com/img2.jpg. The behavior of the system should be to load the newest version of each of these resources that is older than (or has the same time as) 12:25 PM on 2/14/04. It is possible that the retrieved version of http://abcdef.com/index.html was last updated on 2/10/04, whereas the retrieved versions of img1 and img2 may have been updated more recently on 2/12/04. Nevertheless, links followed from the index.html page should be loaded as of 2/14/04, not 2/10/04.
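For concreteness, the version-selection rule just described (``newest version no later than the requested time'') can be sketched in a few lines of Python; the list-of-pairs representation and integer dates are illustrative only:

```python
import bisect

def select_version(versions, t):
    """Return the newest archived version with timestamp <= t, or None.

    `versions` is a list of (timestamp, content) pairs sorted by timestamp.
    Integer dates like 20040212 stand in for real timestamps."""
    timestamps = [ts for ts, _ in versions]
    i = bisect.bisect_right(timestamps, t)  # first index with timestamp > t
    if i == 0:
        return None  # no version existed at or before time t
    return versions[i - 1]

# The img1.jpg example above: versions exist from 2/10/04 and 2/12/04, and a
# request as of 2/14/04 selects the 2/12/04 copy (not the 2/10/04 copy).
versions = [(20040210, "img-feb10"), (20040212, "img-feb12")]
```

Note that a request with a time earlier than the first stored version returns nothing, matching the ``no such version exists'' error behavior required above.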

You should assume that the AllTime service contains 100,000 ($10^5$) servers, each with a 100 GB ($\sim 10^{11}$ byte) disk, for a total of 10 petabytes (1 petabyte is $\sim 10^{15}$ bytes) of storage. You should assume that there are $10^8$ pages stored in the service, that each page has on average 100 versions (some may have many more), and that the average page is 100 KB ($\sim 10^5$ bytes) (some will be much larger). Thus, on average, each server must store about $\frac{10^8 \times 100 \times 10^5}{10^5} \approx 10^{10}$ bytes, or $\sim 10$ GB.
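These figures are easy to sanity-check; the following back-of-envelope script just restates the numbers above:

```python
# Capacity figures from Section 2.1 (order-of-magnitude only).
servers    = 10**5   # 100,000 archivers
disk       = 10**11  # 100 GB of disk per archiver
pages      = 10**8   # pages stored in the service
versions   = 100     # average versions per page
page_bytes = 10**5   # 100 KB average page size

total_content = pages * versions * page_bytes  # 10^15 bytes = 1 PB of content
per_server    = total_content // servers       # 10^10 bytes = ~10 GB each
raw_capacity  = servers * disk                 # 10^16 bytes = 10 PB of disk
```

Note the roughly 10x gap between archived content (1 PB) and raw disk (10 PB); that headroom is what makes an order-of-ten replication factor feasible.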


2.2 Registration

Servers need to inform the archive of their status: they register to indicate that they will henceforth be adding pages to the archive. They deregister to indicate that they will no longer be participating in the archive. Registration provides the DNS name of the web server; once a server is registered it is allowed to store web pages whose URLs start with its DNS name.

The archive will continue to store old pages for web servers after they have deregistered. Client requests with timestamps earlier than the last version registered before the deregistration time will return archived pages. However, client requests with timestamps greater than the deregistration time will return page not found errors.

You will need to describe the protocol for registering and deregistering.


2.3 Renaming

Your design must allow a web server to change its DNS name, meaning that its pages continue to exist, but the DNS name at the beginning of the page URLs is now different. The old archived pages of the renamed web server must still be available through the archive under their old name, and in addition the server will be able to add new pages to the archive under the new one. You must also support the case where a web server deregisters and a new server registers with the same DNS name (e.g., in 1999, http://pets.com/ was owned by a small startup company that sold pet supplies on-line; it is now owned by a large pet company called PetSmart.) Of course, web servers may not necessarily deregister; in this case, their names may not be reused.


2.4 Fault Tolerance

The archive is stored on a very large number of archivers and your design must tolerate failures. At any moment some archivers will have failed or be unresponsive. The system you design should almost always be able to serve archived pages in spite of failures. One way to accomplish this is to replicate pages on several archivers; if you choose to replicate data, you will need to think about how users locate different replicas of a page and how different replicas are kept consistent with each other.

You will also need to deal with the possibility of web servers crashing in the middle of publishing a new version. This issue is discussed in more detail in Section 3.4.

When a server crashes, you should assume that it immediately stops processing requests and then restarts sometime later. The probability of an individual archiver being offline is low (e.g., the MTBF of a given archiver is several years and archivers recover with all persistent state intact, usually within a few hours of failure). You should design a system where requests to the archive always complete quickly, even when some archivers have failed, assuming the network is connected; if the network doesn't allow communication with the archivers, the requests will succeed after the communication resumes. You may assume that any two different archivers fail independently. You do not need to worry about other types of failures - e.g., file system corruption and memory faults are not an issue in AllTime. Note that an archiver that fails may have missed publication or registration requests; you will need to ensure that your design continues to function properly despite these lost requests.
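If pages are replicated, the read path implied by this requirement (``complete quickly despite some failed archivers'') can be as simple as trying replicas in turn. This sketch assumes a `fetch(ip)` RPC stub that raises IOError on a failed or unresponsive archiver; both names are inventions of the sketch:

```python
def read_with_failover(replicas, fetch):
    """Try each replica archiver in turn; return the first successful result.

    `replicas` would typically come from mapAMs() (Section 3.1); `fetch` is
    an assumed RPC stub that returns the page or raises IOError on failure."""
    for ip in replicas:
        try:
            return fetch(ip)
        except IOError:
            continue  # failed/unresponsive archiver: move to the next replica
    raise IOError("all %d replicas failed" % len(replicas))
```

With independent failures and a low per-archiver failure probability, the chance that every replica of a page is down at once is vanishingly small, which is what makes this simple strategy adequate.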


2.5 Security

Your design needs to provide clients with a guarantee that a web page fetched from the archive was actually put there by the web server that is responsible for that page. Your primary concern should be preventing malicious users from being able to publish a URL which they are not authorized to publish (e.g., a rogue 6.033 student shouldn't be able to publish a new version of the 6.033 web page claiming that class is canceled tomorrow.) You do not need to address the problem of a server trying to overwhelm the archive service by storing a huge number of pages.

In particular, your design should address the following issues:

  1. How do you prevent a malicious user from pretending to represent a web site that he or she does not own and storing bogus web pages for that site?

  2. How do you detect when a malicious intruder has taken over one of the archivers and is trying to send corrupted or invalid content to clients?

Your design should include a description of the protocols for authenticating users and their requests that address these issues. You can assume the existence of a certificate authority. You are only required to provide data integrity when at most one archive server has been compromised. You may assume that the FWS is never compromised.
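As an illustration of the integrity check (not a complete design), the sketch below signs the (URL, timestamp, content) triple at publication time and verifies it at the FWS. It uses an HMAC with a shared key purely so that the sketch is runnable; a real design would use public-key signatures with certificates from the assumed certificate authority, so that a compromised archiver holds no signing key:

```python
import hashlib
import hmac

def sign_page(key, url, timestamp, content):
    """Publisher-side signature over (url, timestamp, content).

    HMAC here is a runnable stand-in; a public-key signature would let the
    FWS verify with only the publishing server's certificate."""
    msg = ("%s|%s|" % (url, timestamp)).encode() + content
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_page(key, url, timestamp, content, sig):
    """FWS-side check that an archiver returned what the publisher signed."""
    return hmac.compare_digest(sign_page(key, url, timestamp, content), sig)
```

Because the timestamp is inside the signed message, a compromised archiver can neither alter a page nor replay one version's content under another version's timestamp.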


2.6 Performance

If a client wants to read an archived URL (e.g., http://www.abcdef.com/img1.jpg), and that page is in the archive, it should be possible to obtain that page by contacting a small number of archivers in most cases.

Your service should also be able to locate a particular version of a page in the archive by reading fewer than 100 disk pages in total across all archive servers, even if there are thousands of versions of a given page. You may need additional I/Os to read very large pages once they have been located.
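This budget is generous if each archiver keeps a sorted per-URL version index: a B-tree-like index with even modest fanout locates one of thousands of versions in a handful of page reads. The function below just computes that depth; the fanout of 256 is an assumed, illustrative figure:

```python
import math

def index_pages_read(n_versions, fanout=256):
    """Disk pages read to find one version in a sorted tree index with the
    given fanout (entries per index page); 256 is an illustrative guess."""
    if n_versions <= 1:
        return 1
    return math.ceil(math.log(n_versions, fanout))
```

Even 10,000 versions need only about 2 index-page reads before the final read(s) of the version data itself, comfortably inside the stated budget.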


3 Issues to Address

This section describes the issues you must be sure to address in your report and describes some basic software building blocks that you may find useful in your design. You are not required to use these building blocks, but if you choose to use something else, be sure to carefully justify why you are using it and to describe how it is implemented.

You will need to design the software that runs at each of the three components in the AllTime system (see Figure 1). These components are: the archivers; the APU (the publishing tool that runs on the web server machines); and the FWS (the frontend web server used by clients to interact with the archive).

You may assume the existence of a special RPC package that you can use for communication among the components of your system. This package is able to handle even very large messages (e.g., containing large files) efficiently.


3.1 Archive Management

The archive service will run on a very large number of archivers. Each archiver has a unique Internet address (an Internet or IP address names a network attachment point of the Internet). At any given time, some of these archivers may have failed.

One way you may wish to address these failures is through replication. To simplify the task of choosing a set of replicas for a particular web page or web site, AllTime has hired a contract programmer to design a module that accepts a string (representing a URL or site name, for example) and returns a set of $k$ IP addresses corresponding to archivers. This module provides one function, mapAMs, as follows:

IPAddr[] mapAMs(String name, int k);

You may assume that for a given unique name, the set of archivers returned are randomly and uniformly selected (without replacement) from all available archivers. If the mapAMs(...) function is called with the same parameters, it will always return the same set of IP addresses. If, however, the name parameter is different even by a single character, the set of archivers returned may be completely different. Though you do not need to worry about how this function is implemented, it is fairly straightforward to implement such a function using hashing; reference [2] at the end of this document describes a system that uses a similar technique in practice.

You should use the mapAMs() function to determine which archivers store a given page. You will need to decide what names to use; for example, you might use the DNS name of an entire web server for a name, or you might use the URL of the page, or even the URL of the page concatenated with that version's timestamp.
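The contract described above (deterministic for a given name, completely reshuffled by any change to the name) is exactly what highest-random-weight hashing provides. A Python sketch follows, with the archiver population passed in explicitly; the real mapAMs module hides how it learns that population, and returns IPAddr[] rather than strings:

```python
import hashlib

def map_ams(name, k, archivers):
    """Sketch of mapAMs(): rank archivers by SHA-256 of name|ip and take k.

    The same (name, k) always yields the same set; changing `name` by one
    character reshuffles the ranking completely. `archivers` (the full
    population of archiver addresses) is an explicit parameter of this
    sketch only."""
    ranked = sorted(
        archivers,
        key=lambda ip: hashlib.sha256((name + "|" + ip).encode()).digest(),
    )
    return ranked[:k]
```

A pleasant property of ranking all archivers (rather than picking exactly $k$) is that if one chosen archiver is down, the $(k+1)$st-ranked archiver is a natural deterministic fallback.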


3.2 Archivers

A major part of your design concerns how to organize the archivers. You will need to decide where to store pages from different web servers, how to use the mapAMs() function effectively, and how to handle the various registration commands.

You will also need to define the interface provided by the archivers to clients and web servers. Additionally you may need to define an interface that archivers use to communicate with one another.

In particular, you must describe (using figures, protocols diagrams, and/or pseudocode):

  1. How pages are stored in the archive, including a replication scheme should you choose to use one.

  2. Whether the archive contains the most recent version of a page, or whether web servers only publish a version to the archive just before that version is overwritten.

  3. How clients look up a version of a page as of a given time, including how their lookup request is directed to the appropriate machine.

  4. How a given archiver stores the pages it is responsible for and how it locates a given page in its local file system.

  5. What happens if one or more machines in the archive fails and how you continue to make the pages stored at that machine available to clients.

  6. How servers publish new versions of a page or set of pages to the archive, and how those pages are spread across different machines in the archive.

  7. How web servers register and deregister and rename themselves, and how those servers are authenticated by the different machines in the archive.

  8. How servers add newly created pages to the archive, and how they notify the archive about page deletions.

Recall that the archivers can reject requests to read content if the timestamp is too close to the present. They may also need to reject requests for other reasons, e.g., no such page exists, or an update to the page at that date and time is in progress.


3.3 Clients and the FWS

AllTime makes the archive available to clients by providing a front-end web server (FWS). When clients click on http://www.AllTime.com, they will obtain an interface that allows them to browse in the past. In particular they will be able to enter URLs of pages of interest and to indicate the date and time at which they wish to browse.

Your job is to design the code that runs in the FWS and carries out the client requests. In particular, you should address the following questions:

  1. How an FWS directs requests for a particular archived URL and date and time to the appropriate archivers.

  2. How an archiver responds to these requests.

  3. How the FWS ensures that it has obtained the correct version of the page (assuming one exists), i.e., the one corresponding to the date and time of interest.

  4. How links in pages returned to the client are structured to ensure that when the user clicks on a link in the returned page, she is directed to the appropriate historical page in the archive. Note that this may require you to change the links in pages stored by the archive or returned to the user so that no modifications are needed to the user's web browser.

  5. How users determine that a page returned to them by the service is authentic (i.e., was actually produced by the web server that is registered as the owner of the site with the AllTime service.)
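To make item 4 concrete, here is one way an FWS might rewrite absolute links so that clicks stay in the past. The /archive?url=...&t=... URL scheme is invented for this sketch (the assignment leaves naming to you), and only absolute http:// links in href/src attributes are handled; relative and script-generated links are out of scope here:

```python
import re
from urllib.parse import quote

def rewrite_links(html, browse_time, fws="http://www.AllTime.com/archive"):
    """Rewrite href/src attributes to route back through the FWS at the
    same user-selected browse time (illustrative URL scheme)."""
    def repl(m):
        attr, url = m.group(1), m.group(2)
        return '%s="%s?url=%s&t=%s"' % (attr, fws, quote(url, safe=""),
                                        browse_time)
    return re.sub(r'(href|src)="(http://[^"]+)"', repl, html)
```

Carrying the original browse time in every rewritten link is what preserves the snapshot semantics of Section 2.1: each followed link is fetched as of the user's chosen time, not as of the containing page's timestamp.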

Remember that the archive provides only static content; you need not be concerned with dynamically generated content (e.g., data from databases stored on the web servers, or generated by CGIs or servlets.) You also do not need to worry about structured links that are generated by scripts embedded in the pages returned to users (e.g., some web pages construct links via javascripts; it is OK if these links don't point to historical pages.)

Of course if there were just one FWS, it would be a bottleneck and a single point of failure. You should assume that there is a set of FWSs that receive client requests and send pages to the clients--enough to handle all client requests. Furthermore you can assume that client machines direct their requests to these front-end servers in a way that balances load; for example, they might use Round Robin DNS Load Balancing (see Reference [1]).

Though the appearance and presentation of pages to users is of course very important to the commercial success of the AllTime service, you do not need to describe the user interface or design the appearance of the front-end web service for the purposes of this design project. (You can look at http://web.archive.org/ for an example of a UI for an archive service, the ``Internet Archive.'' Note, however, that the Internet Archive has a very different model for archiving since it crawls the web, whereas in the AllTime archive, web servers push their pages to the archive.)


3.4 Web Servers

Web servers need to register with the archive and put pages into the archive as they are modified. Additionally, a web server can deregister (meaning that it will no longer be storing pages in the archive), change its name, add a new page to the archive, or indicate that it is deleting a given page.

A web server that is using the archive runs an archive publishing utility (APU), which is simply a local process that is responsible for interfacing with the archive service. There is just one APU running on a web server.

The web server informs the APU about modifications to local files, and the APU's job is to communicate with the archive service to archive the pages correctly.

Figure 2: Proposed APU file modification API.
start(Timestamp ts, String dnsName)

modify(URL u, String newFile, String oldFile)

delete(URL u)

done( )

If desired, you may assume that web servers notify the APU of file modifications via the API shown in Figure 2.

You may assume that the web server only has one active ``session'' at a time; if it calls start twice without a done in between this is an error (the APU can return an error response).

The web server will not make the new versions of the pages visible to users until after the call to done completes. By that time, all the archive activity for these pages must be complete.

The APU is required to carry out a set of modifications atomically. At the very least it must ensure (1) when it returns from the call to done all pages have been properly written to the archive; (2) if the web server fails prior to calling done, no pages are visible to users of the archive; and (3) If the APU crashes during the call to done, the pages may or may not be visible, depending on whether the implementation of done() has reached its commit point or not.
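One shape such a protocol could take, sketched with toy in-memory stubs (the Archive class, the put_pending/make_visible operations, the session id, and the log-as-list are all inventions of this sketch): stage every page as an invisible pending version, then write a single commit record, which is the commit point:

```python
class Archive:
    """Toy in-memory archive used only to exercise the protocol sketch."""
    def __init__(self):
        self.pending = {}
        self.visible = {}
    def put_pending(self, url, ts, content):
        self.pending[(url, ts)] = content       # staged, invisible to readers
    def make_visible(self, url, ts):
        if (url, ts) in self.pending:           # idempotent, so redo is safe
            self.visible[(url, ts)] = self.pending.pop((url, ts))

def done(staged, log, archive, session_id="s1"):
    """Publish all staged (url, timestamp, content) triples atomically.

    Recovery before the COMMIT record discards pending versions (abort);
    recovery after it re-runs the make_visible loop (redo)."""
    for url, ts, content in staged:
        archive.put_pending(url, ts, content)
    log.append("COMMIT " + session_id)          # <-- the commit point
    for url, ts, _ in staged:
        archive.make_visible(url, ts)
    log.append("END " + session_id)
```

A crash before the COMMIT record leaves only pending versions, which readers never see (constraint 2); a crash after it is repaired by redoing the visibility pass (constraint 3).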

You should describe what the APU does in response to these calls. In particular, be sure to explain how it satisfies the atomicity constraint, and also how it ensures that pages are written to the archive in a way that guarantees that the correct version can be obtained by a reader later on. In your description of done(), be sure to clearly indicate the commit point as well as how the APU ensures that a partially completed call to done() is completely undone or redone. Be sure to answer the following questions in your writeup:

  1. When are versions published to the archive: after the call to modify, or as a batch when the web server calls done?

  2. How does a change propagate to all of the archivers that are responsible for storing a given page or version of a page?

  3. What if one of these archivers has failed?

  4. What happens if the APU's web server fails part way through storing a bunch of pages to the archive and then recovers later?

  5. What happens if the updating web server never recovers?

  6. How does the web server handle relative path names (e.g., ``../img.jpg'') in pages it is submitting to the archive?

To answer these questions, you may want to think about the different recovery and isolation techniques we have studied in class, including transactions, logging, version histories, and two-phase commit.

You may assume the existence of a log utility to handle logging if you think this is needed. The log utility provides methods to append a string to the log, return the size of the log (the number of strings it contains), and to read the $i$th string from the log.
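An in-memory stand-in for that log utility, with the three described operations (a real implementation would force each record to stable storage before append returns; only the method names mirror the description above):

```python
class Log:
    """Append-only log of strings: append, size, and read the i-th record."""
    def __init__(self):
        self._records = []
    def append(self, s):
        # A durable version would fsync the record to disk before returning.
        self._records.append(s)
    def size(self):
        return len(self._records)
    def read(self, i):
        return self._records[i]
```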

In general, it is acceptable if the different archivers that act as replicas of a given page are inconsistent with one another. However, you should argue that your design guarantees that the archive will provide a complete and correct view of the versions sent to it by a web site once the done call returns, and assuming that the user query is not too recent (e.g., is older than five minutes ago).

You may also assume that the web server notifies the APU of registration events through the API shown in Figure 3.

Figure 3: Proposed APU registration API.
register(String dnsName)

deregister(String dnsName)

rename(String oldDnsName, String newDnsName)

You must explain what the APU does in response to these calls and how these calls are forwarded on to the appropriate machines in the archive service.

You may also need to introduce additional calls to the APU, e.g., a call to create a new page or publish an initial version of all the pages when registration happens.

The APU may use the mapAMs() function to decide what archive servers a particular update should be sent to.

You can assume the existence of a kernel call that can be used to find out the current date and time.

Finally, you can assume that there is a sufficient number of archive servers to handle the update load generated by web servers; you should concern yourself more with the ability of the archive to handle client requests than server updates.

In your writeup, you should be sure you have carefully described the protocol used to publish new versions of pages to the archive, including how the APU decides where to store a given page and how groups of pages are installed in the archive. You will need to use protocol diagrams, figures, and/or pseudocode.


4 Additional questions

Consider the following additional questions and address them in your design report. These are not additional design requirements and we are not looking for a particular answer to these questions, but you should be sure to justify why your design behaves the way it does.

  1. What happens if a web server fails before notifying the archive service that it is deregistering?

  2. What happens if a web server deregisters and then later re-registers?

  3. What happens if a web server ceases to function, and later another web server starts running with the same DNS name? Note that a similar issue arises if the first web server changed its DNS name, and now its old name is being reused.

  4. What happens if the archive receives a read request whose date and time are very close to the present?

  5. Suppose that one historical web site is very popular and a large fraction (say, 20%) of the total requests to the AllTime service are for pages from this site. What effect does this have on your system? Will one or a small number of servers be overloaded by this traffic?

  6. Changing the past. We might like to give a web server the ability to revise an archived page or even to withdraw a page. Explain how this could be accomplished. Is this a good idea?

  7. What kinds of synchronization issues exist in your design? In particular, can you always guarantee that the page provided by the archive server is the one valid at the time in question?

5 Your Written Work

This section provides some suggestions on writing style and outlines the standard structure of a design report.

5.1 Who is the audience for this paper?

Write for an audience with the technical expertise of colleagues who took 6.033 five years ago. These are readers who understand the underlying system and network concepts and have a fair amount of experience applying those principles, but they have not thought carefully about the particular problem you are dealing with. You need to provide this kind of reader both a high-level ``gist'' of your approach as well as explain why you make certain design choices.

Assume that your paper will also be used to convince readers that you have a feasible design. One qualitative way that 6.033 reports are evaluated is by asking the question, ``Do we want this person on our team? Can this designer provide us an accurate description of his/her design?''

5.2 How do I organize my DP?

The following is a suggested outline for your report. You don't have to follow this organization if you think your design would be better described some other way. The full report (including abstract) should be no longer than 5000 words. Please choose a font size and line spacing that will be easy on our aging eyes and make it easy for us to write comments on your report (e.g., 11 pt font or larger).

Title Page

Give your design report a title that reflects the subject and scope of your project. Include your name, recitation time and section, and the date on the title page.

Abstract

The abstract is a short summary of your design proposal. The abstract is not an outline of the organization of the paper! Instead, the abstract states the problem to be addressed (in one sentence), the essential points of your solution (without any detailed elaboration) and announces any conclusions you have drawn. Abstracts are normally about 200-ish words long. Tip: Write the abstract after you have written your report. You'll find it much easier!

Table of Contents

1. Design Overview (aka., the Introduction)

The Design Overview states the design purpose, lists design considerations (what it is, goals, and constraints) and summarizes the salient aspects of your design. You may need to include a figure of your design architecture. You may assume that the reader has read the DP1 assignment, so you do not need to restate the problem in detail. Your Design Overview will probably be no longer than 1 page (figure included).

2. Design Description

Explain and elaborate your solution. Show how your solution satisfies the constraints and solves the problem (or how it fails to do so). Explain how your design choices are reasonable or desirable in the context of the problem statement. Include analysis of the estimated performance characteristics of your solution. You will almost certainly need to use figures or pseudocode in this section. If you use pseudocode to illustrate your solution, be sure to describe what the pseudocode does in English as well in the text.

At some point in your Design Description (or alternately in a Feasibility section), you should describe the alternative approaches that you considered and rejected, and why you rejected those approaches (i.e., trade-offs). Your paper will be more convincing if you explain why your design is appropriate and why your design is better than the alternatives. (For example, if another approach would meet all of the objectives perfectly but the cost would be 100 times higher, then you should mention the exorbitant cost as a reason for choosing your less general but cheaper approach.) Comparisons with another design should not assume an incompetent version of the other design, but rather one that has had at least as much careful optimization as your design.

Most of your report will be your design description. Give readers lots of specific details here and organize the Design Description into subsections.

3. Conclusion

The Conclusion summarizes the design problems that you solved, identifies problems in your design, and justifies why your design does not address these problems. Your conclusion only needs to be 1 paragraph.

References

Document your sources, giving a list of all references (including personal communications). The style of your citations (references) and bibliography should be in the format described in the Mayfield Handbook's explanation of the IEEE style (see https://web.mit.edu/course/21/21.guide/www/home.htm). Personal communication belongs in the Acknowledgments section while written sources belong in the References section.

5.3 Where can I find more tips about writing?

You can find other helpful suggestions on writing this kind of report in the 6.033 online lecture ``How to Write Design Reports'' (see http://web.mit.edu/6.033/www/dp1/howtowrite.pdf). You may also want to look at the Mayfield Handbook's explanation of IEEE documentation style. A good book on writing style is: ``The Elements of Style,'' by William Strunk Jr. and E. B. White, Third Ed., MacMillan Publishing Co., New York, NY, 1979. (Also available from the MIT libraries.)

5.4 How will my DP be evaluated?

When evaluating your report, your recitation instructor will look at both content and writing and assign a letter grade.

Some content considerations:

Some writing considerations:

Bibliography

  1. http://content.websitegear.com/article/load_balance_dns.htm.

  2. KARGER, D., LEHMAN, E., LEIGHTON, T., LEVINE, M., LEWIN, D., AND PANIGRAHY, R.
    Consistent Hashing and Random Trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the ACM Symposium on Theory of Computing, 1997 (El Paso, TX), pp. 654-663.



Footnotes

... pages.1
Throughout this document we use the term ``page'' to refer to a resource referenced by a URL; thus, pages may be HTML content or they may be images, movies, or other data types.