6.033 - Computer System Engineering (Spring 2006)

February 21, 2006

Please find additional clarifications and details in the DP1 FAQ.

6.033 Design Project 1: A Fast but Potentially Unreliable File System

I. Assignment

There are two deliverables for Design Project 1:

  1. Two copies of a design proposal not exceeding 800 words, due on Tuesday, March 7, 2006.
  2. Two copies of a design report not exceeding 2,500 words, due on Thursday, March 23, 2006.

A design report is a different beast from a quiz. As in real life, the 6.033 design projects are underspecified, and it is your job to complete the specification in a sensible way given the overall requirements of the project. As with designs in practice, the requirements often need some adjustment as the design is fleshed out. We strongly recommend that you get started early so that you can iterate on your design. A good design is likely to take more than just a few days to put together.

II. The Problem

Your goal is to build a fast file system for a machine that will be used to store files for clients temporarily. One could use such a machine, for example, to cache requested web pages at the edges of the Internet, or to hold files containing data collected from sensors. Clients can use a web server running on the machine to upload, download, and delete files. You can anticipate a range of file sizes from small (e.g., a file with temperature readings) to large (e.g., a file containing a video clip). The web server is the only application running on the machine and the only application your design will need to support.

Most file systems place a high priority on never losing data. In the envisioned usage scenarios for your file system, however, reliability is less of a concern. Specifically, it is okay to lose files (or even all of the data in the file system) if the machine crashes because of a power failure, hardware failure, or software implementation error. In the absence of such crashes, however, the file system should allow clients to retrieve previously stored files that are still resident.

The disk in the machine will have a raw capacity of at least 120 Gigabytes. Under reasonable workloads, your file system should be able to upload (from clients) and store at least 75% of the capacity of the disk. So, with a 120 Gigabyte disk, the file system should be able to store (and serve back to clients) at least 90 Gigabytes' worth of files. Note that it is possible for clients to attempt to upload new files when the file system is already at capacity. You need to come up with a reasonable policy for dealing with such uploads.
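The capacity arithmetic above can be checked with a short calculation. The sketch below is only an illustration: it assumes that a Gigabyte means 10^9 bytes (the assignment does not say whether decimal or binary units are intended), and the names GIGABYTE, min_storable_bytes, and max_overhead_bytes are hypothetical.

```python
from fractions import Fraction

# Assumption: 1 Gigabyte = 10**9 bytes. The assignment does not specify
# decimal versus binary units, so adjust this constant if needed.
GIGABYTE = 10**9

# From the assignment: at least 75% of the raw capacity must be storable.
MIN_STORABLE_FRACTION = Fraction(3, 4)


def min_storable_bytes(raw_capacity_bytes: int) -> int:
    """Return the minimum number of bytes of file data the file system
    should be able to hold for a disk of the given raw capacity."""
    return int(raw_capacity_bytes * MIN_STORABLE_FRACTION)


def max_overhead_bytes(raw_capacity_bytes: int) -> int:
    """Return the space left for metadata, fragmentation, and any other
    overhead once the storage target is met."""
    return raw_capacity_bytes - min_storable_bytes(raw_capacity_bytes)


if __name__ == "__main__":
    raw = 120 * GIGABYTE
    print(min_storable_bytes(raw) // GIGABYTE)  # 90
    print(max_overhead_bytes(raw) // GIGABYTE)  # 30
```

Under these assumptions, a design may spend up to a quarter of the raw disk on everything that is not file data and still meet the requirement.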

The web server is currently coded to use the standard Unix file system calls read(), write(), open(), close(), and unlink(). You do not need to support any calls besides these. You are allowed to change the interface between the web server and the file system, but you should have good reasons for doing so. If you change it, the web server may need to be updated, which would require additional engineering time. It is probably best to treat the read(), write(), open(), close(), and unlink() interface as fixed and to change only what lies below it.
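To make the five-call interface concrete, here is a minimal sketch of how an upload, a download, and a delete might map onto open(), write(), read(), close(), and unlink(). The web server's actual code is not shown in this assignment, so the helper names upload, download, and delete are hypothetical. The sketch uses Python's os module, which exposes these Unix calls, and it runs against the host operating system's file system rather than the one you are designing.

```python
import os
import tempfile


def upload(path: str, data: bytes) -> None:
    """Store data under path using open(), write(), and close()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        # write() may accept fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)


def download(path: str, chunk_size: int = 64 * 1024) -> bytes:
    """Return the contents of path using open(), read(), and close()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty result signals end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def delete(path: str) -> None:
    """Remove path using unlink()."""
    os.unlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        name = os.path.join(directory, "readings.txt")
        upload(name, b"21.5 C\n")
        print(download(name))
        delete(name)
        print(os.path.exists(name))
```

Everything a design changes below this interface stays invisible to callers written in this style, which is the argument for leaving the interface alone.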

The web server does not currently support hierarchical directories (all the files are currently stored in one directory). You should decide whether your design will support directories or not. A potential advantage of supporting directories is better support for future system evolution and modifications. A potential advantage of not supporting directories is that your design may be simpler. You need to make a reasonable choice between supporting and not supporting directories.

Appendix 2-A of the notes outlines the standard Unix file system interface and implementation. You may wish to read this Appendix for ideas. The Appendix does not, however, focus on where files are placed on disk, which may be an important issue in this design project. Note that you are in no way constrained or even necessarily encouraged to use the Unix file system approach in the Appendix. We can imagine very successful approaches that are radically different. In particular, you should probably think carefully about how complex your solution needs to be.

Many researchers have looked into the problem of how to make file systems fast. However, almost all of these researchers also had to make the file system reliable in the face of crashes. Because your file system does not need this kind of reliability, your job as a designer may be easier and your resulting design may (or may not) be quite different.

III. Your Design Report

Here are some items we will be interested in seeing in your design report:

Both the Memory and Disk Usage Analysis and the Performance Analysis should include a quantitative analysis. This analysis should address at least the following three workloads:

For the purposes of this assignment, a small file is less than 1 Kilobyte, and a large file is at least 1 Megabyte. Your design should support files as small as 1 byte. If your design imposes a maximum file size, make sure that file size is reasonable.
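The size thresholds above can be written down as a small classifier, which may be handy when tallying a workload for the quantitative analysis. This is a sketch under stated assumptions: it uses decimal units (1 Kilobyte = 1000 bytes), which the assignment does not confirm, and the function name size_class and the label "medium" for files between the two thresholds are hypothetical.

```python
# Assumption: decimal units. The assignment does not say whether a
# Kilobyte is 1000 or 1024 bytes; change these constants if needed.
KILOBYTE = 1000
MEGABYTE = 1000 * KILOBYTE


def size_class(size_bytes: int) -> str:
    """Classify a file by the thresholds given in the assignment."""
    if size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if size_bytes < KILOBYTE:
        return "small"   # less than 1 Kilobyte
    if size_bytes >= MEGABYTE:
        return "large"   # at least 1 Megabyte
    return "medium"      # hypothetical label; the assignment names no middle class


if __name__ == "__main__":
    for size in (1, 999, 1000, 999_999, 1_000_000):
        print(size, size_class(size))
```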

IV. Your Design Proposal

The design proposal should be a concise summary of your design choices and of the overall system design. It should include the file system behavior, file system interface, and file system configuration figure described above. It should also include a specification of any limitations (such as the maximum file size and number of files) that your design imposes. You should also include an overview of the rationale behind any key design decisions and some idea of how many disk accesses it will take to read or write a typical file. You should have a complete design but do not have to present a detailed rationale or analysis. Also, if any of your design decisions are "unusual" (particularly creative, experimental, risky, or specifically warned against in the assignment), it would be wise to describe them here.
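One way to get "some idea" of the disk-access count is a back-of-the-envelope model. The sketch below is not a prescribed method and does not reflect any particular design: it assumes fixed-size blocks, a caller-supplied number of metadata accesses, and that a contiguously stored file can be transferred in a single access. The block size, the function names, and the model itself are all hypothetical.

```python
def data_blocks(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of fixed-size blocks needed to hold the file (ceiling division)."""
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def estimate_accesses(file_size_bytes: int, block_size_bytes: int,
                      metadata_accesses: int, contiguous: bool) -> int:
    """Rough count of disk accesses to read or write one file.

    Assumption: a contiguous file costs one data access regardless of
    length, while a scattered file costs one access per block.
    """
    blocks = data_blocks(file_size_bytes, block_size_bytes)
    data_accesses = min(blocks, 1) if contiguous else blocks
    return metadata_accesses + data_accesses


if __name__ == "__main__":
    block = 4096  # hypothetical block size in bytes
    # A 1,000,000-byte file with one metadata access:
    print(estimate_accesses(1_000_000, block, 1, contiguous=False))  # 246
    print(estimate_accesses(1_000_000, block, 1, contiguous=True))   # 2
```

A real estimate should replace these assumptions with the hardware parameters given in the assignment and with the layout your design actually uses.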

You will receive feedback from both your TA and the Writing Program in time to adjust your final report.

V. Hardware Parameters

The relevant hardware specifications are as follows:

VI. Issues to Consider

The following are issues you should consider:

It's okay, and sometimes desirable, to say, "no, my design doesn't do that" as long as you can identify what your design can and can't do, and explain why you made the trade-offs you did. Remember: a simple, correct design that meets the requirements laid out above is more important than a complex one that is hard to understand and does a flaky job. When in doubt, leave it out!

VII. Your Written Work

The following are some tips on writing your design report. You can find other helpful suggestions in the 6.033 lecture "How to Write Design Reports" (scheduled for Friday, March 17).

Audience: You may assume that the reader has read the DP1 assignment but has not thought carefully about this particular design problem. Give enough detail that your project can be turned over successfully to an implementation team.

Report Outline (use this organization for your report):

VIII. How Do We Evaluate Your Report?

When evaluating your report, your recitation instructor will look at both content and writing and assign a letter grade. The Writing Program will separately grade the proposal and report for writing and assign a letter grade.

Some content considerations:

Some writing considerations:

IX. Collaboration

This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.

X. Tasks and Due Dates

Please choose a font size and line spacing that will make it easy for us to write comments on your report (e.g., 11 pt font or larger; 1.5-spaced or greater). Please use 1-inch margins.