6.033 2010 Design Project 1 : Time-Travel Filesystem

Version History

I. Due Dates and Deliverables

There are three deliverables for Design Project 1:
  1. A design proposal not exceeding 800 words, due on February 23th, 2010 (before recitation).
  2. A design report not exceeding 2,500 words, due on March 18th, 2010 (before recitation).
  3. An (optional) revision of a portion of your design report to improve your grade; you must set up a meeting with your writing instructor within one week of receiving your graded DP1 to take advantage of this.

All deliverables should be submitted via the online submission site.

As with real life system designs, 6.033 design projects are under-specified, and it is your job to complete the specification in a sensible way given the overall requirements of the project. As with designs in practice, the specifications often need some adjustment as the design is fleshed out. We recommend that you start early so that you can evolve your design over time. A good design is likely to take more than just a few days to put together.

II. The Problem

Your task is to design a filesystem with support for time travel. Time travel means that the file system allows the user to view the contents of a file or directory as of some time in the past. Time travel is useful in many different applications: for example, it provides a form of backup, allowing users to recover from accidental deletions; it also allows users to compare different versions of a file over time, for example, as a series of edits are made to a document.

Your design must provide time travel support in Unix, without requiring changes to the Unix File System API. In other words, it should be possible to use applications compiled for Unix in your modified Unix with time-travel support. You will need a way for applications to access historical versions of files. We suggest that you create a special directory at the root of the file system that allows you to refer to historical versions of files, as illustrated in this example:

% pwd
/home/madden/files
% ls
file.txt        file2.txt
% cat file.txt
hello
% date
Tue Feb  9 09:15:50 EST 2010
% echo goodbye > file.txt
% echo file3 > file3.txt
% rm file2.txt
% ls
file.txt        file3.txt
% cd /time-2010-02-09-09:00:00.000/home/madden/files
% ls
file.txt        file2.txt
% cat file.txt
hello
% diff /home/madden/files/file.txt /time-2010-02-09-09:00:00.000/home/madden/files
1c1
< goodbye
---
> hello
Your design will primarily consist of modifications to the implementations of every file system operation (read, write, open, rename, link, unlink, symlink, mkdir, and possibly chdir and stat) -- see Section 2.5 (page 91) of the textbook for a detailed description of the Unix File System API. Your implementation must make it possible to travel to any point of time and see the state of a file or directory as it existed at the specified time, which is why calls like read() and stat() must be modified.

You may find that it's hard to reconstruct the contents of a file as of a time when some application was writing that file. It's acceptable, if a user travels to some time T and reads file F, to produce the contents of F as of the close() call that occurred most recently before T. The user should see any files that existed at time T, and not see any files that did not exist at time T.

III. Requirements

Your design must meet the follow requirements: Here are a few additional assumptions you can make to simplify your design:

IV. Design Proposal

The design proposal should be a concise summary (800 words) of your overall system design.

The format of your proposal should be roughly as follows: it should begin with an overview, describing the overall approach. It should then summarize your proposed design and explain how it solves the problem and meets the requirements in this document.

The core of the proposal should be the design description. This should explain the basic on-disk data structures used to represent your time travel file system and a sketch of how read and write operations will work. You may make use of bullets and tables, but must introduce them with at least 1-2 sentences of context. This section must include at least one graphic, correctly formatted with a caption and brief description.

The proposal should end with a summary of how the design meets the requirements (1-2 sentences). It should also include problems which remain to be resolved.

You do not have to present a detailed rationale or analysis in your proposal. However, if any of your design decisions are unusual (particularly creative, experimental, or risky) or if you deviate from the requirements, you should explain and justify those decisions in your proposal.

You will receive feedback on your proposal from your TA in time to adjust your final report.

V. Design Report

Your report should explain your design. It should discuss the major design decisions and tradeoffs you made, and justify your choices. It should discuss any limitations of which you are aware. You should assume that your report is being read by someone who has read this assignment and is familiar with the Unix file system and API, but has not thought carefully about this particular design problem. Give enough detail that your project can be turned over successfully to an implementation team. Your report should convince the reader that your design satisfies the requirements in Section III. Make sure your report includes the following:

V.A. Workloads

Specifically consider how your system would perform on the following three workloads:
  1. The user creates a website consisting of many small HTML files (<10 KB) at time T1, updates each of the files several times, and then travels back to time T1 and rereads each of the files in their entirety. For this workload compare the number of I/O operations (reads or writes of blocks from disk) required to perform these historical reads to the number of I/O that would be required if these were not historical reads. Also estimate the space required to store these historical versions, relative to the size of the original collection of files.
  2. The user downloads a large movie (~10 GB) from his or her camcorder at time T2, and then performs edits to several small scenes in the movie, comprising about 20% of the total disk blocks occupied by the file (the movie size stays about the same after these edits.) Compare the number of I/O operations required to perform these writes to the number that would be required in the standard Unix file system (as specified in Section 2.5 of the course textbook.) Also estimate the space required to store the historical versions of the movie, relatively to size of the original movie.
  3. Now suppose the user travels to time T2 and plays back the movie. How many I/O operations are required by your system? How many I/O operations would be required to play it back at the current time? How does that compare to the number of operations that would be required for playback of the current version in the standard Unix file system?

Assume that the cost of a sequential I/O (where the disk head is already positioned over the sector being read) is approximately the same as a random I/O (where the disk must seek to the sector being read.) Though mechanical hard drives don't actually behave this way, newer stable-storage devices, such solid-state disks based on Flash memory, do. It may not be possible to design a system with good performance in all possible scenarios: you may need to make tradeoffs, supporting some kinds of use well, and others not so well. Your should explicitly mention any such tradeoffs in your workload analysis.

V.B. Report organization

Use this organization for your report:

Here are a few tips:

The format and appearance of your report should conform to the DP1 style guide.

VI. Revising Your Report

You may revise a portion of your design project if you received a grade of 90 or less on the original report (grades are out of 100). The maximum grade you can receive through revision is 90. Before you submit your revision, you must set up an appointment with your writing instructor to agree on a plan for your revision within one week of receiving your grade on DP1. You writing instructor will work out a time when your revision is due, typically one week after your meeting.

VII. How we evaluate your work

Your recitation and writing instructors will assign your report a grade that reflects both the design itself and how well your report presents the design. These are the main grading criteria.

The most important aspect of your design is that we can understand how it works and that you have clearly addressed the requirements and provided the elements listed in Sections III and IV. Complicated designs that we cannot understand will not be graded favorably. Some overall content considerations:

Some writing considerations:

VIII. Collaboration

This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.