Stage 1 | April 12 |
Lab recitation section | April 12 |
Stage 2 | April 29 |
Design paper | May 2 |
Stage 3 | May 10 |
The goal of this project is to obtain experience with file systems. To this end, you are asked to write both a simple file system (SFS), a core component of any computer system, and a disk emulation library that it uses to run on top of UNIX. Implementation of this system will provide you with experience designing complex systems (one of the major topics in 6.033). While we give you a specification for the external interfaces for SFS, you are free to design the internals as you wish.
To help you in designing your file system, this project is organized into three stages. In Stage 1, you are asked to build a disk emulation library that will emulate a disk on top of a single file. Its interface includes functions to read, write, and flush ``disk blocks''. In Stage 2, you will build a simple file system on top of the disk interface you built in Stage 1. This file system should provide efficient storage. In particular, access to all the disk blocks which are part of a single file should be efficient, and the amount of metadata (i.e., data on the disk which is not part of any file, but is used by the file system itself) should be small. In Stage 3, you are asked to make your file system robust in the face of failure (i.e., it should not corrupt data if it crashes). To test its robustness, we will crash your file system at arbitrary points in its code: your file system should be able to recover to an internally consistent state regardless of where these crashes occur.
The final result of your project is a complete document detailing the design and implementation of your SFS, the code that implements the design, and a short evaluation of your file system in terms of performance and functionality. This document can be used as the paper for the second design project in 6.033. The paper is due on May 2.
Unlike the STCP project, you are strongly encouraged to execute this project in teams of two. Find someone else to work with and send mail to Dawson Engler (engler@lcs.mit.edu), telling him who your teammate is. Make sure to do this right away. The first part of the project is due very soon (April 12).
The rest of this handout touches on some of the design and implementation issues, and suggests a possible approach. We suggest you read the whole document before starting to work on Stage 1 so that you have a good idea how the whole project fits together and can plan your time carefully.
A file system must be able to store its data in a persistent manner. The persistent medium in your assignment will be a disk. Your task for this stage is to design a simple disk library that reads and writes 4-kilobyte disk blocks from this disk. Your file system will be built on top of this interface.
Since we do not have a raw disk for each of you, your disk library will use a single (large) UNIX file to emulate a disk. This technique is similar in spirit to the network emulation layer you built for STCP. Your disk controller will divide the file into 4-kilobyte blocks. Blocks will be written using a ``block number.'' You will use this number to compute the offset into the UNIX file. For instance, disk block 10 will be at byte offset 10 * 4096 = 40960.
The disk library's interface (which we define) is similar to that of raw hardware provides: as a result, just as in STCP, we should be able to incorporate your file system code into a real OS with minimum fuss. OS software is frequently developed at user-level using emulation. This allows ease of use and debugging. Importantly, it also means we don't have to buy each student a disk.
The disk interface is as follows:
int openDisk(char *filename, int nbytes) int readBlock(int disk, int block_num, char *block) int writeBlock(int disk, int block_num, char *block) void syncDisk()The calls return an error if the underlying UNIX system calls fail.
openDisk
opens a virtual disk file named filename
for
reading and writing, and returns a descriptor used to access the
virtual disk. The size of the ``disk'' is fixed at nbytes
. If
filename
does not already exist, openDisk creates it.
readBlock
reads disk block number block_num
from the
virtual disk disk
into a buffer pointed to by block
.
writeBlock
writes the data in block
to disk block
block_num
in disk
.
syncDisk
forces all outstanding writes to disk.
Your task is to implement this interface. While this task should be
simple, you have only a single week to do it. The implementation will
require that you understand how to open, read, write, and seek files
under UNIX. Section 7.4 in Tanenbaum contains a short description of
the relevant UNIX system calls. The ones on Athena are mostly
identical, but verify their semantics using the man pages (for
instance, try ``man 2 open
'').
The due date of this part of the project is April 12. Since the rest of the project builds on the disk library, make sure you make the deadline! Send us a pathname to a directory in your Athena locker that contains the source code (including Makefile) and documentation. Also, make sure you have tested your disk library.
The next assignment is to implement a file system using your disk library. The file system must support the core UNIX file system calls. Your main task is to design and implement a disk layout, the main-memory and disk data structures that map filenames to disk blocks, and a recovery strategy to boot the file system using on-disk data structures. To get some idea what the issues are, read Tanenbaum's Chapter 7. This part of the project is the hard part, so do not wait until one night before the deadline!
The following interface should be supported by your file system:
int sf_create(char *name) int sf_open(char *name) int sf_close(int fd) int sf_read(int fd, char *buffer, nbytes) int sf_write(int fd, char *buffer, int nbytes) int sf_lseek(int fd, int offset, int whence) int sf_mkdir(char *name) int sf_rmdir(char *name) int sf_chdir(char *name) int sf_shutdown(void) int sf_sync(void)
These functions map closely to their UNIX namesakes. All functions
should return -1 on failure; most also return 0 on success. All file
and directory names can be either relative or absolute. A
relative pathname is given with respect to the current working
directory (a variable you must maintain). An absolute pathname is
given with respect to the root of the file system (by convention, the
root is ``/
''). For instance, ``/tmp/foo.c
'' is an
absolute pathname. As with Unix, the special directory name ``.''
names the current working directory, ``..'' names the one above.
sf_create
creates a file named name
. The operation fails
if the disk is full, if the file already exists, or if the given path
does not exist.
sf_open
opens the file named name
. sf_open
fails,
returning -1, if the file does not exist or the user already has too
many files open. On success it returns an integer that represents a
file descriptor.
sf_close
closes the file associated with the given file
descriptor. sf_close
fails if the file descriptor is invalid.
sf_read
reads nbytes
of data into buffer
using
file descriptor fd
. sf_read
fails if the file descriptor
does not exist, or if nbytes
is negative. It returns the actual
number of bytes read on success.
sf_write
reads nbytes
of data from buffer
into
the file pointed to by fd
. sf_write
fails if the file
descriptor does not exist, if nbytes
is negative, or if the
file system is full.
sf_lseek
moves the current file pointerfor file fd
by
offset
bytes. whence
indicates whether this movement is
in relation to the begining of the file (SF_SEEK_START
), the
end of the file (SF_SEEK_END
), or to the current file pointer
(SF_SEEK_CUR
). You will define these constants in the interface
header file you create for SFS. sf_lseek
fails if fd
is
not a legal file descriptor, or whence
is invalid (i.e., if it would
move the file pointer to a negative offset). Seeking past the end of
the file will extend the file; the newly allocated space will be
initialized to zeros.
sf_mkdir
creates a directory named name
. sf_mkdir
fails if the path does not exist, a file or directory named
name
already exists, or the file system is full.
sf_rmdir
deletes the directory named name
.
sf_rmdir
fails if name
does not exist or is a file
instead of a directory.
sf_chdir
change the current working directory to name
.
sf_chdir
fails if name
does not exist.
sf_shutdown
gracefully shuts down the file system.
sf_shutdown
should ensure that SFS has written all persistent
state to disk; specifically, when the file system is powered up again,
the set of files (and the file contents) should be identical to the
logical set of files on disk at shutdown time.
sf_sync
is used by applications to ensure that all data
associated with their files has been written to disk. It always
succeeds. Note that, for simple implementations of the Simple File
System, sf_sync
may not do anything at all.
These functions will be provided in a library that normal applications
can link in. You should test your file system by running as many real
applications as possible on it. To make this process feasible, you
should write a program, import
, that will copy a file from a
UNIX file system onto your file system (the Unix file name is import
's
first argument, the destination file name is import
's second
argument).
There are three main challenges in implementing this interface: (1) the main-memory and on-disk data structures to map filenames onto disk blocks, and to keep track of free blocks; (2) the disk layout itself; and (3) the recovery strategy. To keep the task manageable, you may assume that there is at most a single client using the file system at a time; that client may have multiple files open, though. We discuss each challenge in turn.
This assignment revolves around three of the functions provided by a file system:
To make the assignment concrete, we discuss a common file system organization below. You are encouraged to construct your own ideas, and to simply use this discussion to understand the issues involved.
There are three main data structures file systems manage:
Given this class of objects, the following algorithms can be used to create, grow, and delete a file.
To create a file, we:
To grow a file:
To shrink a file:
Modifying directories is similar.
Note that we are careful about when modifications to file system state are made persistent. For instance, in the case of creating a file, if we flush modifications to disk after stage (1), a crash that happened before (2) would cause us to have a disk block allocated that was not used. If we did not flush modifications to disk at all, then a crash would cause the file to be lost. You will have to pay particular attention to this and similar problems in the third part of the assignment.
While your file system does not have to survive random crashes at this
stage, it does have to maintain state persistently. In particular, it
should be able to run, be powered down with the sf_shutdown
call, and then be ``rebooted'' in the state that it was in when it was
shutdown (i.e., all files that were written at shutdown time should
still be there, and no garbage files should suddenly appear).
In addition to correctness, you should consider space efficiency in this part of the assignment. Specifically, your file system should have a low overhead per byte of file data.
When designing your system, you should think about the characteristics of the workloads it will run. A good workload to use as a model would be an Athena cluster. A useful measurement to take is the distribution of file sizes on the system. Use this measurement to derive the median file size, common outlying points, etc. Compute the overhead your file system would have under this style of usage.
We will evaluate your implementation mainly on correctness. Bonus points will be awarded for particularly space- and time-efficient implementations.
This subsection lists a number of possible ways you can extend your system. These extensions are not mandatory.
The distribution of file sizes on your system will be non-uniform. For bonus points you should exploit this non-uniformity for improved performance (a favorite trick is treating both large files and small files specially). In characterizing your performance you should consider the following:
A particularly ambitious extension would be to handle concurrently open disk in order to implement RAID techniques to ensure reliability (i.e., if one disk crashs the file system should have enough redundant state to efficiently recover).
The due date of this part of the project is April 29. Send us a pathname to a directory in your Athena locker that contains the source code (including Makefile) and documentation. Also, make sure you have tested your file system extensively.
File systems are different from most other parts of the computer system in that they are persistent: modifications to disk survive power cycles. In contrast, most other state (e.g., main memory, registers, TLB's, and, as a result, data structures) is not persistent. Many problems with making a file system robust stem from the issue of when state is moved from the non-persistent realm to the persistent realm. The state is moved in a series of discrete steps, and a failure can occur after any step; you must ensure that any possible failure leaves the system in a usable state.
For instance, your SFS will commonly write data to a file. This will probably involve a number of steps; for instance: allocating space for the file on disk, modifying the file's inode, modifying the disk block freelist, and finally, writing the data to memory. SFS must guarantee that a failure after any of these will not corrupt the system (e.g., will not lose disk blocks, delete existing disk blocks, lose inodes, etc.). This struggle involves mechanical operations (such as writing transaction software by hand) and semantic struggles (such as defining exactly what ``usable'' means). In talking about failure, more so than other domains, precise definitions are very important.
The first part of this assignment will involve defining the semantics
of failure: at what point is the modified state of a file made
persistent? For instance, if I write a single byte to disk, am I
guaranteed that when the library call returns the byte has been
written? Or do I have to perform sf_sync
by hand? If I delete a
file and the system crashes, will the file remain deleted? You should
define a non-trivial (i.e., useful) recovery model that balances the
utility of consistent state on disk with the expense of synchronous
disk updates.
Operationally, this part of the assignment will involve changing through your code so that a failure at any point in time will not corrupt the system. You may require that a program be run at boot time to reconstruct parts of SFS, but it should always be able to be recovered to a usable state.
Some failures to guard against:
We will test this part of the assignment by crashing your program at random points in time and then rerunning it (repeatedly).
The due date of this part of the project is May 10. Send us a pathname to a directory in your Athena locker that contains the source code (including Makefile) and documentation.
The due date is set on May 10--note that your design paper and your software are not due on the same date. However, plan your time carefully in this period because in this week you will also have to take the third 6.033 quiz. The Chair of the MIT faculty has assured us that this due date is in accordance with the end-of-term rules.
6.033 Lab handout 4, issued 4/4/96