Lab 3 - A Simple File System

Due date of preliminary design: April 11
Due date of Stage 1 (disk library): April 11
Due date of Stage 2 (intermediate system): April 25
Due date of Stage 3 (final system): May 6
Due date of design paper: May 8

Introduction

The goal of this project is to obtain experience with file systems. A file system is a core component of most operating systems and the implementation of this system will provide experience designing complex systems, one of the major topics in 6.033. File systems, by their nature, incorporate aspects of naming, fault-tolerance, concurrency control, and storage management. In addition, issues of security and networking appear in more ambitious designs.

To help you in designing your file system, this project is organized into a couple stages. In stage one, you are asked to build a disk emulation library that will emulate a disk on top of a single file. Its interface includes functions to read, write, and flush disk blocks. In stage two, you will build a simple file system on top of the disk interface you built in stage one. This file system should provide efficient storage. In particular, access to file blocks should be efficient and the amount of metadata needed to implement the file system should be small. In stage three, you are asked to make your file system robust in the face of failure (i.e., it should not corrupt data if it crashes). To test robustness, we will crash your file system at arbitrary points in its code: your file system should be able to recover to an internally consistent state irrespective of where these crashes occur.

The final result of your project is a complete document detailing the design and implementation of your simple file system (SFS), the code that implements the design, and a short evaluation of your file system in terms of performance and functionality. This document can be used as the paper for the second design project in 6.033. The paper is due on May 8.

Unlike the RPC project, you are strongly encouraged to execute this project in teams of three. Find someone else to work with and send mail to Constantine Sapuntzakis (csapuntz@lcs.mit.edu), telling him who your teammate is. Make sure to do this right away. The first part of the project is due very quickly (April 11).

The rest of this handout touches on some of the design and implementation issues, and provides some suggestions on a possible approach. We suggest you read the whole document before starting to work on Stage 1 so that you have a good idea how the whole project fits together and can plan your time carefully.

Stage 1: Disk library

A file system must be able to store its data in a persistent manner. The persistent medium in your assignment will be disk. Your task is to design a simple disk library that reads and writes 4-Kbyte disk blocks. Your file system will be built on top of this interface.

Since we do not have a raw disk for each of you, your disk library will use a single (large) Unix file to emulate a disk. Your disk library will divide the file into 4-Kbyte blocks. Blocks will be written using a ``block number.'' You will use this number to compute the offset into the Unix file. For instance, disk block 10 will be at byte offset 10 * 4096 = 40960.

The disk library's interface (which we define) is similar to that of raw hardware provides: as a result we should be able to incorporate your file system code into a real OS with a minimal of fuss. OS software is frequently developed at user-level using emulation. It allows ease of debugging and use. Importantly, it also removes the need for us to buy each student a disk.

The disk interface is as follows:

int openDisk(char *filename, int nbytes);
int readBlock(int disk, int blocknr, void *block);
int writeBlock(int disk, int blocknr, void *block);
void syncDisk();

The calls return an error if the underlying Unix system calls fail.

openDisk opens a file ( filename) for reading and writing and returns a descriptor used to access the disk. The file size is fixed at nbytes. If filename does not already exist, openDisk creates it.

readBlock reads disk block blocknr from the disk pointed to by disk into a buffer pointed to by block.

writeBlock writes the data in block to disk block blocknr pointed to by disk.

syncDisk forces all outstanding writes to disk.

Your task is to implement this interface. While this task should be simple, you have only a single week to do it. The implementation will require that you understand how to open, read, write, and seek files under UNIX. Section 7.4 in Tanenbaum contains a short description of the UNIX system calls. The ones on Athena are mostly identical but verify their semantics in the man pages (``man 2 open'').

What to hand in?

The due date of this part of the project is April 11. Since the rest of the project builds forth on the disk library, make sure you make the deadline. Send us a pathname to a directory in your Athena locker that contains the source code (including Makefile) and documentation. Also, make sure you have tested your disk library.

Stage 2: File system

The next assignment is to implement a file system using your disk library.

There are three main challenges in implementing this local filesystem: (1) the main-memory and on-disk data structures to map files names onto disk blocks and to keep track of free blocks; (2) the disk layout; and (3) the recovery strategy. To keep the task manageable, you may assume that there is a single client using the file system; the single client may have multiple files open, though. We discuss each challenge in turn.

This assignment revolves around three of the functions provided by a file system:

Naming. Human-readable file names must be mapped to the disk blocks that they are associated with.
Organization. File systems (typically) have mechanisms humans can use to organize information. Computer scientists like hierarchies, as a result, the tree directory structure has become the dominant mechanism for organization.
Persistence. Data is entrusted to the file system and is expected to remain there (uncorrupted and undeleted) until an explicit request is made to delete it. This aspect includes support for allocation and deletion of files; as syntactic sugar, file systems also provide mechanisms to modify existing files.

Important issues that we will treat lightly include protection (ensuring only authorized programs can read/write/delete/create files) and performance (disks are slow: most research on file systems concentrates solely on performance).

To make the assignment concrete, we discuss a common file system organization below: you are encouraged to construct your own ideas, and to simply use this discussion to understand the issues involved.

There are three main data structures file systems manage:

A disk block freelist. This is a simple data structure (for instance, a bitmap) that tracks what blocks on the disk have been allocated. This data structure is, for obvious reasons, persistent (i.e., on disk). You will have to establish some convention for locating it across reboots.
Inodes. An inode is persistent memory that contains pointers to disk blocks or to ``indirect blocks'' (you can think of these as other inodes). Inodes are used to map a file to the disk representation of a file: each file has its own inode, the inode has pointers to all disk blocks that make up that file.
Directories. Directories are persistent mappings of names to inode numbers for all the files and directories that the directory is responsible for.

Given this class of objects, the following algorithms are used to create, grow, and delete a file.

The actions to create file:

Allocate and initialize an inode.
Write the mapping between the inode and file name in the file's directory (usually the current working directory).
Write this information to disk.

To grow a file:

Load the file's associated inode into memory.
Allocate disk blocks (mark them as allocated in your freelist).
Modify the file's inode to point to these blocks.
Write the data the user gives to these blocks.
Flush all modifications to disk.

To shrink a file:

Load the file's associated inode into memory.
Remove pointers to the disk blocks from the inode.
Mark the disk blocks as free.
Flush all modifications to disk.

Modifying directories is similar.

Note that we are careful about when modifications to file system state are made persistent. For instance, in the case of creating a file, if we flush modifications to disk after stage (1), a crash that happened before (2) would cause us to have a disk block allocated that was not used. If we did not flush modifications to disk at all, then a crash would cause the file to be lost. You will have to pay particular attention to this and similar problems in the third part of the assignment.

While your file system does not have to survive random crashes at this stage, it does have to maintain state persistently. In particular, it should be able to run, be powered down, and then be ``rebooted'' in the state that it was in when it was shutdown (i.e., all files that were written at shutdown time should still be there, and no garbage files should suddenly appear).

In addition to correctness, you should consider space efficiency in this part of the assignment: your file system should have a low overhead per byte of file data.

We evaluate your implementation on how efficient it is, the storage overhead on disk, and how well it recovers from failures. You do not have to implement block caching. If you implement caching correctly, however, we will give you bonus points.

Interface and Implementation

How will you implement this file system? In many modern operating systems, the file system is implemented in the kernel, which is difficult to modify and debug (especially when you don't have the source code!). Another option would be to have you implement a library of filesystem functions (similar idea to the RPC assignment) and then implement some test applications against that library. The advantage of this approach is that is significantly easier to debug. The disadvantage is that you cannot run unmodified applications against it. A third option, the one we chose, is to have you implement your file system as an NFS server on your local machine. You then mount that NFS server and the file system becomes part of your file system name space.

This approach requires you to understand some of intricacies of NFS. You will find RFC 1094 (http://web.mit.edu/6.033/lab/handouts/rfc1094.txt) useful.

The source code for the NFS server skeleton (without a complete filesystem) will be posted in /mit/6.033/lab/lab2/src. It should run under NetBSD, Linux, and Solaris.

A follow-on handout will talk about the NFS server issues in more detail.

What to hand in?

The due date of this part of the project is April 25. Send us a pathname to a directory in your Athena locker that contains the source code (including Makefile) and documentation. Also, make sure you have tested your file system extensively.

Stage 3: Robustness

File systems are different from most other parts of the computer system in that they are persistent: modifications to disk survive power cycles. In contrast, most other state (e.g., main memory, registers, TLB's, and, as a result, data structures) is not persistent. Much reasoning about robustness in a file system stem from the problem of moving state from the non-persistent realm to the persistent realm in a series of discrete steps in such a way that a failure can occur at any point in time and still leave the system in a usable state. For instance, SFS will commonly write data to a file. This will involve a number of steps: allocating space for the file on disk, modifying the file's inode, modifying the disk block freelist, and finally, writing the data to memory. SFS must guarantee that a failure after any of these will not corrupt the system (e.g., will not lose disk blocks, delete existing disk blocks, lose inodes, etc.). This struggle involves mechanical operations (such as writing transaction software by hand) and semantic struggles (such as defining exactly what ``usable'' means). In talking about failure, more so than other domains, precise definitions are not semantic quibbles.

The first part of this assignment will involve defining the semantics of failure: at what point is the modified state of a file made persistent? For instance, if I write a single byte to disk, am I guaranteed that when the library call returns the byte has been written? Or do I have to perform a sync by hand? If I delete a file and the system crashes, will the file remain deleted? You should define a non-trivial (i.e., useful) recovery model that balances the utility of consistent state on disk with the expense of synchronous disk updates.

Operationally, this part of the assignment will involve working through your code so that a failure at any point in time will not corrupt the system. You may require that a program be run at boot time to reconstruct parts of SFS, but it should always be able to be recovered to a usable state.

Failures to guard against:

Use of allocated disk blocks. Blocks may have been allocated to a particular file but the freelist was only modified in memory, not on disk.
Loss of disk blocks. Blocks may have been removed from the freelist but not yet had an inode persistently modified to point to them.
Loss of data. Data that has been written to disk should stay there.

There are, of course, many other failures possible. This list should give their flavor.

We will test this part of the assignment by crashing your program at random points in time and then rerunning it (repeatedly).

What to hand in?

The due date of this part of the project is May 6. Send us a pathname to a directory in your Athena locker that contains the source code (including Makefile) and documentation. In this part, we will be crashing your file system and check whether it recovers correctly.

The due date is set on May 6 so that your design paper and your software are not due on the same date.

References

Constantine Sapuntzakis Thu Apr 10 23:34:01 EDT 1997