Preparation for Recitation 16
Read The Design and Implementation of a Log-Structured File
System (reading #13 in the course packet).
As you probably remember from DP1, the file system is the part of an
operating system that manages, among other things, how files are laid out
on disk. In 6.033, you have so far seen several examples of file systems:
the UNIX File System (presented in the UNIX paper and in Appendix 2-A) and
your own high-performance-but-low-reliability file system, designed for
DP1. Some of the tutorials also covered the Berkeley Fast File System
(FFS).
In today's reading, the authors depart radically from FFS and propose,
instead, a file system in
which all of the data is stored in an append-only log, that is, a
flat file that can be modified only by having data added to the end of it.
In Chapter 9, we also hear about logs, specifically how they
help achieve reliability. For today's reading, the purpose of
the log is to achieve good performance.
To understand why an append-only log might lead to high performance, we quickly
review the basics of disks (which you should recall from DP1).
Data on a disk is stored on a constantly spinning
platter and read by a head that floats above the platter. The
platter is broken down into concentric rings called tracks; each
track is divided into sectors. The time required to read or write
data to a disk can be broken down into three components: moving
the head to the correct track, waiting for the correct sector to
rotate under the head, and transferring the data off the disk. The
first two operations (known collectively as a "seek") are very
expensive (and aren't getting cheaper very fast): together, they can
take 10 or more milliseconds. Once the disk head is at the correct
location, it can transfer data relatively quickly (at around 40 MB/s
on today's high-end disks or 10 MB/s on the lower-quality disks DP1
assumed).
Now, given this description, the goal of a log-structured file system
is to minimize seeks by treating the
disk as an infinite append-only log. For example, the file system
software simply appends new files to the end of the log. For this strategy
to translate into higher performance, the file system software must:
- Write in large batches to amortize the cost of a seek (if any).
- Always start writing from the last place it stopped writing.
- Avoid reading; reads are likely to cause a seek.
As you read the paper, consider how the authors achieve these three
goals. Keep in mind that the "infinite log" is actually on a finite
disk and ask how that constraint affects their goals.
Also, ask yourself whether
certain file access patterns by applications might make it hard for a
log-structured file system to avoid seeks.
|