Effect of Incommensurate Scaling on File System Design

Charles Coffing
Ward/MacGregor

Over the past ten years, disk transfer rates have increased considerably. An RLL drive of 1987 had a transfer rate of either 350 or 700 kilobytes/second (depending upon the interleave), while a state-of-the-art drive today may transfer over 10 megabytes/second. Depending upon the exact figures cited, disk bandwidth is now roughly fifteen to thirty times what it was ten years ago. Access times, however, have not experienced such dramatic improvement. Many drives today tout an average seek time of 10 ms or so; this is less than three times better than the 28 ms average of ten years ago. Hard drives are victims of incommensurate scaling: measured against the data that could have been transferred in the same time, the cost of a seek grows disproportionately large as time goes on.
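
The arithmetic behind this comparison can be sketched in a few lines. This is only an illustration using the figures quoted above; the variable names, the decimal units, and the helper bytes_per_seek are assumptions introduced here, and "over 10 megabytes/second" is taken as exactly 10 for simplicity.

```python
# Figures quoted in the text (decimal units assumed for simplicity).
KB = 1_000
MB = 1_000_000

bandwidth_1987 = (350 * KB, 700 * KB)  # bytes/second, depending on interleave
bandwidth_now = 10 * MB                # bytes/second, taken as exactly 10 MB/s
seek_1987_ms = 28.0
seek_now_ms = 10.0

# Improvement factors over the ten-year span.
bandwidth_gain = [bandwidth_now / b for b in bandwidth_1987]
seek_gain = seek_1987_ms / seek_now_ms


def bytes_per_seek(bandwidth, seek_ms):
    """Bytes the drive could have transferred during one average seek."""
    return bandwidth * seek_ms / 1000.0


if __name__ == "__main__":
    print("bandwidth gain:", [round(g, 1) for g in bandwidth_gain])
    print("seek gain:", round(seek_gain, 1))
    print("bytes forgone per seek, 1987:",
          [bytes_per_seek(b, seek_1987_ms) for b in bandwidth_1987])
    print("bytes forgone per seek, now:",
          bytes_per_seek(bandwidth_now, seek_now_ms))
```

Under these assumptions, a single average seek on the 1987 drive cost the equivalent of roughly 10 to 20 kilobytes of transfer, while on the newer drive it costs about 100 kilobytes. The seek is faster in absolute terms but more expensive relative to what the disk could otherwise be doing.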

To achieve maximum performance from a drive, then, the high disk bandwidth must be exploited while minimizing seeks. Previous file systems have attempted this with limited success, if at all. The generic Unix file system does not attempt to minimize seeks, leaving inodes near the beginning of the disk and saving the data almost anywhere. Files in a given directory are not necessarily grouped together on disk. The Berkeley FFS improves on this slightly by grouping the files of a directory and the related inodes together in cylinder groups, which lessens the need for long seeks.

Access time is affected by factors besides seek distance, such as rotational latency and command processing delays. In other words, all seeks, whether long or short, are costly. Based on this observation, CFFS pushes the optimization employed by FFS even further, avoiding seeks entirely in many cases. One method of achieving this goal is to embed the inodes of all files in a directory within the directory itself. This greatly speeds up commands that access the inodes of many files within a directory. On a traditional Unix file system, the ls command will cause a seek to read the directory, then a seek to the inode of each file. That is, the number of seeks grows linearly with the number of files. CFFS, by contrast, performs a single seek to the directory. The directory and all inodes are read in a single pass.
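
The difference in seek counts can be made concrete with a toy model. The function names and the 10 ms average access cost below are assumptions for illustration only; the model also assumes that nothing is cached, that each inode on the traditional layout requires its own seek, and that a directory with embedded inodes fits in one contiguous read.

```python
def seeks_traditional(num_files):
    """One seek to read the directory, then one seek per file's inode."""
    return 1 + num_files


def seeks_embedded(num_files):
    """Directory and embedded inodes are read together in a single pass."""
    return 1


def listing_time_ms(seeks, avg_access_ms=10.0):
    """Time spent positioning the head, ignoring transfer time."""
    return seeks * avg_access_ms


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n,
              listing_time_ms(seeks_traditional(n)),
              listing_time_ms(seeks_embedded(n)))
```

For a directory of 100 files, this model charges the traditional layout about one second of positioning time, against 10 ms for the embedded layout. Real systems would land somewhere in between because of caching and inode locality, but the linear-versus-constant shape is the point.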

CFFS also takes advantage of the high bandwidth of modern disks by placing the small files of a directory adjacently on disk. The incremental cost of reading a few extra files is negligible, since the head is already positioned over the track and the disk has a high bandwidth. Coupled with an intelligent cache, the extra files read in are available very quickly should the program need them. The traditional Unix file system, on the other hand, makes no effort to group small files together. Access to each file will likely be preceded by a costly seek.
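
A rough cost comparison shows why the extra reads are nearly free. The sketch below is not a measurement of CFFS; the 4 KB file size, the 10 ms access time, the 10 MB/s bandwidth, and both function names are assumptions chosen to match the figures used earlier in the text.

```python
def grouped_read_ms(num_files, file_bytes,
                    access_ms=10.0, bandwidth=10_000_000):
    """One positioning delay, then one sequential transfer of all files."""
    transfer_ms = num_files * file_bytes / bandwidth * 1000.0
    return access_ms + transfer_ms


def scattered_read_ms(num_files, file_bytes,
                      access_ms=10.0, bandwidth=10_000_000):
    """Each file pays its own positioning delay before its transfer."""
    transfer_ms = file_bytes / bandwidth * 1000.0
    return num_files * (access_ms + transfer_ms)


if __name__ == "__main__":
    # Hypothetical workload: eight 4 KB files from the same directory.
    print("grouped:  ", round(grouped_read_ms(8, 4096), 2), "ms")
    print("scattered:", round(scattered_read_ms(8, 4096), 2), "ms")
```

With these numbers, reading all eight adjacent files takes only about 3 ms longer than reading one, while fetching them from scattered locations costs roughly six times as much in total. Almost all of the scattered cost is positioning time rather than transfer time.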

The disparity between seek times and disk bandwidth is large and is likely to continue to increase in the near future. This growing gap demonstrates a need to continually reevaluate the performance of file systems. CFFS effectively addresses the current disparity by exploiting what the disk does well and avoiding what it does poorly.