M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering Handout 19 - April 18, 2001

Assignment 6: April 23rd through April 27th

For Lecture, Monday, April 23rd (Fault Tolerant Computing)

In preparation for this lecture on fault-tolerant computing, please read chapter 7.

For Recitation, Tuesday, April 24th (RAID)

Read "Disk System Architectures for High Performance Computing," by Katz et al. (reading 21) and answer the following question:

Modern RAID arrays use parity information and standby disks to provide a highly reliable storage medium even in the face of hardware failures. A highly available system requires more than a very reliable storage system, however. Consider a networked server handling network transactions (a web server or bank central computer perhaps). Think about other components of this system whose failure could result in a loss of service. Pick out a few of these and explain how they might be made more reliable in the same way that RAID made disks more reliable. For example, multiple power supplies can be arranged in parallel to power a machine even if one fails. For your examples consider at least one hardware component and one software component of the system.
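As a reminder of how parity provides this redundancy, here is a minimal sketch in Python. The block contents and the helper name xor_blocks are made up for illustration, and real arrays work on fixed-size stripes rather than short strings. The parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt by XORing the survivors with the parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per disk, plus one parity block.
data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_blocks(data)

# Simulate losing disk 1, then rebuild it from the survivors and the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'disk1 data'
```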

For Lecture, Wednesday, April 25 (Transactions)

In preparation for this lecture, read sections A and B of chapter 8.

For Recitation, Thursday, April 26 (LFS)

For recitation, read "The Design and Implementation of a Log-Structured File System" by Rosenblum and Ousterhout (reading 22).

LFS is claimed to exploit the bandwidth capacity of fixed disks for small writes better than traditional filesystems. In this hands-on assignment we will experiment with some of the same benchmarks used in the LFS paper.

This assignment should be done on an otherwise unused workstation that you have root access to. Athena workstations are acceptable (the root password can be found by typing tellme root at the Athena prompt). To become root, type su at the Athena prompt and give the root password when prompted.

Step 0: Preparation You will need to collect some information about the machine you are using before beginning the assignment.

First, determine the name of the raw disk device for the machine's hard drive. On Linux this will be something like /dev/sda or /dev/hda. On Solaris (Athena) machines it will be something like /dev/rdsk/c0t0d0s3. You can find this easily in the output of the df command, which lists the available space on each partition. Note that df reports available space as a number of free blocks (usually 1K each). Pick either a partition that corresponds to the whole disk (/dev/rdsk/...s3 on Solaris or /dev/[hs]da on Linux) or one that is at least 64MB in size.

Second, determine the model number and vendor of the disk. You can find this information by scanning the output of the dmesg command on Linux machines or sysinfo on the Suns.

You will need to build some test software to complete this assignment. This software is found here. Untar the source and build it with the following commands:

athena% add -f gnu
athena% gtar zxvf lfs.tgz
athena% cd lfs-0.0
athena% ./configure
athena% gmake

Step 1: Raw Data Transfer Rate/Latency To begin this hands-on assignment we determine the maximum data rate and average latency of the fixed disk attached to the machine we are using. We will measure disk performance by accessing the disk in "raw" format. This avoids the overhead imposed by accessing the disk through the file system. Later we will compare these results to those obtained by accessing the disk via the file system.

1.A Data Rate There are two ways to find the maximum data transfer rate of your disk:

1) Using the supplied benchmarking program: bw. This program expects two arguments: the name of the raw disk device (collected above), and the number of megabytes to read from disk. Specify the -b option to measure the data transfer rate. The command must be run as root on most systems. Results are most reliable if at least 64 MB are read. Experiment with different sizes to find the true transfer rate.

Example usage: athena# ./bw -b /dev/rdsk/c0t0d0s3 64

2) Using the dd and time commands. Type man dd for more details.

Example usage: time dd if=/dev/rdsk/c0t0d0s3 of=/dev/null bs=1024k count=64
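The dd method reports only elapsed time, so the transfer rate has to be computed by hand. A small sketch of the arithmetic follows. The function name is hypothetical, and the 5.0 second figure is a placeholder, not a measurement. Use the "real" time that time prints for your own run.

```python
def transfer_rate_mb_per_s(block_size_bytes, block_count, elapsed_seconds):
    """Data rate in MB/s (1 MB = 1024 * 1024 bytes) for a timed dd run."""
    total_bytes = block_size_bytes * block_count
    return total_bytes / (1024 * 1024) / elapsed_seconds

# bs=1024k count=64, with a placeholder elapsed time of 5.0 seconds.
rate = transfer_rate_mb_per_s(1024 * 1024, 64, 5.0)
print(f"{rate:.1f} MB/s")  # 12.8 MB/s
```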

1.B Latency Find the latency using the bw command (with the -l option):

In this mode, bw expects two arguments: the name of the device and the number of trials to run. For each trial, bw will seek to a random location on disk and read a small amount of data. It will report the average latency to begin a read (this latency includes both the time to position the disk heads and the rotational latency). Increasing the number of trials will produce better results. Experiment until you are confident in your answer. Make sure that the specified partition is at least 100MB in size. If the command returns an "invalid argument" error (especially on older Sun machines), try substituting "dsk" for "rdsk" in the device name.

Example: athena# ./bw -l /dev/rdsk/c0t0d0s3 1000
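To make the measurement concrete, here is a rough Python sketch of the procedure described above: seek to a random offset, read a small amount, and average the elapsed time over many trials. This is not the source of bw. The function name, the 512-byte read size, and the seed parameter are assumptions made for illustration. It also assumes the target's size can be found by seeking to its end, which may not hold for every raw device. When pointed at an ordinary file instead of a raw device, the operating system's buffer cache will likely hide the disk latency.

```python
import os
import random
import time

def average_read_latency(path, trials, read_size=512, seed=None):
    """Average seconds per random small read from the file or device at path."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        if size < read_size:
            raise ValueError("target is smaller than one read")
        total = 0.0
        for _ in range(trials):
            # Choose a random offset aligned to the read size.
            offset = rng.randrange(0, size - read_size + 1, read_size)
            start = time.perf_counter()
            os.lseek(fd, offset, os.SEEK_SET)
            os.read(fd, read_size)
            total += time.perf_counter() - start
        return total / trials
    finally:
        os.close(fd)
```

Called as average_read_latency("/dev/sda", 1000) by root on a Linux machine, this would time 1000 random reads. More trials smooth out the variation between short and long seeks.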

Report your results for the latency and bandwidth of your disk. To double-check your answers, look up the specifications for your drive on the vendor's website. Vendors may not report exactly the statistics you are looking for, so use their numbers only as a sanity check.
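Vendors typically quote an average seek time and a rotational speed rather than a single access latency. One common approximation, assumed here, is that the expected latency is the average seek time plus the time for half a rotation. The sketch below shows the arithmetic. The 9 ms and 7200 RPM values are hypothetical spec-sheet numbers, not those of any particular drive.

```python
def expected_access_latency_ms(avg_seek_ms, rpm):
    """Average seek time plus the time for half a rotation, in milliseconds."""
    ms_per_rotation = 60_000 / rpm
    return avg_seek_ms + ms_per_rotation / 2

# Hypothetical spec-sheet values: 9 ms average seek, 7200 RPM.
print(round(expected_access_latency_ms(9.0, 7200), 2))  # 13.17
```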

Step 2: LFS Benchmarks We will now run a version of the benchmarks from the LFS paper. The results of these benchmarks (as run by the authors) can be seen in the graphs on pages 45 and 46 of the LFS paper. Please review section 5.1 of the paper which describes these tests in detail.

2.A First, run the small file benchmark using smallfb. This program expects two arguments: the number of files to create (10000 in the paper) and the size of each file (1024 bytes in the paper). Run smallfb with these values and report your results. This command may take a long time (minutes) to complete. You may experiment with lower values, but if you set the number of files too low you may end up testing your disk cache and not your disk. Note: make sure that you run this command from a directory that resides on your local disk (smallfb tests the current working directory). "/var/tmp" is a good choice on most Athena machines. Do not run this command from your home directory or another network file system. Also, make sure that the partition you run this command on has enough free disk space to complete the command (again, check with "df -k").

Example usage: athena% ./smallfb 10000 1024
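For intuition about what such a benchmark measures, the following Python sketch times the creation of many small files in a scratch directory and reports files created per second. It is not the source of smallfb. The function name is hypothetical and only the create phase is sketched. It does not force data to disk, so for small file counts it will mostly exercise the cache, as warned above.

```python
import os
import tempfile
import time

def small_file_create_rate(directory, num_files, file_size):
    """Create num_files files of file_size bytes; return files created per second."""
    payload = b"x" * file_size
    # The scratch directory and its files are removed when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        start = time.perf_counter()
        for i in range(num_files):
            with open(os.path.join(workdir, f"f{i}"), "wb") as f:
                f.write(payload)
        elapsed = time.perf_counter() - start
    return num_files / elapsed
```

A call such as small_file_create_rate("/var/tmp", 10000, 1024) would mirror the parameters used in the paper.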

2.B Now run the large file benchmarking program, largefb. This program takes one argument: the size of the output file in megabytes. Run largefb and report your results.

Example usage: athena% ./largefb 64
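A comparable sketch for a large file is shown below: write one file sequentially in 1 MB chunks, force it to disk, read it back, and report both rates in MB/s. This is an assumption about the general shape of such a test, not the source of largefb, and the function name and chunk size are hypothetical. The read may be served partly from the buffer cache if the file fits in memory.

```python
import os
import tempfile
import time

def large_file_rates(directory, size_mb, chunk_size=1024 * 1024):
    """Return (write MB/s, read MB/s) for one sequentially accessed file."""
    chunk = b"\0" * chunk_size
    total_bytes = size_mb * 1024 * 1024
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_bytes // chunk_size):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data has reached the disk
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / write_seconds, size_mb / read_seconds
```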

Step 3: Analysis Given what you know about the (non-log-structured) file system running on your test machine, explain the performance of your file system on these benchmarks. Questions to consider:

- What is the ratio of the actual bandwidth achieved in the small file benchmark to the raw bandwidth of the disk? In other words, are you able to take advantage of the raw performance of your disk? What accounts for the discrepancy?

- What is the limiting factor in each of the two tests (large/small file benchmarks)?

- Would you expect a log-structured file system to perform better in these tests? Why?

- Optional: Are there other aspects of the file system that can affect the performance of these tests? Those who tested on the Linux ext2 file system should consider the fact that ext2 relaxes the consistency guarantees of FFS by not synchronously writing meta-data (inodes, superblock, freelist etc.) to disk.

There may not be obvious answers to these questions. Disk caches, file system policies, and system overhead can confuse the numbers. Do your best to explain the numbers you see.
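For the first question, the ratio can be computed as sketched below. The function name is hypothetical and the numbers are placeholders chosen to show the arithmetic, not expected results. Substitute your own elapsed time from the small file benchmark and your raw rate from Step 1.

```python
def small_file_bandwidth_ratio(num_files, file_size_bytes, elapsed_seconds, raw_mb_per_s):
    """Fraction of the raw disk bandwidth achieved while writing small files."""
    achieved_mb_per_s = num_files * file_size_bytes / (1024 * 1024) / elapsed_seconds
    return achieved_mb_per_s / raw_mb_per_s

# Placeholder inputs: 10000 files of 1024 bytes in 100 s, raw rate 10 MB/s.
ratio = small_file_bandwidth_ratio(10000, 1024, 100.0, 10.0)
print(f"{ratio:.2%}")  # 0.98%
```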

System aphorism of the week
An engineer is a person who can do for a dime what any fool can do for a dollar. (Anonymous)