M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering
Handout 33 - April 1999
Latest update - April 28, 1999, 11:00 p.m. (hit RELOAD for latest version)
(updates include revisions to existing answers)
[note new answer re: MD5/SHA1 speeds (updated 04-26 7:35pm)]

Design Project #2: Questions and Answers

This handout records questions and answers that are likely to be of wide interest to teams working on design project 2. Check back occasionally, because it will be updated daily as long as interesting questions come in.

Should we be concerned about the possible creation of phantom files, where the same phantom appears at all the replica sites and therefore cannot be detected even by majority polling?

How is the metadata of a file stored? Is it in the directory entry, in the same region as the file, or somewhere else?

Where is the boundary of the "repository"? The design description talks about operations on the repository, but there are also replicas and file systems. I'm not clear on how these things fit together.

What is the budget? We could store everything in RAM to speed things up, and install several dedicated optical fiber links across the Indian Ocean and through the Himalayas to connect Cape Town with Novosibirsk. We know that it is a government project, but does that really mean that money is no object?

In the section that describes the file system methods it says:

status = create-file(directory, metadata)

What is the behavior of this function when you attempt to create a filename that already exists in the directory?
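
For what it is worth, here is a minimal Python sketch of the calling pattern the question assumes. The status codes, the idea that the name lives inside the metadata, and the toy in-memory directory are all assumptions made for illustration; the handed-out spec does not say what create-file returns on a name collision, which is exactly the open question.

    # Hypothetical status codes -- the spec does not define what create-file
    # returns when the name in the metadata is already present.
    STATUS_OK = 0
    STATUS_EXISTS = 1     # one plausible return value for a duplicate name

    def create_file(directory, metadata):
        name = metadata["name"]            # assumes the name is carried in the metadata
        if name in directory:
            return STATUS_EXISTS
        directory[name] = metadata
        return STATUS_OK

    d = {}
    print(create_file(d, {"name": "report-001"}))   # 0 (STATUS_OK)
    print(create_file(d, {"name": "report-001"}))   # 1 (STATUS_EXISTS)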

What is the expected cost of the 800GB drives the library will be purchasing in 2005? (in 2005 US$)

I was looking up the specs for Seagate's Cheetah 36 (ST136403FC) hard disk, which advertise a 5.7 msec average seek time (read) and 6.5 msec (write). Is this measured from the moment the read/write request is made to the moment the disk starts being read from or written to, or is there a missing factor here?

The speed of data transfer from disks is around 10 - 12.5 MByte/sec now. Do we assume this stays about the same through 2005?

Is the 125-disk figure for a single replica a limit, or can we assume there is room for more?

Where is the library going to store all these bits until 2005?

What kind of network do we have to work with? Can we expect to have 1 MB/sec connections for each client? Or is it 100 MB/sec? Or is it only 56 Kbit/sec?

With hard drives, what is the likelihood of different errors? How do temporary read failures, sector damage leading to read failures, and whole-disk failures compare in likelihood? What other errors are common with respect to hard drives?

Is it safe to define a UID/URN as the MD5 digest of a file? According to the MD5 docs, "it would take approximately 2^50 attempts on average to find 2 inputs producing the same digest." But we only have approximately 2^28 objects to store. This seems to say that the probability of UID/URN collisions based on MD5 digests would be "low."

But design-wise, is this a good design principle to follow? Is this tradeoff justified?
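
For reference, here is the birthday-bound arithmetic behind that intuition, as a small Python calculation. It assumes MD5 digests behave like uniformly random 128-bit values, which is the usual idealization rather than a guarantee.

    DIGEST_BITS = 128            # MD5 produces a 128-bit digest
    N_OBJECTS = 2 ** 28          # roughly the number of objects to be stored

    # Birthday bound: with n uniformly random d-bit values, the probability
    # that any two collide is approximately n*(n-1) / 2^(d+1).
    p_collision = N_OBJECTS * (N_OBJECTS - 1) / 2 ** (DIGEST_BITS + 1)
    print(f"approximate collision probability: {p_collision:.1e}")   # about 1e-22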

Can any replica site add items to the repository, or do all additions come from one site only?

Can we assume that the original digitized source that is entered by the human librarians is correct?

What is meant by 'bogus data' in the description of dp2? Is that data not entered by human librarians, or data that has errors in it, or something else entirely?

Say we want to create a new replica site in London because the replica on the island of Diego Garcia was swept out to sea in a tidal wave. The design project statement says "any procedure that attempts to copy 105,000,000 files is likely to be interrupted a few times before it completes." That seems to imply that we are expected to transfer all the replica data over the Internet to the new replica site. Are we allowed to consider copying the data onto a set of new disks in Washington and sending them by FedEx to London?

Is it okay to change the spec for the repository-level put? Currently it takes only one argument, but we would like to pass along some metadata to be stored with the file, and that would require adding another argument.
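
Here is a toy Python sketch of the extension being proposed. The two-argument put, the dictionary metadata format, and the MD5-based UID are assumptions made for illustration, not the interface as handed out.

    import hashlib

    _store = {}   # stands in for the repository's storage

    def repository_put(data, metadata=None):
        """Proposed two-argument put: store an object together with its metadata."""
        uid = hashlib.md5(data).hexdigest()    # one UID scheme discussed elsewhere in this Q & A
        _store[uid] = (data, metadata or {})
        return uid

    uid = repository_put(b"...digitized page...", {"title": "example", "depositor": "LoC"})
    print(uid, _store[uid][1])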

Why does remove-file require a data argument? Is that a safety measure or what?

We want to keep a single bit flag on disk in order to indicate whether or not an event has occurred. We would like to read and write this bit directly without going through the file system. Can we assume that there is an interface to the disk called "write" that will bypass the file system and write a bit (or maybe a byte) directly onto disk? If so, is this atomic? (In lecture, Frans assumed that he could atomically write one bit to disk.)

In that case can we assume that writing a single bit to disk with write-file is atomic?
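
In case it helps frame the question, here is one way the flag might be kept as a one-byte file through the ordinary file interface, sketched in Python. Whether the underlying one-byte write is atomic is exactly the assumption being asked about, and the path name is made up.

    import os

    FLAG_PATH = "event.flag"    # hypothetical one-byte flag file

    def set_flag():
        # Write a single byte and force it to disk; atomicity of this write
        # at the disk level is the open question.
        with open(FLAG_PATH, "wb") as f:
            f.write(b"\x01")
            f.flush()
            os.fsync(f.fileno())

    def flag_is_set():
        try:
            with open(FLAG_PATH, "rb") as f:
                return f.read(1) == b"\x01"
        except FileNotFoundError:
            return False

    set_flag()
    print(flag_is_set())    # True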

For purposes of assessing network traffic demands, can we assume that repository-put requests will come predominantly from the Washington, DC area?

We were thinking of using {choose one: Library of Congress call number, Dewey Decimal System call number, Library of Congress cataloging number, International Standard Book Number, the book title} as object identifiers, but we would have to extract them from deposited objects--if they are in there--and that seems like an abstraction violation.

If the replica sites run the same software, do the same mechanisms/protocols always necessarily produce the same errors (if there are any)? That is, given the same circumstances, do the bugs of the same mechanisms in different sites always generate the same results?

Should we compile statistics for the failure modes of hardware (disk, processor, etc.), software, and network, or do we just focus on the fault analysis of our protocols? If we need to worry about the former, how can we provide a long-term analysis if the specifications for the components constantly change over time?

Since Prof. Rivest recommended using SHA1 over MD5 as a hash function, our group is wondering how fast SHA1 runs and what the prospects are for speedups over the next five years. And is anyone likely to implement SHA1 in hardware?

Wow! Those must have been really hot implementations of MD5 and SHA that were reported in Schneier's book (see Q & A above). A 33 MHz 486 doing one 4-byte add from the cache per machine cycle would only be able to add up data at 132 Mbytes/second. How were they able to get MD5 to run faster than addition?
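
The arithmetic behind that figure, spelled out in Python; it assumes the 486 can retire one 32-bit add per 33 MHz clock cycle with operands already in cache, which is the idealization the question makes.

    CLOCK_HZ = 33_000_000        # 33 MHz 486
    BYTES_PER_ADD = 4            # one 32-bit add per cycle, operands in cache

    # Upper bound on how fast such a machine could even sum a stream of data.
    add_rate_bytes = CLOCK_HZ * BYTES_PER_ADD
    print(add_rate_bytes / 1_000_000, "Mbytes/second")   # 132.0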

We are still puzzled over the suggestion that we do an error event analysis. What is its scope? Is it just on the subsystems we design? Does it include the hardware? The whole system? What is its focus?

What's a reasonable limit on how long a new replica creation should take? Six months seems too long, and a week is probably fine. What about 2 weeks? A month? It seems difficult to compute an MTTF for a massive site failure caused by things like intentional destruction.

We were wondering if the information coming across the repository put interface is compressed. If it is not, we are thinking of compressing it to reduce disk read time and data transfer time.
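
A minimal sketch of compressing before the put call, in Python using zlib. The put parameter below is a stand-in for the repository-level put operation; nothing in the given spec requires or forbids this.

    import zlib

    def put_compressed(put, data):
        """Compress an object before handing it to the repository put operation."""
        put(zlib.compress(data, 6))

    # Example with a throwaway stand-in for put:
    stored = []
    put_compressed(stored.append, b"some digitized page data" * 1000)
    print(len(stored[0]), "compressed bytes")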

When deciding which disk to use for adding new data, we'd like to have some idea of the space in use and available on each disk. There's no method provided for this, short of trying to create files on each disk and checking the status. Is it reasonable to request ("insist on") a file system method that returns disk usage info for mounted disks?
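
A sketch of the kind of call being requested, using Python's shutil.disk_usage as a stand-in. The handed-out file system does not actually provide this, which is the point of the question; the mount-point arguments are whatever the local machine happens to have.

    import shutil

    def disk_usage_info(mount_point):
        """Report total/used/free bytes for one mounted disk (stand-in for the requested method)."""
        usage = shutil.disk_usage(mount_point)
        return {"total": usage.total, "used": usage.used, "free": usage.free}

    # Example policy: put new data on the mounted disk with the most free space.
    def pick_disk(mount_points):
        return max(mount_points, key=lambda m: disk_usage_info(m)["free"])

    print(disk_usage_info("/"))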

It says in the Q & A that we can assume a disk rate in 2005 of 160 MBytes/second. A back-of-the-envelope calculation suggests that at that rate it would be possible to read and check every file in a replica in about a week. My intuition says that is optimistic. What is missing from the back of my envelope?
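
For reference, the envelope arithmetic itself, in Python; it assumes 125 disks of 800 GB per replica (figures from earlier questions), disks read one after another at a sustained 160 Mbytes/second, and no seek or per-file overhead.

    DISK_BYTES = 800e9        # 800 GB drives assumed for 2005
    N_DISKS = 125             # disks at one replica site
    RATE = 160e6              # assumed 2005 sequential transfer rate, bytes/second

    seconds_per_disk = DISK_BYTES / RATE                 # 5000 s, about 1.4 hours
    total_days = N_DISKS * seconds_per_disk / 86400      # one disk at a time
    print(f"{total_days:.1f} days to read every byte")   # about 7.2 days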

Even if we assume that non-recoverable read errors are ten times as frequent as the rate quoted for the Samsung disk, they are so rare that the chance of exactly the same sector being lost in two replicas is vanishingly small. So what is the big deal about hurrying to fix missing sectors?
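
The arithmetic the question gestures at, with loudly hypothetical numbers: the per-bit rate below is not the Samsung figure from the Q & A (which is not reproduced here) but ten times a commonly quoted 1-in-10^14 unrecoverable-read-error rate, and the two replicas are assumed to fail independently.

    P_BIT = 1e-13                 # hypothetical: 10x a typical 1e-14 unrecoverable-read rate
    SECTOR_BITS = 512 * 8         # one 512-byte sector

    p_sector = P_BIT * SECTOR_BITS       # chance a given sector read is unrecoverable
    p_both = p_sector ** 2               # same sector lost independently at two replicas
    print(f"one replica: {p_sector:.1e}   both replicas: {p_both:.1e}")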

Should we optimize our system to allow users to use the closest site to them for reads?

The file system spec says that when bad sectors are discovered they are marked so that they will never be used again. Yet it also says that write-file may store an incomplete file. This seems inconsistent.

As we write this thing up we are having a hard time squeezing it into the suggested 8-10 pages. Are we going to lose big if we go over that limit?


Questions or comments: 6.033-tas@mit.edu