Notes for the talk "Fault Tolerance in Very Large Archival Systems"

by J. H. Saltzer

Fourth ACM SIGOPS European Workshop (EW-4)
Bologna, Italy, September 3-5, 1990

Transcribed from handwritten notes

-------------------------------------------------------------------------

The other papers in this session give reports of work in progress.
This is a speculative talk--intended to stimulate discussion & thinking,
not a research report.

Everybody knows that technology improvements can open the way to new
applications.  One technology improvement has been going on almost
unrecognized, and it is about to break open a whole new application
area.  The technology is magnetic disk storage.  The application is
long-term archive of books and papers in image form--the on-line
library.

[slide on technology]

Let me start by trying to convince you that there has been an
astonishing revolution in technology.

[slide on cost/book]

1999: PC has a 1.2 Gbyte disk, room for 2-3000 books.  At $1/volume,
the cost of storage for a university library is within reach
(especially if each library stores only unique materials)

This development leads to lots of interesting system issues
  - distribution of responsibility (client/server architecture)
  - how to find things (search)
  - fault tolerance
    [relate to workshop theme: Fault tolerance support in
     Distributed Systems]

The special system problem:
  data lifetime far exceeds medium lifetime

[slide on lifetimes]  (Complete fabrication)

[Librarians are VERY wary at this point]

Need:  Technology refresh plan
  - part of the system design, like the backup system
  - runs on cycle of years (say, five)
  - may take a year to do a refresh
  - unlikely to be fault-free
  - easy to lose track (long-running transaction)
  - meanwhile, new material is being added

(Fascinating design problem)

Reliability goal:  when someone looks at the image of a book for the
first time, 40 years from now, the data will be there
(8 refresh cycles later)

-------------------------------------------------------------------------

1. Basic approach:  multiple copies

   U.S. Lib of Congress
     1. Washington, D.C.   }  Geographical distr. for independence
     2. San Francisco      }  of gross faults,
     3. Novosibirsk        }  earthquakes, riots

2. Medium failure (bit rot)

   Error detection (some forward error correction)
   As typically practiced today inside a good disk controller,
   or perhaps a little more.
   + communication to the other sites for backup copy.

3. Refresh

   Mistakes:
   - Do refresh one copy at a time
   - Do a verify (read data, calculate a fingerprint, send to site 2
     with old version, read it there)
     [need to read 100% of data at 2 sites, but need to transmit
      only 10 bytes/Mbyte.]
   - Then can start refresh at the next site

   [Slide 4]

   When we get to step 6 it is time to start over

-------------------------------------------------------------------------

Here are a few obvious OPEN QUESTIONS:  you can surely extend this list

  - Relation of compression to encoding of error detection/correction
    in file system?
  - Deletion:  is it REALLY append only?
  - Reading during refresh
  - Proper # of copies
  - will anyone be interested?
  - Backup?
  - Changing history?
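
-------------------------------------------------------------------------

A minimal sketch of the verify step in part 3, assuming SHA-256
fingerprints over 1-Mbyte blocks (the notes name no algorithm or block
size; a 32-byte digest per Mbyte is somewhat more than the 10 bytes/Mbyte
figure above, but the point is the same: read everything at both sites,
transmit almost nothing):

import hashlib

BLOCK = 1024 * 1024   # fingerprint granularity: one digest per Mbyte read

def fingerprints(path):
    # Read 100% of the local copy, but reduce it to one short digest per
    # block; only these digests need to cross the network to the other site.
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK)
            if not block:
                break
            digests.append(hashlib.sha256(block).digest())
    return digests

def verify_against_old_copy(new_copy_digests, old_copy_path):
    # Run at the site still holding the old version: read it in full,
    # compute the same digests, and report any blocks that disagree.
    # Only when this list comes back empty should the refresh move on
    # to the next site.
    old_digests = fingerprints(old_copy_path)
    if len(old_digests) != len(new_copy_digests):
        raise ValueError("copies differ in length")
    return [i for i, (a, b) in enumerate(zip(new_copy_digests, old_digests))
            if a != b]

Refreshing one copy at a time and verifying it against a copy held at a
different site means a bad refresh can still be repaired from the
remaining good copies.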
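
The refresh plan is a long-running transaction: it may take a year, it
will not be fault-free, and new material arrives while it runs.  A
minimal bookkeeping sketch, with per-volume records of the last completed
cycle at each site (all names here are illustrative, not from the notes):

from dataclasses import dataclass, field

SITES = ["Washington", "San Francisco", "Novosibirsk"]   # the three copies

@dataclass
class VolumeRecord:
    volume_id: str
    added_in_cycle: int                            # cycle in which it arrived
    refreshed: dict = field(default_factory=dict)  # site -> last completed cycle

def needs_refresh(rec, site, current_cycle):
    # A copy needs refresh if it has not been rewritten onto fresh medium
    # since the current cycle began; a volume added during this cycle is
    # already on fresh medium and can wait for the next cycle.
    last = rec.refreshed.get(site, rec.added_in_cycle)
    return last < current_cycle

def record_refresh(rec, site, current_cycle):
    # Recorded durably only after the verify succeeds, so a crash or a
    # year-long interruption does not lose track of which copies are done.
    rec.refreshed[site] = current_cycle

Scanning all records with needs_refresh(rec, site, cycle) yields the
remaining work for a site; the scan can be re-run at any time, which is
what keeps a year-long, fault-prone refresh restartable.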