Notes for the talk "Fault Tolerance in Very Large Archival Systems"

by J. H. Saltzer

Fourth ACM SIGOPS European Workshop (EW-4)
Bologna, Italy, September 3-5, 1990

Transcribed from handwritten notes

-------------------------------------------------------------------------

The other papers in this session give reports of work in progress.
This is a speculative talk--intended to stimulate discussion & thinking,
not a research report.

Everybody knows that technology improvements can open the way to new
applications.  One technology improvement has been going on almost
unrecognized, and it is about to break open a whole new application
area.  The technology is magnetic disk storage.  The application is
long-term archive of books and papers in image form--the on-line
library.

[slide on technology]

Let me start by trying to convince you that there has been an
astonishing revolution in technology.

[slide on cost/book]

1999: PC has a 1.2 Gbyte disk, room for 2-3000 books.  At $1/volume,
the cost of storage for a university library is within reach
(especially if each library stores only unique materials)

This development leads to lots of interesting system issues
  - distribution of responsibility (client/server architecture)
  - how to find things (search)
  - fault tolerance
    [relate to workshop theme: Fault tolerance support in
     Distributed Systems]

The special system problem:
  data lifetime far exceeds medium lifetime

[slide on lifetimes]  (Complete fabrication)

[Librarians are VERY wary at this point]

Need:  Technology refresh plan
  - part of the system design, like the backup system
  - runs on cycle of years (say, five)
  - may take a year to do a refresh
  - unlikely to be fault-free
  - easy to lose track (long-running transaction)
  - meanwhile, new material is being added

(Fascinating design problem)

Reliability goal:  when someone looks at the image of a book for the
first time, 40 years from now, the data will be there
(8 refresh cycles later)

-------------------------------------------------------------------------

1. Basic approach:  multiple copies

   U.S. Lib of Congress
     1. Washington, D.C.   }  Geographical distr. for independence
     2. San Francisco      }  of gross faults,
     3. Novosibirsk        }  earthquakes, riots

2. Medium failure (bit rot)

   Error detection (some forward error correction)
   As typically practiced today inside a good disk controller,
   or perhaps a little more.
   + communication to the other sites for backup copy.

3. Refresh

   Mistakes:
   - Do refresh one copy at a time
   - Do a verify (read data, calculate a fingerprint, send to site 2
     with old version, read it there)
     [need to read 100% of data at 2 sites, but need to transmit
      only 10 bytes/Mbyte.]
   - Then can start refresh at the next site

   [Slide 4]

   When we get to step 6 it is time to start over

-------------------------------------------------------------------------

Here are a few obvious OPEN QUESTIONS:  you can surely extend this list

  - Relation of compression to encoding of error detection/correction
    in file system?
  - Deletion:  is it REALLY append only?
  - Reading during refresh
  - Proper # of copies
  - will anyone be interested?
  - Backup?
  - Changing history?
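
-------------------------------------------------------------------------

A minimal sketch of the verify step in part 3, assuming SHA-256
fingerprints over 1-Mbyte blocks (the notes name no algorithm or block
size; a 32-byte digest per Mbyte is somewhat more than the 10 bytes/Mbyte
figure above, but the point is the same: read everything at both sites,
transmit almost nothing):

import hashlib

BLOCK = 1024 * 1024   # fingerprint granularity: one digest per Mbyte read

def fingerprints(path):
    # Read 100% of the local copy, but reduce it to one short digest per
    # block; only these digests need to cross the network to the other site.
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK)
            if not block:
                break
            digests.append(hashlib.sha256(block).digest())
    return digests

def verify_against_old_copy(new_copy_digests, old_copy_path):
    # Run at the site still holding the old version: read it in full,
    # compute the same digests, and report any blocks that disagree.
    # Only when this list comes back empty should the refresh move on
    # to the next site.
    old_digests = fingerprints(old_copy_path)
    if len(old_digests) != len(new_copy_digests):
        raise ValueError("copies differ in length")
    return [i for i, (a, b) in enumerate(zip(new_copy_digests, old_digests))
            if a != b]

Refreshing one copy at a time and verifying it against a copy held at a
different site means a bad refresh can still be repaired from the
remaining good copies.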
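
The refresh plan is a long-running transaction: it may take a year, it
will not be fault-free, and new material arrives while it runs.  A
minimal bookkeeping sketch, with per-volume records of the last completed
cycle at each site (all names here are illustrative, not from the notes):

from dataclasses import dataclass, field

SITES = ["Washington", "San Francisco", "Novosibirsk"]   # the three copies

@dataclass
class VolumeRecord:
    volume_id: str
    added_in_cycle: int                            # cycle in which it arrived
    refreshed: dict = field(default_factory=dict)  # site -> last completed cycle

def needs_refresh(rec, site, current_cycle):
    # A copy needs refresh if it has not been rewritten onto fresh medium
    # since the current cycle began; a volume added during this cycle is
    # already on fresh medium and can wait for the next cycle.
    last = rec.refreshed.get(site, rec.added_in_cycle)
    return last < current_cycle

def record_refresh(rec, site, current_cycle):
    # Recorded durably only after the verify succeeds, so a crash or a
    # year-long interruption does not lose track of which copies are done.
    rec.refreshed[site] = current_cycle

Scanning all records with needs_refresh(rec, site, cycle) yields the
remaining work for a site; the scan can be re-run at any time, which is
what keeps a year-long, fault-prone refresh restartable.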