6.033 Discussion Suggestions (Version histories in practice)

6.033--Computer System Engineering

Suggestions for classroom discussion

Topic: Does anyone actually use version histories?

by J. H. Saltzer, April 28, 2004.

After reading the sections in the notes about version histories, a common question is "where, if at all, are version histories used in the real world?" The answer is that they are almost unheard of in high-powered transaction systems, because they would (presumably) interfere with performance, although there is a small community working on what they call "temporal" or "rollback" database systems.

On the other hand, when you move to applications where human engineering is more important than performance you begin to find them. They are used primarily to provide recoverability. Here are some examples.

Source code management systems
- SCCS
- RCS (and therefore CVS)
- SCM
- ClearCase
- GSS
Any user interface system that provides unlimited or deep undo...
- Emacs undo (C-_ or C-x u)
- Adobe Photoshop, starting with Release 6.0
- Adobe InDesign
- Swing, for Java
- Intellicad
- (hundreds of others)
File systems
- TENEX, TOPS-20, RSX, VMS (DEC really got into versioned files)
- Cedar (Xerox PARC)
- Plan 9 from Bell Labs (backup system only)
- Elephant
- ISO 9660
- (innumerable toy systems)
Database systems
- Postgres (sort of)
- Any DBS with temporal operators
- (name forgotten; one of the small IBM database systems in the mid-1980s)
Misc
- Emacs file management (when versioning is turned on)
- Delta-tek forensic versioning system
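What the systems above have in common is that an update appends a new version instead of overwriting the old value, so recovery, or undo, amounts to reading an earlier version and making it current again. A minimal sketch of that idea in Python, assuming a single in-memory variable and consecutive integer version numbers; the `VersionedCell` class and its methods are hypothetical illustrations, not the interface of any system in the list:

```python
class VersionedCell:
    """A variable whose old values are never overwritten (hypothetical sketch)."""

    def __init__(self, initial):
        # versions[i] holds the value as of version number i.
        self.versions = [initial]

    def write(self, value):
        # Append instead of overwriting; return the new version number.
        self.versions.append(value)
        return len(self.versions) - 1

    def read(self, as_of=None):
        # Return the current value, or the value as of an earlier version.
        if as_of is None:
            return self.versions[-1]
        return self.versions[as_of]

    def restore(self, version):
        # Recovery: re-install an old value as a new version, so the
        # history (including the mistake being undone) is preserved.
        return self.write(self.versions[version])


doc = VersionedCell("draft 0")
doc.write("draft 1")
doc.write("draft 2 (mistake)")
doc.restore(1)
print(doc.read())          # draft 1
print(doc.read(as_of=2))   # draft 2 (mistake)
```

Because `restore` appends rather than truncates, the mistaken version is still available for inspection, which is the recoverability property described above.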

A related question is whether or not anyone uses version histories to provide isolation, rather than locks.

That question probably reduces the above list to zero length. One reason is that, to maximize performance, nearly all DBMSs are built as a single layer with little modular decomposition. Version histories for isolation add value when a system benefits from modular decomposition into several layers and more than one layer needs to set locks. (This is the same problem that led to the golden transaction undos in System R.)

The topic is known as "nested transactions," and Gray and Reuter caution that the area is not well understood. What that really means is that if you try to use locks for isolation in nested transactions, things get really complicated, really fast. In the real world, there hasn't been much demand for nested transactions, so people haven't seriously considered alternatives to locking.

On the other hand, most techniques that allow out-of-order execution in high-performance processors (for example, register renaming) provide isolation by using something essentially the same as a version history, with the additional isolation constraint that the resulting serialization must be equivalent to the program order the programmer specified.
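To make the analogy concrete, here is a rough sketch of register renaming in Python, assuming a toy three-operand instruction format and an unlimited supply of physical registers. Each write to an architectural register allocates a fresh physical register (a new version), and each source operand is bound, in program order, to the latest version. Once bound, instructions can execute in any order that respects their data dependences. The function name and instruction format are hypothetical; real processors also reclaim physical registers and recover from exceptions and mispredictions, which this sketch leaves out.

```python
from itertools import count


def rename(program):
    """Rename (dest, src1, src2) instructions onto fresh physical registers.

    Every write to an architectural register gets a new physical register,
    i.e. a new version; every source operand is bound to the version that is
    current at that point in program order.
    """
    fresh = count()      # source of unused physical register numbers
    current = {}         # architectural register -> its latest version
    renamed = []
    for dest, src1, src2 in program:
        sources = []
        for reg in (src1, src2):
            if reg not in current:               # initial value counts as a version
                current[reg] = f"p{next(fresh)}"
            sources.append(current[reg])
        version = f"p{next(fresh)}"              # new version for the destination
        current[dest] = version
        renamed.append((version, *sources))
    return renamed


program = [
    ("r1", "r2", "r3"),   # r1 = r2 + r3
    ("r4", "r1", "r2"),   # r4 = r1 + r2   reads the first r1
    ("r1", "r5", "r6"),   # r1 = r5 + r6   overwrites r1
    ("r7", "r1", "r4"),   # r7 = r1 + r4   reads the second r1
]
for instr in rename(program):
    print(instr)
```

In the renamed output the second instruction reads `p2` while the fourth reads `p6`, so the third instruction, which overwrites `r1`, may execute before the second without destroying the value the second one needs. Eliminating that write-after-read conflict is exactly what keeping multiple versions buys.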

David Reed developed the use of version histories for isolation in order to allow transactions to be distributed across multiple network sites. As it has turned out, there hasn't been much demand for that application. But in doing so, he also revealed the insight that one way to think about isolation is to ensure that all of the variables that are input to an isolated action have the values that they will have when every preceding transaction commits or aborts. That insight is useful pedagogically, and that usefulness (plus their applicability to processor design) explains why version histories appear in connection with isolation in 6.033.
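Reed's insight can be sketched with a per-variable version history. In the Python below, assumed for illustration only, each version is tagged with the ID of the transaction that wrote it, transaction IDs give the serialization order, and transaction 0 is a pseudo-transaction that installed the initial value. A read by transaction T returns T's own write if there is one, otherwise the newest version written by an earlier transaction that committed; aborted versions are skipped, and if the newest relevant writer is still pending, the reader must wait. The names `VersionHistory`, `Outcome`, and `MustWait` are hypothetical. The sketch also ignores an earlier transaction that has not yet written the variable but still might; a complete protocol must handle that case as well.

```python
from enum import Enum


class Outcome(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"


class MustWait(Exception):
    """The read depends on a transaction whose outcome is not yet known."""


class VersionHistory:
    """One variable's history: transaction id -> value that transaction wrote."""

    def __init__(self, initial):
        self.versions = {0: initial}   # transaction 0 installed the initial value

    def write(self, txn, value):
        self.versions[txn] = value

    def read(self, txn, outcomes):
        # Consider versions written by txn itself or by earlier transactions,
        # newest first.
        for writer in sorted((t for t in self.versions if t <= txn), reverse=True):
            if writer == txn or outcomes[writer] is Outcome.COMMITTED:
                return self.versions[writer]
            if outcomes[writer] is Outcome.PENDING:
                raise MustWait(writer)     # cannot know the input value yet
            # ABORTED: that version never took effect; keep looking back.
        raise LookupError("no committed version precedes this transaction")


outcomes = {0: Outcome.COMMITTED, 1: Outcome.COMMITTED,
            2: Outcome.PENDING, 3: Outcome.ABORTED, 4: Outcome.PENDING}
x = VersionHistory(100)
x.write(1, 110)
x.write(2, 120)
x.write(3, 130)

try:
    x.read(4, outcomes)            # newest non-aborted earlier writer is 2, still pending
except MustWait as e:
    print("transaction 4 waits for transaction", e.args[0])

outcomes[2] = Outcome.ABORTED
print(x.read(4, outcomes))         # falls back to transaction 1's value
```

Transaction 4 cannot proceed until transaction 2 resolves, because only then is the value of its input determined; that is the "value it will have when every preceding transaction commits or aborts" condition in executable form.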

Comments and suggestions: Saltzer@mit.edu