6.033 - Computer System Engineering | Recitation 19 - Thursday, April 14, 2005
Read How to Build a File Synchronizer by Trevor Jim, Benjamin Pierce and Jerome Vouillon. (This unpublished paper is not in the handouts.) You can find out more about Unison at the Unison Home Page. A User Manual and Reference Guide is also available. (Please read Section 10.D of the course notes before reading the Unison paper: Section 10.D shows the essence of the ideas, while the Unison paper shows many of the real-world problems that crop up when attempting to implement them!)
Unison is designed to solve the problem of reconciling two file repositories when both are subject to asynchronous changes. The industry uses two terms to describe this process --- "synchronization" and "reconciliation." In 6.033 we will try to use the term "reconciliation" for this process and reserve the term "synchronization" for processes that involve setting two or more clocks to the same time.
A common use of Unison is to maintain consistency between a set of files in your Athena home directory and a laptop. Each time you run Unison, the program will:
As the authors note, it is rather straightforward to build a simple file reconciler; it is much harder to build one that is efficient over slow links, that can work across operating systems, and that is tolerant of failures in either the systems being reconciled or the network.
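To make the "simple" case concrete, here is a minimal sketch of a reconciler in Python. It is not Unison's actual algorithm. It assumes each replica can be summarized as a mapping from path to a content fingerprint, and that a third mapping (called `archive` here, a hypothetical name) records the state both replicas shared after the last successful run. The function name `reconcile` and the action strings are likewise illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: path -> content fingerprint.
# A path that is absent from a mapping does not exist on that replica.
Snapshot = Dict[str, str]


def reconcile(archive: Snapshot, a: Snapshot, b: Snapshot) -> List[Tuple[str, str]]:
    """Decide, per path, which way a change should be propagated.

    archive is the state both replicas agreed on after the last run;
    a and b are the current states of the two replicas.
    """
    actions = []
    for path in sorted(set(archive) | set(a) | set(b)):
        old, in_a, in_b = archive.get(path), a.get(path), b.get(path)
        if in_a == in_b:
            continue  # replicas already agree (both unchanged or changed identically)
        if in_a == old:
            actions.append((path, "propagate B -> A"))  # only B changed (or deleted)
        elif in_b == old:
            actions.append((path, "propagate A -> B"))  # only A changed (or deleted)
        else:
            actions.append((path, "conflict"))  # both changed, in different ways
    return actions


archive = {"notes.txt": "v1", "todo.txt": "v1", "old.txt": "v1"}
laptop = {"notes.txt": "v2", "todo.txt": "v2", "old.txt": "v1"}
athena = {"notes.txt": "v1", "todo.txt": "v3"}  # old.txt was deleted here
plan = reconcile(archive, laptop, athena)
```

Everything the sketch leaves out (transferring data efficiently, surviving a crash halfway through, and coping with file system differences between platforms) is where the hard problems named above live.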
Reconciling files between multiple computers is a longstanding problem in computer science, born from the practical difficulty of maintaining a consistent environment across multiple machines. Early versions of Berkeley 4.2 Unix (circa 1984) came with a one-way file reconciler called rdist. This program inspired Tridgell [PhD Thesis] to write rsync, a program that performs much the same function but improves performance with the "rsync algorithm," a technique for incrementally updating large files, such as logfiles, that tend to be extended rather than rewritten. But both rdist and rsync are more properly thought of as efficient publication systems than as reconciling systems: they will propagate changes from a central computer to other nodes across a network, but they are not very good at bi-directional reconciliation. Both of these programs are further limited in that they only operate under Unix, while Unison runs under Windows and MacOS as well.
Russ Cox and William Josephson created a multi-host file synchronizer called tra, which is pretty neat, but it only runs on Unix.
As the authors of the paper make clear, much of their effort has been spent on making Unison work in a cross-platform manner. For example, Unix file modification times have a resolution of 1 second, while Windows file modification times have a resolution of 2 seconds. Unison's solution is to mask the bottom bit of the Windows file modification times. One of the difficulties in writing Unison is that the authors needed to discover these inter-operating system differences by trial and error: there is no single document that lists them all.
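The modification-time masking can be illustrated with a short Python sketch. This is only an illustration of the idea, not Unison's code: the function names are hypothetical, timestamps are assumed to be whole seconds, and the sketch masks both sides before comparing, which assumes the coarser file system truncates odd seconds downward rather than rounding up.

```python
def mask_bottom_bit(mtime_seconds: int) -> int:
    """Clear the lowest bit so the time falls on a 2-second boundary."""
    return mtime_seconds & ~1


def same_mtime(unix_mtime: int, windows_mtime: int) -> bool:
    """Compare two modification times at 2-second granularity.

    Assumption: a 1-second difference caused purely by the coarser
    resolution should not be reported as a change.
    """
    return mask_bottom_bit(unix_mtime) == mask_bottom_bit(windows_mtime)


# A file written at an odd second on Unix and stored at the even second
# below it on Windows is treated as unchanged.
unchanged = same_mtime(1113480001, 1113480000)
# A genuine 2-second difference is still detected.
changed = not same_mtime(1113480002, 1113480000)
```

The price of this choice is that two distinct writes within the same 2-second window become indistinguishable by timestamp alone, which is one reason a reconciler may also want to compare file contents or fingerprints.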
Unison is unquestionably a technical success: the program is included in multiple Unix distributions and members of the 6.033 staff have been using it since 1998. Nevertheless, Unison does have its drawbacks, some of which can be inferred from the paper:
The authors of the paper argue that they have taken several steps to improve performance, but in their table at the top of page 9 (Figure 2) they note that their program is slower than rsync in every case. Why do you think this is so? What else is wrong with the information provided in the table? Despite being compared with rsync, Unison cannot interoperate with it. Do you think that this was a good design decision?
After reading the paper, would you trust Unison to reconcile your files? Why or why not?