
6.033--Computer System Engineering

Suggestions for classroom discussion


Topic: James J. Kistler and M(ahadev) Satyanarayanan. Disconnected operation in the Coda file system. ACM Thirteenth Symposium on Operating Systems Principles, in Operating Systems Review 25, 5 (December, 1991) pages 213-225. Also published in ACM Transactions on Computer Systems 10, 1 (February 1992) pages 3-25.

by J. H. Saltzer, April 22, 1996, updated 1998, 1999, 2001, 2002, and April 29, 2003.


  1. (From Hari) In managing the private cache of a client, Coda maintains a consistency invariant. What is that consistency invariant? (When you open() a file, what you will read is the result of the last close() on the file by any client.)
  2. What is the alternative, and why does it matter? (Another possibility is that when you read() from a file, what you will read is the result of the last write() to that file by any client. That is harder to do. The Coda strategy allows deferring write-back till close time. You don't even have to tell the server that you opened the file for writing; just wait till close time and do everything in a single conversation with the server then. This choice of consistency semantics minimizes conversations with the server.)
  3. The paper refers to "callbacks" as the basis for cache coherency in Coda. What does that mean? (Each client has a private cache. Whenever a client caches a copy of a file the central server notes that fact. When a client writes to a file, the write actually goes just to the local cache. Then, when the client finally closes the file the client file system writes the modified file back to the server; the server not only stores the modified file at the central site but it also sends a message (known as a "callback break", for reasons that aren't completely clear) to every other client that has a cached copy of that file, thereby alerting the other clients that their cached copy is now invalid.)
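    The callback mechanism described above can be sketched in a few lines of Python. Everything here (class and method names) is invented for illustration, not the real Vice/Venus interfaces; a real server keeps its callback state persistently and talks to clients over RPC:

    ```python
    from collections import defaultdict

    class Server:
        """Toy callback-based cache coherence, Coda/AFS style."""
        def __init__(self):
            self.files = {}                      # name -> contents
            self.callbacks = defaultdict(set)    # name -> clients holding a copy

        def fetch(self, client, name):
            # Note the cached copy so we can break the callback later.
            self.callbacks[name].add(client)
            return self.files.get(name)

        def store(self, client, name, data):
            self.files[name] = data
            # "Callback break": tell every other caching client its copy is stale.
            for other in self.callbacks[name] - {client}:
                other.break_callback(name)
            self.callbacks[name] = {client}

    class Client:
        def __init__(self, server):
            self.server, self.cache = server, {}

        def open_read(self, name):
            if name not in self.cache:           # miss: fetch the whole file
                self.cache[name] = self.server.fetch(self, name)
            return self.cache[name]

        def close_write(self, name, data):
            # Writes go to the local cache; write-back happens at close time.
            self.cache[name] = data
            self.server.store(self, name, data)

        def break_callback(self, name):
            self.cache.pop(name, None)           # invalidate the cached copy
    ```

    A second client that has read the file loses its cached copy the moment the first client closes a modified version; its next open() refetches from the server.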

  4. What if another client is writing the file at the same time? (Only one client at a time can close() the file. The first one to call close() wins; the other gets a callback, effectively saying "you lose". The hope is that shared writable files aren't very common, so lossage will be rare.)

  5. Downside to this strategy? (The server is required to maintain quite a bit of per-client state. Keeping that state consistent with the actual state of the client is hard, because clients should be assumed to be undependable--they may forget their own state. For example, how concerned should the server get if a client doesn't acknowledge a callback? The client may have crashed, or it may just be temporarily disconnected.)

  6. The paper refers to "whole-file caching". What is that? (When a client reads a file from the server, the server insists on sending the whole file, not just the part the client was interested in. And the client cache holds the whole file. Similarly, when a modified file is written back to the server, the whole file is transmitted.)

  7. What if there is a weird crash or the network goes down in the middle of writing a modified file back? Doesn't that mean that someone else may read a partially modified file? (On write-back, Coda creates a shadow file; only after all the data has made it across the network does the server do an exchange of the shadow file with the original. So the write-back operation is atomic. From Satya, 4 May 1999)
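    The shadow-file trick is a general recipe for atomic file replacement. A minimal sketch in Python, assuming a POSIX-style rename (os.replace) that atomically swaps the new file for the old when both are on the same filesystem:

    ```python
    import os
    import tempfile

    def atomic_writeback(path, data):
        """Write data to a shadow file, then atomically exchange it with
        the original, so a reader never observes a partially written file."""
        dir_name = os.path.dirname(path) or "."
        # The shadow must live on the same filesystem for the rename to be atomic.
        fd, shadow = tempfile.mkstemp(dir=dir_name)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())   # force the data to disk before the swap
            os.replace(shadow, path)   # the atomic exchange
        except BaseException:
            os.unlink(shadow)          # a crash mid-transfer leaves the original intact
            raise
    ```

    If the network dies mid-transfer, the exception path discards the shadow and the original file is untouched, which is exactly the all-or-nothing behavior Satya describes.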

  8. Advantages? Disadvantages? (Advantages: a sweeping simplification, especially of the user's mental model of what is available when disconnected. Also, simplicity of failure modes: discovery that a file is not available locally, and thus must be retrieved--or won't be available, because we are running in disconnected mode--can occur only at open time, never when reading or writing a piece of the file. Disadvantage: large files soak up bandwidth and cache space, perhaps unnecessarily. The server may send a 300 Mbyte file across the net when the client intends only to read and change a 10-byte chunk. Remote users who dial in with 300-bit/second modems may find this annoying.)

  9. Athena uses the Andrew File System (AFS), which also uses whole-file caching. MITNet partitions occasionally, and your workstation loses contact. But when MITNet gets its act back together, the AFS client in your workstation seems to have no trouble carrying on from where it was. In fact, the AFS server can even reboot; the AFS client on your workstation simply waits till AFS comes back to life. So how serious is this disconnection problem anyway? Why do we need Coda? (The distinction is between "occasionally disconnected" and "usually disconnected". The AFS client can handle occasional disconnections due to failures, by holding up all access to shared files till the disconnection is over. That strategy doesn't gracefully handle the "usually disconnected" operation that applies if you want to work with your laptop on an airplane.)

  10. Why does "placing functionality on clients" improve scalability? (First, note the use of an important-sounding but incorrectly used word in that phrase; they are actually "placing function on clients". Back to the question: The client/server model tends to have an intrinsic incommensurate scaling problem. As the number of clients grows, the number of client cycles available to harass the server grows in proportion. The number of server cycles available is fixed, but the number of server cycles needed grows with the number of clients; at some point you have to buy a faster server. And there are limits to the fastest server you can buy. So if for any specific client/server transaction you can move function from the server to the client, you can take advantage of all those client cycles to postpone the day of reckoning.)

  11. Section 3.4 talks about optimistic and pessimistic replica control, and of "conflict". What "conflict" do these techniques attempt to control? (I carry my portable away and while on the plane I rewrite chapter five of the 6.033 notes. When I come back and reattach I discover that Kaashoek has also rewritten chapter five. Or, I might prepare a new file consisting of an index to chapter five, and come back to find that Kaashoek rewrote chapter five while I was gone and all my effort was wasted. [Note: Coda can't help for this second problem! Why not?])

  12. But these seem like intrinsic problems. (Yes, but if I hadn't been disconnected, the system could have warned me that Kaashoek had expressed intentions of updating the file containing chapter five.)

  13. How does pessimistic replica control deal with conflicts? (Before disconnecting you have to identify every file that you might read and every file that you might modify. The server marks each as "checked out" to you and for the duration of your disconnection it refuses to let anyone else modify any of the files you said you might read and it refuses to let anyone else read any of the files you said you might modify. Something similar has to happen with directories--you say which ones you will read/modify, and no one else can modify/read them while you are gone. Note that modification includes renaming of files and it even ought to include updating an inode to show when you last read the file. This last detail is sometimes ignored, with surprising consequences later when cache algorithms try to depend on the time-last-used.)

  14. Downside? (1. The set of files that you may potentially read or write is probably much bigger than the set of files that you will actually read or write. Each declaration of intent has the effect of denying access to others, and not declaring intent has the effect of denying access to you. So it forces some hard choices, some of which you may not know how to make. 2. While on your trip the president of the company discovers a business opportunity that requires immediate action but that also requires access to a file you have checked out. You are on a beach in Bermuda with no telephone. Now what? 3. While on your trip you accidentally drop your laptop over the side of the cruise ship. Now what?)

  15. How does optimistic replica control (What, me worry?) deal with conflicts? (By hoping they won't happen, and patching things up later if there really is a conflict. It actually isn't hoping they won't happen, it is betting that real conflicts will be uncommon. The statistics at the end of the paper suggest that this bet looks pretty promising.)

  16. That implies an extra requirement on the system design. What is it? (The system must be able automatically to *detect* conflict. This observation leads to the insight that the two ways of dealing with conflict are fundamentally different: pessimistic control *prevents* conflicts. Optimistic control *detects* conflicts, and fixes things up later.)

  17. In what situations does optimistic control make sense, and in what situations does pessimistic control make sense?
                              be optimistic if    be pessimistic if
    ---------------------------------------------------------------

    gap between what you
    might read/write and           large               small
    what you will actually
    read/write while
    disconnected

    chance of conflict while       small               large
    disconnected

    cost of patching things         low                high
    up later
    ---------------------------------------------------------------
    

  18. What can go wrong with optimistic replica control? (1. It may be hard/expensive to reconcile different copies. 2. Anachronisms can arise. One person can receive a printed receipt saying "it is 2 p.m., 15 seats are left," while another receives a printed receipt saying "it is 1 p.m., all seats are sold." 3. Since the user wasn't forced to list the files to be used during disconnection, he or she may board the ship to Bermuda without a copy of something important.)

  19. How does Coda try to make sure you have the right files available when you disconnect? (The hoarding algorithm manages the disk buffer cache with the possibility of disconnection in mind. It allows you to explicitly list files and directories that will be needed when disconnected, and it tries not to choose them as victims when looking for cache space. If the cache isn't big enough to hold everything you need you can label different things with different priorities. You can run a trace to see what files you have actually been using. Finally, the underlying mechanism is an LRU cache, so recently-used files that you may have forgotten to mention will be available.)
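    The interaction between hoard priorities and LRU can be shown with a toy cache; the class and method names here are assumptions for illustration, not Venus's actual interfaces. The victim is the entry with the lowest hoard priority, breaking ties by least-recent use:

    ```python
    class HoardingCache:
        """Toy hoard-aware cache: evicts lowest priority first, then LRU."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = {}      # name -> (priority, last_used, data)
            self.clock = 0         # logical time for LRU ordering

        def put(self, name, data, priority=0):
            if name not in self.entries:
                while len(self.entries) >= self.capacity:
                    # Victim: smallest (priority, last_used) pair.
                    victim = min(self.entries, key=lambda n: self.entries[n][:2])
                    del self.entries[victim]
            self.clock += 1
            self.entries[name] = (priority, self.clock, data)

        def get(self, name):
            if name not in self.entries:
                return None        # miss: connected, fetch; disconnected, fail
            p, _, d = self.entries[name]
            self.clock += 1
            self.entries[name] = (p, self.clock, d)   # refresh LRU position
            return d
    ```

    An explicitly hoarded file (high priority) survives cache pressure that evicts recently used but unhoarded files, which is the point of the hoard database.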

  20. It says that when I am disconnected, I am in "emulation" mode and one of the things emulated is protection. How can my laptop emulate the protection that the server would have provided? The server shouldn't trust the laptop to do anything right. (If you have read, but not write access to a file, and you try to open it for writing, the Coda client emulation mode gives you the same error return that you would have gotten had you been connected. The purpose is that you will notice that you are making a mistake. This doesn't prevent you from prying the top of the portable off with a can-opener and directly modifying the file. Protection against that possibility is provided by the server when you reconnect and go into reintegration mode: it will refuse to update its copy to match yours. Emulated protection is thus an example of a *hint* rather than an absolute: it is giving you a warning of what will happen later, when you reintegrate.)

  21. During emulation mode, how does the Coda client handle a cache that is beginning to clog up with modified (dirty) files? And why are they taking up more space than before they were dirty, anyway? (The thing that takes up more space is newly-created files, and files that have grown in size. Once it has used up all the empty space in the cache, it begins discarding unmodified files, lowest priority first. Here is where those priority designations probably start to matter. Because it also keeps a log, if things get really clogged up, it can discard modified files, too. But if the log is full, the paper says Coda "is not very graceful".)

  22. Suppose the portable has done nothing but create some new files. What should happen? (They should be copied onto the server, unless someone has already created a file on the server with the same name.)

  23. If they have, what do we do? (Ask the user for guidance. Reintegration with optimistic replica control sometimes requires manual intervention.)

  24. Suppose the portable has modified a file. (If we are sure that the copy on the server is the original, then we should replace it with the new version from the portable.)

  25. So when do conflicts arise? (Only when the portable changes a file AND someone changes the same file on the server.)

  26. So if it worries only about write-write conflicts, why not just choose the version that has the newest timestamp and make that the authoritative version? (If only one of the copies was modified, that is exactly the right thing to do. But if both copies were modified the user is going to have to inspect them and figure out how to reintegrate. And we need a way of detecting that.)

  27. How can we detect that both copies were modified? (Coda creates something called a storeid which is a nonce for every update to a file. Whenever it modifies a file, it creates a new storeid for the file. And when a disconnected client modifies a file, it logs, in addition to the new file contents, both the previous storeid and the new one. When it later reconnects it runs through the log of changes. As part of the proposal to update the server version of this file, it mentions the previous storeid in the log. If that previous storeid doesn't match the storeid now on the server, that means that someone else changed the server's copy while we were disconnected and we have detected a conflict. If we detect a conflict, we have to refer the problem to the user to resolve by hand.)
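    The detection rule in the answer above boils down to comparing three storeids per file: the one the client recorded before its update, the one it created, and the one now on the server. A simplified sketch (the real client compares the logged previous storeid against the server's current one, but the effect is the same):

    ```python
    def classify(base_storeid, client_storeid, server_storeid):
        """Decide what reintegration should do for one file.

        base_storeid:   the storeid the client logged before its update
        client_storeid: the client's latest storeid for the file
        server_storeid: the storeid currently on the server
        """
        if server_storeid == base_storeid:
            return "propagate client version"   # only the client changed it
        if client_storeid == base_storeid:
            return "keep server version"        # only the server changed it
        return "conflict: ask the user"         # both copies were modified
    ```

    Because each storeid is a fresh nonce per update, equality with the base means "unchanged since disconnection," and any other combination of changes is detected rather than silently overwritten.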

  28. Suppose the client has created file X in directory Y while disconnected, and someone else has created file Z in directory Y. Both copies of the directory have been changed, so that looks like a write/write conflict, doesn't it? Does Coda have to ask for manual resolution of that case? (No.)

  29. But why not? (Because it is a file system, Coda knows the semantics of directory operations, so it can figure out which ones are compatible with which others. Creation of two different files with different names doesn't present a real conflict, even though it does represent a modification to both copies of the directory. But Coda doesn't know anything about the application semantics of your personal files, so it has to refer all apparent conflicts to you to resolve. In principle, you could write a program to automatically handle other specific cases. For example, the semantics of mail readers, address books, and calendar programs are precise and predictable, so automated reintegration of their databases is fairly easy.)
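    Directory reconciliation can be sketched by treating a directory as a name-to-file-id map. This toy version (names hypothetical, not Coda's resolver) handles only file creation--renames and deletions are omitted--but shows why two creates of different names commute while two creates of the same name do not:

    ```python
    def merge_directory(base, client_dir, server_dir):
        """Merge two modified copies of a directory.

        base, client_dir, server_dir: name -> file-id maps.
        Returns (merged directory, list of names needing manual resolution).
        """
        merged, conflicts = dict(base), []
        client_new = {n: f for n, f in client_dir.items() if n not in base}
        server_new = {n: f for n, f in server_dir.items() if n not in base}
        for n, f in server_new.items():
            merged[n] = f                      # server's creations always apply
        for n, f in client_new.items():
            if n in server_new and server_new[n] != f:
                conflicts.append(n)            # same name, different file: real conflict
            else:
                merged[n] = f                  # distinct names commute
        return merged, conflicts
    ```

    Both copies of the directory changed, so a naive storeid comparison would flag a write/write conflict; knowing the operations were creates of distinct names lets the merge proceed automatically.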

  30. What about my earlier problem that I prepared a new file containing an index to chapter five, but Frans rewrote chapter five while I was gone? (Coda declares that it just won't solve that problem, because UNIX doesn't try to handle the problem even when everyone is attached. This is a bit of a cop-out, because if I were connected, I probably would have noticed that chapter five was being updated.)

  31. In section 4.5.2 it says, about reintegration, "If the comparison [of storeids] indicates equality for all objects, the operation is performed..." and if it doesn't "In the case of a store of a file, the entire reintegration is aborted." So a single write-write conflict is enough to force the entire reintegration to user review. Should Coda have instead gone ahead and done as much reintegration as possible? (This one gets pretty deep. We asked Satya what the trade-offs were, and on April 29, 1998, he replied...

    2. Treatment of reintegration failure has changed.

    In the 1991 version of the system, the entire reintegration (all files) was aborted if any part of it failed (the implementation left server state cleanly in the before-attempted-reintegration state). The failed reintegration was bundled up as a tarball (called a "closure") and left in a standard place on the local disk of the client attempting the reintegration. Coda cache state was then flushed and refetched as necessary to match server state.

    The rationale for the original "all-or-nothing-within-a-volume" design was as follows:

    - an update conflict indicates a higher than average probability that undetected read/write conflicts exist between other operations in the log and operations performed during disconnection at the server. Hence, it is "safer" to reject the whole reintegration of the volume and force manual repair.

    - there were so few failed reintegrations in practice that it wasn't worth the sweat to implement anything fancier.

    3. Some time in 1994-95, the extension of Coda to exploit weak connectivity (see Mummert et al in the 1995 SOSP) forced us to rethink some aspects of reintegration. In particular, we found it an annoyance to have a reintegration fail after many minutes of use of a slow link, and to discover that absolutely none of the updates had been propagated to the server. This was particularly annoying because a common reason for reintegrating over a slow link is to make sure that work has been backed up on a server (so that a client hard disk crash, theft, loss, etc. doesn't cause loss of many hours of work).

    We therefore changed reintegration to work as follows: when a conflict occurs, the entire prefix of the CML before the conflict is applied and an integer indicating the point in the CML at which the conflict occurred is returned (once a conflict is detected, the server doesn't bother proceeding further down the CML). This ensures that (a) each reintegration attempt propagates as much "good" work as possible and that (b) the client knows where the earliest conflict in the CML lies. It can try to deal with this conflict (for example by invoking an Application-Specific Resolver on the client) and, if successful, retrying the reintegration from that point on.

    [Note: for non-technical reasons, the invocation of ASRs after failed reintegration has not been implemented yet; it was implemented a long time ago for conflicts detected from server replication; the extension poses no real technical problem --- just sweat]

    [Note: I think this is yet another example of the "End-to-End Argument" in action :-) When connectivity was LAN, partial reintegration was not worth the sweat --- en masse retry was fine. When connectivity is weak, the cost of en masse retry is prohibitive; partial retry is needed to ensure an acceptable rate of progress. Much like asking whether an ack for the entire file is fine or whether chunks of it have to be acked]

    A further change we made on the client was the way in which the reintegration failure is reflected. Rather than creating a closure, we represent the conflict "in-place" as a dangling sym link (just as for conflicts arising from server replication). So a reintegration that fails due to a conflict on an object results in the following: updates before the conflict are safely reflected on the server; the conflicting update is marked as such on this client (but not anywhere else); updates after the conflicting update look like un-reintegrated state to this client.

    )
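    The post-1994 behavior Satya describes--apply the prefix of the CML up to the first conflict and report its index--can be sketched as follows. The record layout here is an assumption for illustration, not Coda's real log format:

    ```python
    def reintegrate_prefix(server_storeids, cml):
        """Replay a change-modify log (CML) in order, stopping at the
        first storeid mismatch; return the index of the earliest conflict
        so the client can resolve it and retry from that point.

        server_storeids: name -> current storeid on the server
        cml: list of (name, prev_storeid, new_storeid, data) records
        """
        for i, (name, prev_storeid, new_storeid, data) in enumerate(cml):
            if server_storeids.get(name) != prev_storeid:
                return i                      # conflict: stop here, keep prefix
            server_storeids[name] = new_storeid
        return len(cml)                       # everything reintegrated
    ```

    Over a slow link this guarantees that every attempt propagates as much "good" work as possible, instead of the 1991 all-or-nothing abort.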

  32. (Suggested by Frans Kaashoek) For years people have used floppy disks to address the mobility problem. Users can mount the file system stored on the floppy on their laptop, their desktop machine, or any other convenient computer. What are the differences between this floppy model and Coda's disconnect/reconnect model?

  33. (Suggested by Steve Ward) I think that a great way to view file synchronization (to coin the common misnomer) is via the lattice of derived versions. In the general case (where many machines can copy, modify, and sync with each other at random) I believe it's sufficient to maintain enough history to guarantee that the latest common ancestor can be identified at any sync. Little example:
                        Server   Laptop1  Laptop2
    Starting file:         A
    L1 syncs with S:       A       A
    L1 edits A:            A       A1>A
    L2 syncs with L1:      A       A1      A1
    L2 edits               A       A1      A2>A1
    L2 syncs with S        A2      A1      A2
    Then:
    L1 syncs with S        A2      A2      A2
    (works OK, since A2>A1>A is consistent with both prior states)
    But, if instead of the last L1 syncs:
    L1 edits               A2      A3>A1   A2
    L1 syncs with S        A2*     A3*     A2
    (shows a conflict since A2>A1>A & A3>A1>A are inconsistent) 
    If there are only 2 machines, then it is sufficient to remember current versions and versions at last sync. In the general case, however, one has to maintain more history, since syncing A with B raises a conflict unless A is an ancestor of B or vice versa. Somebody must know the minimum history requirement to make this determination, but the last common ancestor rule seems sufficient. It's not clear how timestamps generalize to this situation.
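    Ward's last-common-ancestor rule can be made concrete by giving each replica its full version history. This sketch (function and variable names invented) replays the example in the table:

    ```python
    def sync(history_a, history_b):
        """Sync two replicas, each given as its version history, oldest first.

        A sync is clean iff one current version is an ancestor of the
        other; the descendant wins. Otherwise the replicas diverged after
        their last common ancestor, and we report that ancestor so a
        manual merge can start from it.

        Returns (winning history or None, last common ancestor or None).
        """
        if history_a[-1] in history_b:
            return history_b, None            # b descends from (or equals) a
        if history_b[-1] in history_a:
            return history_a, None            # a descends from b
        common = [v for v in history_a if v in history_b]
        return None, (common[-1] if common else None)
    ```

    In the example, L1's history A > A1 > A3 and the server's A > A1 > A2 have neither current version as an ancestor of the other, so the sync flags a conflict with last common ancestor A1.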

  34. (Suggested by Larry Rudolph) If Alice and Bob both modify file F while disconnected, then before giving up, the server could try the following: First apply Alice's modifications and then Bob's. Compare the resulting file with the file that results from first applying Bob's modifications and then Alice's. If the two files are equal, then Coda can safely keep the new updated file. Otherwise, the conflict has to be resolved by a human. Doesn't this always produce correct results?

    (Yes, but the way Coda is organized, it would help only rarely.

    Section 4.4.1 of the paper points out that since Coda uses whole-file caching, what is logged is what would be outbound from the cache at close time (the whole file), rather than the list of detailed writes that went in to the cache. So what the reconciliation algorithm has to work with consists of the original file and two log entries. Each log entry contains a complete copy of the changed file. Rerunning those two logs in one order will give you Alice's result, while rerunning the logs in the other order will give you Bob's result. The algorithm doesn't make a mistake, but the only time it reports that the conflict doesn't need human resolution is when Alice and Bob made exactly the same change.

    It would be necessary to make a gross modification to the emulation logging system, to log the traffic on the other side of the cache (that is, each write operation) in order for the algorithm ever to discover resolvable conflicts.)
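    Rudolph's order-independence test is easy to state in code; apply_ops here is a hypothetical replay function, not part of Coda. With Coda's whole-file logs, replaying a log just replaces the file, so the two orders agree only when Alice and Bob produced identical contents:

    ```python
    def try_commute(base, ops_alice, ops_bob, apply_ops):
        """Rudolph's test: accept the merge only if Alice-then-Bob and
        Bob-then-Alice replay to the same file; otherwise defer to a human.

        apply_ops(file_contents, ops) replays a log against a file copy.
        Returns the merged contents, or None if the orders disagree.
        """
        ab = apply_ops(apply_ops(base, ops_alice), ops_bob)
        ba = apply_ops(apply_ops(base, ops_bob), ops_alice)
        return ab if ab == ba else None
    ```

    The test itself is sound for any granularity of log; it is the whole-file logging that makes it nearly useless in Coda, exactly as the answer above explains.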

  35. (Suggested by Ron Rivest) Compare Coda reintegration to Palm "hotsynch" operation.

  36. (Suggested by Ron Rivest) Compare Coda to CVS version control system, and perhaps note advantages CVS has for shared files. (CVS discussion can be particularly fruitful, since many students already knew about CVS, and since many folks use it for laptop synching etc.)

  37. (Suggested by Ron Rivest) What is the difference between replay log and merely having a "dirty bit" on each changed file in the cache?

  38. (Suggested by Ron Rivest) What happens when a user reconnects and there is an open file? (I'm not sure about this one--what is the answer?)

  39. (Suggested by Ron Rivest) Discuss the effect of changing technology (remote connection at higher data rates, much bigger disks, better connectivity with wireless) on the problem that Coda is trying to solve. Are any parts of the problem vanishing, getting easier, or getting harder? (The biggest changes: the disk on a laptop is now large enough that you don't have to worry about not being able to hold all the files you might need, or all the changes you might make while disconnected. New hoarding rule: take a copy of everything. This approach really encourages optimistic control and discourages pessimistic control. The slowest change: communication data rates are still bottlenecks in many locations, so optimizing the log to make reintegration times short still seems to be a useful idea. But maybe not for long. The bottom line: these days, there are a lot of people who keep their laptop and desktop computers coordinated by running a "file synchronizer" to reconcile changes to the file system.)

  40. (Suggested by Hari) Do we think that the number and size of conflict-prone files is increasing as rapidly as the size of disks is increasing? (Almost certainly not. First, the need for housecleaning the disk has dropped dramatically in the last few years; at this point, many people do it rarely or never. Second, much of the volume is taken up by things that people don't modify: music, photo, and video files, for example, are added, but once added are not usually modified. So a lot of the space on a large disk contains static files. Third, the rate at which people generate conflicts is probably proportional to their face time with the display and keyboard, and that isn't something that can change dramatically.)

  41. (Suggested by Frans Kaashoek) What is the connection between what Coda does and the concept of serializability? (Dealing with multiple replicas can profitably be thought of as a problem of serializability)

  42. What is the desired behavior of a set of concurrent operations on a replicated shared file? (A serializability view says that we want operations on multiple replicas to behave as if they all took place on one replica.)

  43. Does Coda guarantee serializability? (No.)

  44. Does Coda detect when serializability is violated? (Yes; that is what the storeid does.)

  45. Ward's dependence graph is a way of deciding whether or not it is possible to serialize the operations of the different laptops.

Comments and suggestions: Saltzer@mit.edu