Coda File System

by (Raj) Surajit Sarkar
Section 1, Dally/Lew

Advances in portable computers and wireless communication have generated a great deal of interest and progress in mobile computing. A major issue in mobile computing is that the client may at times be temporarily unable to communicate with some or all of the network servers. The Coda file system addresses this issue by supporting disconnected operation.

Portable computers using wireless network links are subject to a number of operating hazards. Signal strength and regional coverage are extremely important, because links can be disrupted or terminated by low battery power or by leaving a covered reception area. Communications can also be severely affected by interference from other electronic equipment or natural elements. In other words, temporary link failures are an integral part of mobile computing. The Coda file system enables the client to continue accessing critical data during these failure periods. The system works by pre-loading the client's cache with critical data, continuing normal operation until disconnection, logging all changes made while disconnected, and replaying them upon reconnection. Uninterrupted operation can be extremely valuable in many circumstances. It is not difficult to imagine the negative consequences of losing or destroying important files, or of being forced to sit idle, because of failed communications. Even if the files can be recovered in some form from backups, recent work will be lost, time will be wasted, and users will be frustrated and angry.
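The four steps described above (pre-load, operate, log, replay) can be illustrated with a minimal Python sketch. It is not Coda's implementation: the names Server, Client, hoard, write, read, and reconnect are hypothetical, files are modeled as dictionary entries, and the replay step simply overwrites the server copy, ignoring any conflicting updates made by other clients in the meantime.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical stand-in for a file server: file name -> contents.
    files: dict = field(default_factory=dict)


@dataclass
class Client:
    server: Server
    cache: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    connected: bool = True

    def hoard(self, names):
        """Pre-load the cache with critical files while still connected."""
        for name in names:
            self.cache[name] = self.server.files[name]

    def read(self, name):
        """Serve from the cache; go to the server only when connected."""
        if name in self.cache:
            return self.cache[name]
        if not self.connected:
            raise FileNotFoundError(f"{name} is not cached and the link is down")
        self.cache[name] = self.server.files[name]
        return self.cache[name]

    def write(self, name, data):
        """Update the cache; write through if connected, otherwise log."""
        self.cache[name] = data
        if self.connected:
            self.server.files[name] = data
        else:
            self.log.append((name, data))

    def reconnect(self):
        """Replay logged changes in order, then clear the log.

        Assumption: no conflict detection; the logged value always wins.
        """
        self.connected = True
        for name, data in self.log:
            self.server.files[name] = data
        self.log.clear()
```

In this sketch, a client that hoards a file, loses its link, and then edits the file keeps working from its cache. The server sees the edit only after reconnect() replays the log.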

First-class replication refers to replicas held on servers, which are considered to be of higher quality: persistent, widely known, secure, complete, and accurate. Second-class replication refers to cached copies on clients, which are considered inferior in all of these respects. The Coda system uses both types of replication in order to combine the performance and scalability advantages of second-class replication with the quality advantages of first-class replication. When operating in disconnected mode, the quality of the client's second-class replicas may degrade because they cannot be revalidated against a first-class replica. Server replication preserves data quality during failures, but does not provide the availability that disconnected operation does. Therefore, the Coda system uses server replication to reduce the frequency and duration of disconnected operation, and resorts to disconnected operation and second-class replication to preserve availability only when there are no other options.
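The fallback order described above can be sketched as follows. This is an illustration only, under stated assumptions: the Replica class, its reachable flag, and the read_file function are hypothetical names, and a reachable server replica is assumed to always hold the current version of the file.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical first-class (server) replica.
    files: dict = field(default_factory=dict)
    reachable: bool = True


def read_file(name, replicas, cache):
    """Prefer any reachable server replica; use the client cache as a last resort.

    Returns (data, source), where source is "first-class" or "second-class".
    """
    for replica in replicas:
        if replica.reachable and name in replica.files:
            data = replica.files[name]
            cache[name] = data  # refresh (revalidate) the second-class copy
            return data, "first-class"
    # No server replica can be reached: disconnected operation.
    if name in cache:
        return cache[name], "second-class"  # may be stale
    raise FileNotFoundError(name)
```

As long as one server replica is reachable, the client never has to rely on a possibly stale cached copy. The cache is consulted only when every replica is unreachable.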

Venus, the cache manager that runs on each Coda client, is responsible for managing its cache in a manner that balances the needs of connected and disconnected operation. A user who is currently working with one set of files may have designated another set as critical. Venus must cache the current set for performance reasons, but it must also cache the critical set in preparation for disconnection. A prioritized algorithm is used to periodically reevaluate which files should be cached. The algorithm combines recent reference history with a per-workstation database that identifies the objects of interest to the user. The recent reference history keeps the files currently in use cached for connected operation, while the interest database, which the user can update and customize, ensures that the appropriate critical files are cached in the event of disconnection. The cache is considered to be in equilibrium when no uncached object has a higher priority than a cached object. Normal activity may disturb equilibrium, so Venus periodically reevaluates the priorities of cached and uncached objects and restores equilibrium by fetching and evicting objects as needed.
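The equilibrium condition and the periodic rebalancing step can be sketched in Python. The sketch makes several assumptions that do not come from the Coda design: the weighting formula in priority is invented for illustration, user-assigned priorities are assumed to lie between 0 and 1, all objects are treated as the same size, and the cache capacity is a simple object count.

```python
def priority(user_priority, seconds_since_reference, weight=0.5):
    """Blend user-assigned interest with recency (hypothetical formula)."""
    recency = 1.0 / (1.0 + seconds_since_reference)
    return weight * user_priority + (1.0 - weight) * recency


def in_equilibrium(cached, priorities):
    """True when no uncached object outranks any cached object."""
    uncached = set(priorities) - set(cached)
    if not cached or not uncached:
        return True
    lowest_cached = min(priorities[name] for name in cached)
    highest_uncached = max(priorities[name] for name in uncached)
    return highest_uncached <= lowest_cached


def restore_equilibrium(cached, priorities, capacity):
    """Return (new_cache, fetches, evictions) that put the cache in equilibrium."""
    # Rank by descending priority; break ties by name for determinism.
    ranked = sorted(priorities, key=lambda name: (-priorities[name], name))
    new_cache = set(ranked[:capacity])
    fetches = new_cache - set(cached)
    evictions = set(cached) - new_cache
    return new_cache, fetches, evictions
```

For example, if a two-object cache holds two low-priority files while two higher-priority files are uncached, restore_equilibrium evicts the former and fetches the latter. A real cache manager would also have to account for object sizes and for the cost of each fetch.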

The Coda system successfully addresses one of the biggest drawbacks in mobile computing. Despite the advances in portable computers and wireless technology, link failures are unavoidable. The Coda system allows the client to continue operating through link failures, and it does so in a transparent fashion. For a mobile user who depends on access to critical files, the benefit can be substantial.