On the Impossibility of Replacing AFS with NFS

By Bill Cattey, MIT Information Systems
April 25, 1997
Revised 2 February 1998

Abstract

One of the primary requirements of the MIT campus-wide Athena Computing Environment was the ability to provide round-the-clock access to user files in the face of server and network outages. This meant that it needed to be possible to move user files from server to server while they were in use. This paper describes the relative impossibility of providing that file migration capability with NFS.

1. Why Migration is a Requirement

In the timesharing days, it was simple for the system manager to determine whether particular files were in use. Timesharing systems often had scheduled downtime when there was a guarantee that no users would be on-line.

The Client/Server computing model is so attractive because it presumes great freedom on the part of the user to access servers at any time. The expectation is of high server availability with only small amounts of server downtime. In the ideal Client/Server implementation, services are available 24 hours a day, 7 days a week, and continue to be available even in the event of the failure of an individual server, or even groups of servers.

The NFS protocol was designed to provide a simple distributed file service. Its market success is largely due to the ease with which it can be reimplemented on a new client platform. The NFS service model is also easy to understand: files live on a particular server. The client can easily recover from a server outage, but during the outage, the files are not available.

1.1. Methods of providing high availability filesystems

High-availability environments require that user files be available even in the face of server outages. This is in opposition to the NFS model, where files can live on only one server.
There are four ways to approach the problem of high availability files:

1. RAID
2. Clustering
3. Failover
4. Migration

RAID: In a RAID disk implementation, it is assumed that the disk drive is the component that will fail, and that replicating the files across multiple drives will solve the problem. Unfortunately, RAID only addresses the issue of a disk crash and does not deal with the issue of a server crash.

Clustering: One way to finesse the issue of high availability is through the notion of a server cluster. In a server cluster, a group of machines cooperate and answer requests as if they were a single server. A cluster NFS server would have a proprietary filesystem that would replicate files appropriately among a shared disk resource, and properly answer requests even if a single server in the cluster went down. Many manufacturers produce clustering systems that allow a wide variety of choices about how to replicate the files, and how to distribute the load among the servers.

One important requirement of an NFS cluster server is that all the servers of the cluster have to pretend to be a single NFS server. This means that the cluster servers all have to be on the same local area network, and addressed by the same IP address. Otherwise the NFS clients have no way to know to ask for files from a different IP address.

Failover: In some novel client implementations of NFS, a mount point can refer to multiple IP addresses. This scheme has two serious limitations:

1. Its robustness depends upon every replication site containing the same data, yet it provides no replication tools.
2. Files open during a server failure have to be reopened, because the file handle to the failing server cannot be replaced any other way. Obtaining the new file handle may require multiple lookup operations if the file is deep in a hierarchy.

Migration: In migration, some mechanism ensures consistent replication of user files to another site.
The goal is to minimize or eliminate any period of downtime for user file access. Migration schemes used by network filesystems other than NFS have been shown to move active files to replication sites or alternate servers correctly and with a minimum of downtime.

1.2. The value of migration

Migration is the general solution to the file availability problem. RAID, clustering, and failover impose a particular policy for the location of replicas of files at the time of purchase and deployment. Migration allows policies to evolve, with none of the limitations imposed by the other methodologies.

Examples:

Migration would allow two physically distant computer facilities to cooperate for high availability operation. This makes disaster recovery plans much simpler.

Migration would allow many identical, low cost server hosts to provide more robust high availability file service than a single, expensive, proprietary cluster server.

With file migration, there is no limitation on where a file can live. Servers could respond to different IP addresses, live on different local area networks, or even be in different parts of the world.

If you can get it, migration is more functional than any other high availability methodology. It is less expensive than most of them.

1.3. The Ancillary issue of Backup

The simple design of NFS made no provision for a vital system management task: Backup! Few people remember that in order to make a guaranteed consistent backup of a UNIX filesystem, no modifications to its content should be allowed for the duration of the backup. There are four approaches to dealing with this requirement:

1. Scheduled downtime for user files during backups.
2. Backup on a per-file basis, with the prayer that the backup system wins the race between copying the file and the user changing its contents often enough.
3. Specially designed NFS appliances which have reimplemented the underlying filesystem to permit consistent backup.
4. Ignoring the issue, and hoping that few enough garbage directories or garbage files get restored.

The version 3 NFS protocol has added the NFS3ERR_JUKEBOX status, which might be helpful here. If and only if all clients are running the version 3 protocol, and all have chosen to interpret the error to mean "gracefully try again later", a server could return the NFS3ERR_JUKEBOX status during a backup and thereby gracefully disable access for its duration.

Migration functionality allows movement of a user's files while they are in use and without disruption. That capability is more than sufficient to provide a consistent snapshot to cold storage for backup purposes.

2. The Problems with NFS and Migration

The same problem with file handles that bites the failover technique prevents NFS from supporting migration. There is no way to instruct an NFS client to accept a new file handle for an open file. Even if the protocol were changed to allow such acceptance, the work required to maintain a map from the old file handle to the new file handle is prohibitive.

The only way to add the migration capability to NFS would be to add a protocol whereby a service request could be answered by the server with a replacement file handle.

How would one obtain a replacement file handle? Although a file handle may refer to a directory, which has structure that could be rederived from a file replica, a file handle most often refers to an open file. The file handle is an opaque data structure which enables a server to locate the NFS file's disk drive and disk blocks through a trivial association. Most often, NFS server implementations make the file handle point directly at the disk and disk blocks that hold the file. In order to map an NFS file handle to a file on another server, one must be able to associate the old value with the new value.

There is no protocol by which stale NFS file handles can be gracefully regenerated by the client.
The file lookup in the application may have run in once-only code, or done the work one directory at a time instead of as an easily repeated path lookup. There is no requirement that the operating system maintain the entire path information associated with an NFS file handle. In the past, receipt of the "STALE_NFS_FILEHANDLE" status meant an application restart, filesystem remount, or client reboot.

There is no NFS file handle expiration protocol. Therefore, for a new NFS that could refresh file handles to interoperate with existing applications and operating systems, either the operating system would have to be modified to maintain significant additional state per file handle, or the NFS servers would have to take responsibility for supplying updated NFS file handles for an unspecified length of time. This means that either a table mapping the old handle values to the new values would have to be maintained for the life of the files, or NFS servers supporting migration would have to agree on a common representation of file location information to permit a trivial mapping between old and new file handles.

Supporting migration under NFS is an impossibility until both a file handle replacement protocol is added, and the file handle itself is changed to be the same across multiple servers.

3. A security benefit to a new NFS file handle abstraction

Because an NFS file handle refers to the physical disk devices and blocks, and because until recently NFS relied on a well behaved client to enforce restrictions on NFS protocol requests, it has been possible to write a program that could crack into an NFS host and modify disk blocks on filesystems that were not even mounted!

With the advent of NFS based on secure RPC this problem is less severe, but the potential still exists to generate NFS protocol requests that run rampant over all the disks in a server, because no server side logic imposes an abstraction barrier on the NFS file handle.
An association table that imposes access rights could allow an abstract file handle to come in from a client, be access verified, and then be mapped to the physical devices on the server. Such an abstract handle could be kept THE SAME for files migrated between cooperating servers. Migration and security are achieved in one fell swoop.

Since the NFS protocol requires the client to regard the NFS file handle as an opaque data structure, the new abstract NFS file handle implementation would protect all servers while requiring absolutely no change to any NFS clients! One wonders why this was not implemented sooner.
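
The association table proposed in section 3 can be illustrated with a minimal sketch. It is not drawn from any NFS implementation: the class names, the 16-byte random token standing in for the opaque handle, and the per-user access list are all assumptions made for illustration. The sketch only shows the two properties argued for above: a handle is verified before it is mapped to a physical location, and a migrated file keeps the same handle on the new server. How a client learns which server to contact after a migration is outside its scope.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Where a file lives on one server (hypothetical layout)."""
    device: str
    inode: int


@dataclass(frozen=True)
class HandleEntry:
    """One row of the association table."""
    location: PhysicalLocation
    allowed_uids: frozenset


class HandleTable:
    """Per-server table mapping abstract handles to physical locations."""

    def __init__(self):
        self._entries = {}

    def issue(self, location, allowed_uids):
        # The handle is random, so it reveals nothing about disks or blocks.
        handle = secrets.token_bytes(16)
        self._entries[handle] = HandleEntry(location, frozenset(allowed_uids))
        return handle

    def resolve(self, handle, uid):
        # Access is verified before any physical location is returned.
        entry = self._entries.get(handle)
        if entry is None:
            raise KeyError("unknown, stale, or forged handle")
        if uid not in entry.allowed_uids:
            raise PermissionError("uid may not use this handle")
        return entry.location

    def export(self, handle):
        # Remove and return the entry so another server can adopt it.
        return self._entries.pop(handle)

    def adopt(self, handle, location, allowed_uids):
        # Register a migrated file under its existing handle.
        self._entries[handle] = HandleEntry(location, frozenset(allowed_uids))


def migrate(handle, source, target, new_location):
    """Move a handle between cooperating servers without changing its value."""
    entry = source.export(handle)
    target.adopt(handle, new_location, entry.allowed_uids)
```

In this sketch the client never sees anything but the token, which is consistent with the protocol's requirement that the handle be opaque to the client. Only the server-side table changes when a file moves.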