Replay Prevention in Authentication Systems

Network authentication systems must prevent the replay of old communications. A previously legitimate message, such as the request to delete a file, can potentially cause significant harm if repeated even a few minutes later.

There are two ways to prevent replay: First, secure network protocols can require the exchange of freshly generated random strings, or nonces. Old messages cannot contain subsequently generated random data, and will therefore contain the wrong nonces if replayed. Alternatively, servers can simply record every message they receive in a replay cache, and discard any duplicates. To prevent replay caches from having to grow without bounds, messages should include timestamps. If servers automatically reject messages with old timestamps, they can also evict such messages from the replay cache. However, relying on timestamps in this way has the added complication of requiring all machines to have loosely synchronized clocks.
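The replay-cache approach can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `ReplayCache`, the 300-second tolerance, and the choice of SHA-256 digests as cache keys are all assumptions made for the example, and verification of the message's authenticity (for instance with a MAC) is assumed to happen elsewhere.

```python
import hashlib
import time


class ReplayCache:
    """Remembers digests of recently accepted messages (hypothetical helper)."""

    def __init__(self, tolerance=300.0, clock=time.time):
        self.tolerance = tolerance  # assumed maximum clock skew, in seconds
        self.clock = clock          # injectable so the sketch can be exercised
        self.seen = {}              # message digest -> timestamp in the message

    def accept(self, message: bytes, timestamp: float) -> bool:
        now = self.clock()
        # Evict entries that have aged out of the window. A replay of such a
        # message would fail the timestamp check below, so the cache no
        # longer needs to remember it.
        self.seen = {d: t for d, t in self.seen.items()
                     if now - t <= self.tolerance}
        if abs(now - timestamp) > self.tolerance:
            return False            # timestamp too old or too far ahead
        digest = hashlib.sha256(
            repr(timestamp).encode() + b"|" + message).digest()
        if digest in self.seen:
            return False            # duplicate inside the window: a replay
        self.seen[digest] = timestamp
        return True
```

The cache stays bounded only because the timestamp check and the eviction rule use the same tolerance; the correctness of both rests on the server's clock.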

The main advantage of replay caches is that they allow stateless protocols. If, for instance, a client wishes to make an RPC to a server, it can send a single message containing a valid timestamp and immediately receive a reply. Thus, an RPC need only incur the overhead of a single network round trip, and servers don't have to remember clients between RPCs. With nonces, in contrast, a client must first request a nonce from the server, then send the server an RPC containing that nonce. The cost of fetching a nonce can be amortized over multiple RPCs, but this requires the server to remember the latest nonce of each client. Such per-client state can waste quite a bit of space when large numbers of clients each make RPCs infrequently.
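For contrast, the nonce-based exchange might look like the sketch below. The names `NonceServer`, `get_nonce`, `handle_rpc`, and `sign` are hypothetical, as is the use of HMAC-SHA-256 over a pre-shared key to bind the request to the nonce. Each successful reply carries the next nonce, which is one way to amortize the extra round trip; the `pending` dictionary is the per-client state that this amortization requires.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, nonce: bytes, request: bytes) -> bytes:
    """Client side: authenticate a request together with the server's nonce."""
    return hmac.new(key, nonce + request, hashlib.sha256).digest()


class NonceServer:
    def __init__(self, keys):
        self.keys = keys      # client id -> shared key (assumed pre-established)
        self.pending = {}     # client id -> latest nonce (per-client state)

    def get_nonce(self, client_id):
        # First round trip: hand the client a fresh random string.
        nonce = secrets.token_bytes(16)
        self.pending[client_id] = nonce
        return nonce

    def handle_rpc(self, client_id, request, tag):
        """Return the next nonce on success, or None if the RPC is rejected."""
        key = self.keys.get(client_id)
        nonce = self.pending.get(client_id)
        if key is None or nonce is None:
            return None
        if not hmac.compare_digest(sign(key, nonce, request), tag):
            return None       # wrong nonce: a replayed or forged message
        # Replace the used nonce; the reply carries the new one, so the
        # client's next RPC needs only a single round trip.
        return self.get_nonce(client_id)
```

A replayed message carries a tag computed over a nonce the server has already replaced, so it fails verification without any reference to the clock.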

Replay caches and timestamps have some disadvantages, however. First, security depends on clocks remaining roughly synchronized. Whatever mechanism sets the system clock must therefore be a trusted part of the authentication system. Second, the task of maintaining a coherent replay cache across multiple processes and programs can be rather tricky, particularly if a server can crash and reboot in less time than the clock tolerance. Indeed, the Kerberos authentication system uses protocols with timestamps rather than nonces. Yet, 10 years after deployment, most Kerberos implementations still don't have a working replay cache. Could such a crucial part of a widely used system really go unimplemented for over a decade if there weren't serious complications involved?

In summary, replay caches can save secure network servers from the need to keep per-client state. Unfortunately, this statelessness comes at the cost of trusting the system clock. Moreover, experience has shown replay caches to be difficult to implement. A better alternative for most purposes is to exchange nonces in authentication protocols.

Dave Mazières