6.033 Discussion Suggestions (Kerberos paper)

6.033--Computer System Engineering

Suggestions for classroom discussion of:

S[teven] P. Miller, B. C[lifford] Neuman, J[effrey] I. Schiller, and J[erome] H. Saltzer. Kerberos authentication and authorization system. Section E.2.1 of Athena Technical Plan, M. I. T. Project Athena, October 27, 1988.

by J. H. Saltzer, April 4, 1996, updated March 31, 1998 and April 6, 1999

Kerberos is a good case study of the wide range of considerations that go into designing something that is intended to be secure. It is unlikely that most students will have picked up all, or even a majority, of the issues, partly because this paper only sketches out the main design considerations. It also assumes that the reader comes in knowing why you need such a system.

Because of this consideration, it is unlikely that asking a class to derive Kerberos from its specifications will get very far. It is probably more illuminating to ask "why does Kerberos do X?"

As a case study, it is also a good illustration of an older approach to developing security protocols. Rather than modularly separating authentication and confidentiality and using distinct procedures for each, it achieves authentication by clever use of the same encryption that provides confidentiality. The primary consequence of this difference is that one must study the design very carefully to conclude that it does both functions in a correct, consistent way; there is actually not enough information in this document to come to a conclusion and one must actually look at the code to be sure. (Certain fields need to be in the packet in a specific order for the authentication to be foolproof.) A secondary consequence of the difference is that Kerberos provides only a single key for the inquiring parties to use, rather than providing one key for confidentiality encryption and a second key for authentication, and tying the two keys together to ensure that they belong to the same session.

Redesigning Kerberos to use modular authentication and confidentiality primitives is an interesting exercise, but may be too much to try to undertake in a 6.033 recitation.

Some bugs in the paper.

As it says at the top of page nine, the descriptions of packet contents in this section are simplified, and full detail is in the appendix. Unfortunately, (1) the appendix isn't included in the version handed out to the class, and (2) in the process of simplifying the scenario descriptions some fields got omitted.

1. All tickets are identical, whether for the ticket-granting service or a Kerberos-mediated service. They always contain both the service name and the workstation identifier.

2. All responses from the TGS are identical; they always contain a ticket lifetime.

The specific bugs are that the service ticket shown in numbered item 2 of scenario III on page 11 omits the service name and the workstation identifier, and the KTGSr --> WS message omits the lifetime field.

Why a key distribution center (KDC) at all? What is the alternative, assuming one is designing a symmetric-key system? (Pair-wise symmetric keys. For Alice to talk to Bob, each must have a copy of K_ab. For Alice to talk to any of 10,000 other people at M.I.T., she must have a list of 9,999 such keys. Alice would probably keep a much shorter key list, of just regular correspondents, so a name discovery and key exchange mechanism must also be available. The fundamental problem with pair-wise keys is providing a secure name discovery and key exchange mechanism. If Alice meets Charles in the hall, they can exchange principal identifiers, but this is not an opportune instant to generate and exchange a randomly-chosen key. And if Alice discovers the need to send a message to Charles by reading one of Charles's papers and seeing an e-mail address at the bottom, it is even harder to arrange to create and exchange a new key.)
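The key-count arithmetic behind the pair-wise alternative can be checked with a short sketch. The function names are illustrative only; the figure of 10,000 principals is the one used above.

```python
def keys_per_principal(n: int) -> int:
    """Pair-wise scheme: each principal holds one key per other principal."""
    return n - 1


def total_pairwise_keys(n: int) -> int:
    """Pair-wise scheme: one key per unordered pair of principals."""
    return n * (n - 1) // 2


def total_kdc_keys(n: int) -> int:
    """KDC scheme: each principal shares a single long-term key with the KDC."""
    return n


if __name__ == "__main__":
    n = 10_000
    print(keys_per_principal(n))   # 9999
    print(total_pairwise_keys(n))  # 49995000
    print(total_kdc_keys(n))       # 10000
```

The pair-wise total grows quadratically with the population, while the KDC's list grows linearly.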
Why not use trans-encryption? That is, we create a central agent, which maintains a single list of all principals and their associated keys. If Alice wants to send a message to Charles without any prearrangements, she encrypts the message in her own key and sends it to the central agent. The agent decrypts it using its copy of Alice's key, then encrypts it using Charles's key and sends it along to Charles. (Every byte of private information must flow through the central agent, so it is a high-volume production operation and at some level of traffic a potential bottleneck. In addition, Alice's message to Charles is for a moment exposed in cleartext. It is probably a bad idea to try to achieve high security--needed for all the keys as well as for the cleartext user data as it goes through the agent--in a high-volume production server.)
Why is that bad? (Here is a good example of paranoid reasoning in action: Because a high-volume production server would be expected to require quite a bit of wizardly attention, to maintain its performance, to upgrade it as rapidly as new hardware or operating systems become available, to adjust its configuration to meet changing demand, etc. In contrast, managing a secure service should be done very conservatively, dragging one's feet on installing new system releases, checking and double-checking every proposed change, under supervision of a small, trusted, paranoid staff.)
[Before proceeding, it might be a good idea to sketch the packet flow of a general key distribution service on the board: a packet goes from client Charles to the KDC asking to talk to data warehouse service W. KDC fabricates and sends back two copies of a temporary session key T_cw, the first enciphered in the key of Charles and the second enciphered in the key of W. Charles then can send a message to W enciphered in T_cw, along with the copy of T_cw that is enciphered with W's key. This scheme as described so far acts as an introduction service--it gets a nonce key privately to previously out-of-touch correspondents--but it lacks authentication and replay protection. Kerberos can then be described as a beefed-up version of this basic key-distribution protocol, with authentication and replay protection added.]
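For the board sketch, the basic introduction service can also be modeled in a few lines. This is a toy model, not Kerberos: `Sealed`, `seal`, and `unseal` are hypothetical stand-ins that only simulate encryption (no real cipher is used), and the principal names are examples. As in the description above, there is no authentication and no replay protection.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Toy stand-in for ciphertext: readable only with the matching key."""
    key: bytes
    payload: object


def seal(key: bytes, payload: object) -> Sealed:
    return Sealed(key, payload)


def unseal(key: bytes, box: Sealed) -> object:
    if not secrets.compare_digest(key, box.key):
        raise ValueError("wrong key")
    return box.payload


class KDC:
    """Holds one long-term key per principal and hands out session keys."""

    def __init__(self) -> None:
        self.keys = {}

    def register(self, name: str) -> bytes:
        key = secrets.token_bytes(16)
        self.keys[name] = key
        return key

    def introduce(self, client: str, service: str):
        # Fabricate a temporary session key and seal one copy for each party.
        t_cw = secrets.token_bytes(16)
        return seal(self.keys[client], t_cw), seal(self.keys[service], t_cw)


def run_introduction(message: str) -> object:
    kdc = KDC()
    k_c = kdc.register("charles")
    k_w = kdc.register("warehouse")

    # Charles asks the KDC for an introduction to W.
    for_charles, for_w = kdc.introduce("charles", "warehouse")
    t_cw = unseal(k_c, for_charles)

    # Charles sends his message under T_cw, plus W's sealed copy of T_cw.
    request = (seal(t_cw, message), for_w)

    # W recovers T_cw with its own key, then reads the message.
    sealed_message, sealed_key = request
    session_key = unseal(k_w, sealed_key)
    return unseal(session_key, sealed_message)


if __name__ == "__main__":
    print(run_introduction("list inventory"))
```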
Why run the user's password through a one-way encryption algorithm to create the user's key? Why not just use the password as the key? (The user chooses his or her own password, and may have chosen the same password for use on Prodigy or a bank ATM. Applying a one-way transformation to the password helps assure that if the KDC is accidentally compromised, you don't have to also change your Prodigy password.)
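A minimal sketch of the idea follows. It assumes SHA-256 as the one-way function purely for illustration; it is not the transformation Kerberos actually uses, and `string_to_key` is a name chosen for this example.

```python
import hashlib


def string_to_key(password: str) -> bytes:
    """Derive a key from a password with a one-way function.

    Assumption: SHA-256 stands in for the real Kerberos transformation.
    """
    return hashlib.sha256(password.encode("utf-8")).digest()


if __name__ == "__main__":
    key = string_to_key("correct horse")
    # The KDC stores only this value, never the password itself.
    print(key.hex())
```

An intruder who steals the KDC's key list obtains values that work as Kerberos keys, but cannot directly read off the password that the user may also have chosen elsewhere.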
Why does Kerberos include the principal identifier of the service inside its encrypted response? Doesn't the client know what service it asked for? (By explicitly including the service identifier inside the encrypted packet, that identifier is securely associated with the ticket, so the client does not have to assume that this ticket goes with this service. This inclusion is an example of the principle "be explicit".)
How could one exploit a protocol that didn't explicitly associate the service name with the ticket? (Lucifer could attack as follows: intercept Charles's original request, modify it to be a request for a ticket for Lucifer's service, then send it along to the KDC. The response would go back directly to Charles, who would be unaware that the ticket in the response is for Lucifer's service. If Lucifer can intercept further packets from Charles, Lucifer can pretend to be the data warehouse service.)
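The client-side check that defeats this attack is small. The sketch below assumes the reply has already been decrypted with the client's key; `KdcReply` and `check_reply` are hypothetical names, and the field layout is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KdcReply:
    """Fields assumed to have arrived encrypted under the client's key."""
    service: str
    session_key: bytes
    ticket: bytes


def check_reply(requested_service: str, reply: KdcReply) -> bytes:
    """Accept the session key only if the reply names the service we asked for."""
    if reply.service != requested_service:
        raise ValueError(
            f"asked for {requested_service!r}, reply is for {reply.service!r}"
        )
    return reply.session_key
```

If Lucifer rewrites the request on its way to the KDC, the decrypted reply names Lucifer's service and the check fails before Charles uses the ticket.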
Where else does Kerberos use explicitness? (Every place that two or more fields are enciphered together. For example, the ticket contains the client's name, so that the server knows for sure who the session key is associated with.)
Is it safe to send the ticket for W back to Charles? Wouldn't it be safer to send it directly to the server? (Given that the ticket has to cross the network to get to W, we can regard Charles as just one more network forwarding agent; the ticket is safely enciphered and it isn't any less secure for this extra hop. And by sending the ticket to Charles, he can package it with the request to W, thereby eliminating the need for W to figure out which ticket goes with which request.)
This thing seems more complicated than necessary. Why is there a Ticket-Granting Service (TGS)? Why not just ask Kerberos for the ticket you want in the first place? (With that design, each time you get another ticket, you would have to use your password to decipher the ticket. If one could predict all the tickets that one would need during a session at the beginning, that would be OK. But that is unrealistic, so a dilemma arises: either the workstation stores a copy of your password for the duration of the session, or it stops and asks you to type it in each time you invoke a different network service. So the fundamental reason for introducing the TGS is to allow the workstation to acquire your password exactly once and then destroy it as soon as possible.)
Let's try to simplify another aspect of Kerberos. Why not omit the authenticator that is included with a ticket? Why not send just the ticket to the service? (This point is critical. Rather than using the procedure described in lecture of sending a string of data plus a separate key-driven authenticator, Kerberos uses an ad hoc self-authenticating technique, in which the encrypted packet is assumed authentic if it is internally self-consistent. The authenticator does two distinct things, both of which may be essential, depending on the requirements of the service:
- it carries a timestamp. This assures that this request is current, not a replay from yesterday.
- it repeats the client's principal identifier--this repetition allows the service to verify that the session key found in the ticket is valid. Without this check, there is no way to distinguish a valid ticket from a random bit string.)
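The two checks can be sketched from the service's point of view. This is a toy model under stated assumptions: `Sealed` and `unseal` only simulate encryption, the ticket and authenticator are reduced to two fields each, and the 300-second default is the five-minute limit Kerberos uses for authenticators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Toy stand-in for ciphertext: readable only with the matching key."""
    key: bytes
    payload: tuple


def unseal(key: bytes, box: Sealed) -> tuple:
    if box.key != key:
        raise ValueError("does not decrypt")
    return box.payload


def verify_request(service_key: bytes, ticket: Sealed, authenticator: Sealed,
                   now: float, t_delta: float = 300.0) -> str:
    """Return the client's name if ticket and authenticator are consistent."""
    # The ticket is sealed under the service's key by the KDC.
    client, session_key = unseal(service_key, ticket)
    # The authenticator is sealed under the session key by the client.
    auth_client, timestamp = unseal(session_key, authenticator)
    if auth_client != client:
        raise ValueError("authenticator does not match ticket")
    if abs(now - timestamp) > t_delta:
        raise ValueError("authenticator is not recent")
    return client
```

Only someone who holds the session key can build an authenticator that unseals correctly and repeats the name found in the ticket, which is what makes the pair self-authenticating.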
Aren't both of those things always essential? (Services that don't maintain dynamically changeable state sometimes don't require protection against request replays, so the timestamp may not be important. A domain name service is an example of one that could get along without this part of the authenticator. However, it would probably still include a time-stamp in its responses, because its clients may be concerned about reply replays.)
What is the essential difference between this scheme of authentication and the key-driven authenticator described in lecture? (The scheme described in lecture systematically protects against yet another attack, called slicing. In a slicing attack, Lucifer intercepts packets encrypted from different sessions, cuts a segment out of one packet and pastes it into another packet. Depending on the particular encryption scheme being used, it is entirely possible that the recipient of this doctored packet may accept the result as authentic. Kerberos prevents slicing attacks by carefully arranging individual data fields of the packet so that they cross boundaries between encryption blocks. The intent is to make it difficult for Lucifer to find a place where cut-and-paste won't be detected. The paper doesn't mention this defense at all, which has led some people to assume that Kerberos is susceptible to slicing. No one has identified a workable slicing attack on Kerberos, but it is a challenge to verify that it can't be done.)
But services that do maintain state need replay protection. In scenario II it says that the authenticator must be "sufficiently recent". But without precise clock synchronization, won't there be a window during which Lucifer can replay a request and get the server to accept it? (Yes; the service has to provide further protection. It works as follows:
- As part of constructing an authenticator, a client inserts a new value for Tcurrent.
- The service requires that that value of Tcurrent be within Tdelta of its own clock. Kerberos sets Tdelta at five minutes.
- The service maintains a history of depth Tdelta of all the authenticators it has received during that time. For each newly received authenticator it searches back through its history (it has to search only as far back as 2*Tdelta; it can garbage collect space devoted to any older history entries) to verify that this is not a replay.
The primary replay protection that the Kerberos authenticator provides is to limit the depth of the history that the server needs to keep. Alternatively, if dynamic replay attacks are unlikely, the server can just ignore the problem and still be fairly well protected.)
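The service-side bookkeeping described above might look like the following sketch. The class and method names are hypothetical, times are in seconds, and entries are discarded once they were received more than 2*Tdelta ago, since a replay of such an entry would fail the freshness test anyway.

```python
T_DELTA = 5 * 60  # five minutes, in seconds


class ReplayCache:
    """Remembers recently accepted authenticators so replays can be rejected."""

    def __init__(self, t_delta: float = T_DELTA) -> None:
        self.t_delta = t_delta
        self.seen = {}  # (client, Tcurrent) -> server time when received

    def _collect(self, now: float) -> None:
        # Garbage-collect history older than 2*Tdelta.
        cutoff = now - 2 * self.t_delta
        self.seen = {k: r for k, r in self.seen.items() if r >= cutoff}

    def accept(self, client: str, t_current: float, now: float) -> bool:
        self._collect(now)
        if abs(now - t_current) > self.t_delta:
            return False  # outside the window: too old or too far ahead
        key = (client, t_current)
        if key in self.seen:
            return False  # same authenticator seen before: a replay
        self.seen[key] = now
        return True


if __name__ == "__main__":
    cache = ReplayCache()
    print(cache.accept("charles", 1000.0, now=1010.0))  # True
    print(cache.accept("charles", 1000.0, now=1020.0))  # False
```

The timestamp test is what bounds the size of `seen`; without it the service would have to remember every authenticator it had ever accepted.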
How was the value of five minutes chosen for Tdelta? (The choice of the authenticator expiration period is a tradeoff: if it were longer, servers would need to keep a deeper history to avoid replay attacks; if it were shorter, the clocks would have to be more carefully synchronized. With Tdelta set to five minutes, there is a ten-minute window, which is large enough to allow clocks to be set by hand, and clock-setting errors big enough to cause trouble are likely to be noticed.)
Why not use a clock synchronizing protocol to make the window much smaller? I have heard that good protocols can adjust clocks to within a few dozen milliseconds, even across the Internet. (That is certainly a possibility. But one would then need a certifiably authenticated clock-setting protocol, which really escalates the problem. And since the window can't be made negligibly small, the application would still have to maintain a list--albeit perhaps much shorter--of recent requests.)
I have heard that Kerberos version five somehow eliminates replays completely. (Kerberos comes with a library program that an application can use to analyze arriving tickets. The library procedure that comes with version five of Kerberos automatically maintains a history of authenticators received during the last 2*Tdelta minutes and it rejects any duplicates as replays. Placing this function in the library reduces the chance that an application programmer will foul things up. This is an example in which the end-to-end argument has been dominated by concerns about protecting application programmers from their own mistakes.)
The paper says that the Kerberos design assumes that server and client clocks are loosely synchronized. How does security fail if this assumption is not met? (If you can play with the clocks, there is a fairly simple attack scenario:
1. 6.033 student makes a recording of an encrypted session between Kaashoek and a print server some afternoon when Kaashoek is printing 6.033 Quiz 2.
2. The student comes back late that night, spoofs the network time protocol, and convinces the print server that it is 2 p.m. again.
3. The student plays back Kaashoek's part of the encrypted session, and causes the printer to print another copy of the quiz, for later study.)
Isn't this a pretty severe flaw in Kerberos? (It is definitely a problem, but it is misleading to characterize it as a "flaw" in Kerberos.
Every security component comes with a set of assumptions. Explicitness tells us that those assumptions should be clearly stated, so that when that component is used in a larger system, the system designer knows what the assumptions are. If you, as the designer, violate an assumption, you should expect that security can be compromised. If you don't like some assumption, you can choose a different component that doesn't make that assumption. But it is misleading to characterize a component as flawed if you don't like its assumptions. The term flawed should be reserved for cases where, even after granting the assumptions, the component is still not secure.
If the assumptions are unrealistic, perhaps the component is worthless, but that is a different objection. In this case, Kerberos specifies that all of the participants must have clocks that are synchronized to within a certain maximum skew. On this (and a few other assumptions, such as the server is under physical lock and key) various security features rest.
At the time that Kerberos was designed, this assumption was in some sense more realistic than it is today, because clocks were set by hand, and to change a clock required physical capture of the server. Nowadays, many servers have switched over to setting their clocks with an insecure network protocol, NTP. This change explicitly violates one of the explicit assumptions. If a secure clock-setting protocol were to be introduced (perhaps switching to secure GPS, as suggested by Dave Mazieres), then the assumption made by Kerberos would once again be sound, and the attack would fail.
All of this reinforces Ross Anderson's observation that you can too easily construct an insecure system out of secure components.)
When a client first approaches Kerberos to ask for a ticket it includes a timestamp in the request packet, and Kerberos echoes the timestamp in the response, apparently to ensure freshness--that the response is a current one. Why does freshness matter in this particular interchange? (Without this protection, Lucifer could intercept the original request packet and send you back the response packet that Kerberos sent you six hours ago, on another occasion when you used this same service. The main difference between the new response and the replayed one is that the new one has a new session key; Lucifer is forcing you to reuse an old session key, one that he may have by now found a way to compromise, either by cryptanalysis or by dumpster diving. Even without this protection you would probably detect replays of packets more than 8 hours old because the enclosed tickets will have expired. The interesting observation is that Kerberos is engineered to provide protection not just against likely attacks, but also against some quite unlikely ones.)
But shouldn't services protect or destroy old session keys? (Yes, but it is not safe to depend on them doing so correctly. In addition, a session key may be used just to establish authenticity, not for secrecy, in which case the participants in the transaction may not see any need to protect it once they have finished using it. Having introduced the concept of a single-use session key, users will begin to assume that the concept is meaningful, so it is important to make it so.)
How is key distribution done in public key systems? (One still needs a key distribution method, and it is almost as hard as with a symmetric key system. The reason is that if Alice is going to use Bob's public key, she depends on its authenticity, so she must be absolutely confident in the method by which she obtained that key. Although it doesn't have to be kept confidential, one can't handle it casually, in a way that Lucifer might be able to tamper with it.)
Tickets contain a field "WS", the workstation IP address. What good does that do? (WS is the one thing that is probably superfluous in this protocol. It increases the work factor of certain replay attacks slightly (because an attacker has to resort to IP spoofing to make the WS field verify correctly) but that adds little to overall security. For example, if your home computer dials up to check for mail every thirty minutes, and the dialup scheme assigns a new IP address on each dialup, then a ticket used on one dialup is worthless for the next one, and you are forced to retype your password every half-hour.)
Kerberos allows a dictionary attack on your password. How can Lucifer mount such an attack? (Lucifer can send a message to the KDC claiming to be you and asking for a Ticket-Granting Ticket. The KDC will be happy to send back such a ticket, along with a current timestamp and, for explicitness, the name of the ticket-granting service, all encrypted in your key. These last two things are intended to allow you to authenticate the response. But they can be used by Lucifer to verify that he has guessed your password: Lucifer sends this encrypted response over to his 256-way parallel processor and starts trying to decrypt, using every word in the dictionary as a potential password. If your password is in his dictionary, when he tries it, the authenticator will match, and he knows he has found your password. The timestamp and the identity of the TGS, because they can be recognized, are called "verifiable plaintext". Even on a Pentium, it takes only a few hours to try all the entries in a good-sized dictionary. The beauty of this attack is that it requires only a single, apparently legitimate, interaction with the KDC. All of the trial encryptions can be done in private where no one is watching or logging traffic or access failures. This situation should be compared with that of an ATM, where in order to try a candidate PIN an attacker has to send a message to the bank, which after three failures will confiscate the ATM card.)
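The role of verifiable plaintext can be shown with a toy model for class discussion. Everything here is an assumption made for illustration: the cipher is a SHA-256 keystream XOR rather than the encryption Kerberos uses, the password-to-key function is plain SHA-256, and the reply layout and the service name `krbtgt` are simplified placeholders.

```python
import hashlib

TGS_NAME = b"krbtgt"  # recognizable field: the "verifiable plaintext"


def string_to_key(password: str) -> bytes:
    # Assumption: SHA-256 stands in for the real one-way transformation.
    return hashlib.sha256(password.encode("utf-8")).digest()


def keystream(key: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with a key-derived stream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def kdc_reply(password: str, timestamp: int) -> bytes:
    """What the toy KDC returns to anyone who asks in the user's name."""
    plaintext = TGS_NAME + b"|" + str(timestamp).encode() + b"|<key and ticket>"
    return toy_crypt(string_to_key(password), plaintext)


def guess_password(reply: bytes, dictionary):
    """Offline search: a guess is right if the TGS name appears on decryption."""
    for word in dictionary:
        if toy_crypt(string_to_key(word), reply).startswith(TGS_NAME):
            return word
    return None
```

Nothing in `guess_password` contacts the KDC, which is why the KDC cannot count or limit failed attempts. If the reply contained nothing recognizable, a wrong guess and a right guess would look the same to the attacker.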
Can this problem be fixed? (Yes. The trick is to redesign the Kerberos protocol to ensure that there is no verifiable plaintext in that returned packet. Eliminating verifiable plaintext while maintaining freshness is a bit challenging; read the following paper for details of one such design. Li Gong, Mark A. Lomas, Roger M. Needham, and Jerome H. Saltzer. Protecting poorly chosen secrets from guessing attacks. IEEE Journal on Selected Areas in Communications 11, 5 (June 1993) pages 648-656.)

Comments and suggestions: Saltzer@mit.edu