6.033 2011 Lecture 24: Secure channels

recall overall goal: tolerate adversaries.
  previous lecture: case study of how a browser deals with adversaries.
  last lecture's threat model assumed adversary did not control network.
  today: dealing with an adversary that has some control over the network.

problem: many networks do not provide security guarantees.
  adversary can look at packets, corrupt them.
    easy to do on local network.
    [ slide: sniffing wireless traffic ]
    might even be plausible over the internet, if adversary changes DNS.
  adversary can inject arbitrary packets, from almost anywhere.
  by the end-to-end principle, deal with these problems at the endpoints.
  dropped packets: retransmit (as long as they eventually get through).
  corrupted, injected, sniffed packets: need some new plan.

security goals for messages.
  secrecy: adversary cannot learn message contents.
  integrity: adversary cannot change message contents.

cryptographic primitives.
  Encrypt(ke, m) -> c; Decrypt(ke, c) -> m.
    ciphertext c is usually the same length as m.
    hard to obtain plaintext m, given ciphertext c, without ke.
    but adversary may change c to c', which decrypts to some other m'.
  MAC(ka, m) -> t.
    MAC stands for Message Authentication Code.
    output t is fixed length, similar to a hash function (e.g., 256 bits).
    hard to compute t for message m, without ka.
  common key lengths today are 128 or 256 bits.
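
  sketch (illustrative, not part of the original lecture): the MAC primitive above,
  instantiated with HMAC-SHA256 from Python's standard library. the choice of
  HMAC-SHA256, the names mac/check, and the example messages are assumptions.

```python
import hashlib
import hmac
import secrets


def mac(ka: bytes, m: bytes) -> bytes:
    """MAC(ka, m) -> t, instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(ka, m, hashlib.sha256).digest()


def check(ka: bytes, m: bytes, t: bytes) -> bool:
    """Recompute the tag for m and compare it to t in constant time."""
    return hmac.compare_digest(mac(ka, m), t)


ka = secrets.token_bytes(32)           # random 256-bit authentication key
m = b"transfer $10 to bob"             # made-up example message
t = mac(ka, m)

tag_bits = len(t) * 8                  # tag length does not depend on len(m)
ok_original = check(ka, m, t)          # genuine message verifies
ok_tampered = check(ka, b"transfer $99 to eve", t)  # changed message does not
```

  the receiver recomputes the tag with the shared ka; an adversary without ka
  cannot produce a matching tag for a modified message.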

secure channel abstraction.
  send and receive messages, just as before, but protected from adversary.
  use Encrypt to ensure secrecy of a message.
  use MAC to ensure integrity (increases size of message).
  complication: replay of messages.
    include a sequence number in every message.
    choose a new random sequence number for every connection.
  complication: reflection of messages.
    include a direction flag, or use different keys in each direction.
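
  sketch (illustrative, not part of the original lecture): integrity half of a
  secure channel, with a sequence number and a direction flag covered by the MAC.
  encryption is left out to keep the example in the standard library. the
  Endpoint class, the 9-byte header layout, and the assumption that both sides
  already agreed on the random starting sequence number are all hypothetical.

```python
import hashlib
import hmac
import secrets
import struct

TAG_LEN = 32  # HMAC-SHA256 output size in bytes
HDR_LEN = 9   # 1-byte direction flag + 8-byte sequence number


class Endpoint:
    """One end of a connection. Integrity only; encryption is omitted."""

    def __init__(self, ka: bytes, my_dir: int, start_seq: int):
        self.ka = ka
        self.my_dir = my_dir        # 0 = client->server, 1 = server->client
        self.send_seq = start_seq   # next sequence number to send
        self.recv_seq = start_seq   # next sequence number expected from peer

    def seal(self, m: bytes) -> bytes:
        header = struct.pack(">BQ", self.my_dir, self.send_seq)
        self.send_seq += 1
        tag = hmac.new(self.ka, header + m, hashlib.sha256).digest()
        return header + m + tag

    def open(self, pkt: bytes) -> bytes:
        header, m, tag = pkt[:HDR_LEN], pkt[HDR_LEN:-TAG_LEN], pkt[-TAG_LEN:]
        expected = hmac.new(self.ka, header + m, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad MAC")
        direction, seq = struct.unpack(">BQ", header)
        if direction == self.my_dir:
            raise ValueError("reflected message")
        if seq != self.recv_seq:
            raise ValueError("replayed or out-of-order message")
        self.recv_seq += 1
        return m


def rejected(endpoint: Endpoint, pkt: bytes) -> bool:
    """True if the endpoint refuses the packet."""
    try:
        endpoint.open(pkt)
        return False
    except ValueError:
        return True


ka = secrets.token_bytes(32)
start = secrets.randbelow(2**32)        # fresh random start for this connection
client = Endpoint(ka, 0, start)
server = Endpoint(ka, 1, start)

pkt = client.seal(b"hello")
delivered = server.open(pkt)                 # first delivery succeeds
replay_rejected = rejected(server, pkt)      # same packet again: stale sequence number
reflect_rejected = rejected(client, pkt)     # bounced back to sender: wrong direction

pkt2 = client.seal(b"pay 10")
tampered = pkt2[:HDR_LEN] + b"pay 99" + pkt2[HDR_LEN + 6:]
tamper_rejected = rejected(server, tampered) # contents changed: MAC check fails
```

  using a separate key per direction would serve the same purpose as the
  direction flag; this sketch uses the flag only because it is shorter to show.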

problem: key establishment.
  suppose client wants to communicate securely with a server.
  how would a client get a secret key shared with some server?

Diffie-Hellman protocol.
  another cryptographic primitive.
  [ slide sequence: DH protocol exchange ]
    crypto terminology: two parties, Alice and Bob, want to communicate.
  main properties of the protocol:
    after exchanging messages, both parties end up with same key k.
    adversary cannot figure out k from g^a and g^b alone!
  different type of cryptographic primitive: public-key cryptography.
    g^a, g^b are public keys (PK): generally ok for anyone to know this.
    a, b are secret keys (SK): adversary should not learn these values!
    these keys are larger than symmetric keys (for the same "security level").
    current rule-of-thumb is 1024- or 2048-bit key length.
  this works well, as long as the adversary only observes packets.
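
  sketch (illustrative, not part of the original lecture): the exchange in a few
  lines of Python. the parameters p and g are toy values picked for convenience
  (an assumption); p is far too small for real use and g is not checked for
  generating a large subgroup.

```python
import secrets

# Toy public parameters (assumption): not secure, for illustration only.
p = 2**127 - 1   # a Mersenne prime
g = 3

a = secrets.randbelow(p - 2) + 1   # Alice's secret key, 1 <= a <= p-2
b = secrets.randbelow(p - 2) + 1   # Bob's secret key

A = pow(g, a, p)   # Alice -> Bob: g^a mod p (public)
B = pow(g, b, p)   # Bob -> Alice: g^b mod p (public)

k_alice = pow(B, a, p)   # Alice computes (g^b)^a mod p
k_bob = pow(A, b, p)     # Bob computes (g^a)^b mod p
```

  both sides arrive at g^(ab) mod p; only A and B ever cross the network.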

problem: man-in-the-middle attacks.
  [ slide: MITM attack exchange ]
  active adversary intercepts messages between Alice and Bob.
    adversary need not literally intercept packets: can subvert DNS instead.
    if adversary (Eve) controls DNS, Alice may be tricked into sending packets to Eve.
  both Alice and Bob think they've established a key.
  unfortunately, they've both established a key with Eve.
  what went wrong: no way for Alice to know who she's talking to.
    need to authenticate public keys during key exchange.
    in particular, given the name (Bob) need to know public key (g^b mod p).
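
  sketch (illustrative, not part of the original lecture): the attack with tiny
  numbers. p, g, and all three secret exponents are made-up toy values (an
  assumption), fixed so the run is reproducible.

```python
# Toy parameters and secrets (assumption): tiny values for illustration only.
p, g = 23, 5
a = 6     # Alice's secret
b = 15    # Bob's secret
e = 13    # Eve's secret

A = pow(g, a, p)   # Alice sends g^a, intended for Bob
B = pow(g, b, p)   # Bob sends g^b, intended for Alice
E = pow(g, e, p)   # Eve swallows both and substitutes g^e in each direction

k_alice = pow(E, a, p)       # Alice believes this key is shared with Bob
k_bob = pow(E, b, p)         # Bob believes this key is shared with Alice

k_eve_alice = pow(A, e, p)   # Eve's key for the Alice side
k_eve_bob = pow(B, e, p)     # Eve's key for the Bob side
```

  Eve shares one key with Alice and a different key with Bob, so she can decrypt,
  read or modify, and re-encrypt everything that passes between them.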

idea 1: Alice remembers key used to communicate with Bob last time.
  easy to implement, simple, effective against subsequent MITM attacks.
  ssh uses this approach.
  [ demo: remove yuma.csail.mit.edu's key from known_hosts.
          ssh yuma.csail.mit.edu, accept new public key.
          ssh yuma.csail.mit.edu, public key accepted automatically.
          add '127.0.0.1 yuma.csail.mit.edu' to /etc/hosts.
          ssh yuma.csail.mit.edu, observe error. ]
  doesn't protect against MITM attacks the first time around.
  doesn't allow server to change its key later on.
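
  sketch (illustrative, not part of the original lecture): the remember-the-key
  check, loosely mirroring the demo above. this is not how ssh is implemented;
  the function name, return strings, and key values are hypothetical.

```python
# host name -> public key remembered from the first connection
known_hosts = {}


def check_host_key(host: str, presented_key: str) -> str:
    """Compare the key a server presents against the one remembered for it."""
    remembered = known_hosts.get(host)
    if remembered is None:
        # First contact: nothing to compare against, so a MITM goes unnoticed.
        known_hosts[host] = presented_key
        return "new key accepted"
    if remembered == presented_key:
        return "key matches"
    # Either a MITM, or the server legitimately changed its key.
    return "KEY MISMATCH"


first = check_host_key("yuma.csail.mit.edu", "key-of-yuma")
second = check_host_key("yuma.csail.mit.edu", "key-of-yuma")
# After name resolution is redirected, a different machine answers:
third = check_host_key("yuma.csail.mit.edu", "key-of-localhost")
```

  the mismatch branch cannot tell an attack apart from a legitimate key change,
  which is the second limitation listed above.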

idea 2: consult some authority that knows everyone's public key.
  simple protocol.
    authority server has a table: name <-> public key.
    Alice connects to authority server (using above key exchange protocol).
    sends message asking for Bob's public key.
    server replies with PK_bob = (g^b mod p).
  Alice must already know the authority server's public key, PK_as.
    otherwise, chicken-and-egg problem.
  works well, but doesn't scale.
    client must ask the authority server for public key for every connection.
    or at least every time it sees new public key for a given name.

idea 3: authority responds the same way every time, so pre-compute responses.
  public/private keys can be used for more than just key exchange.
  additional primitives:
    Sign(SK, m) -> sig; Verify(PK, m, sig) -> yes/no.
    property: hard to compute sig without knowing SK.
    we will denote the pair {m, sig=Sign(SK, m)} as {m}_SK.
    given {m}_SK and corresponding PK, know that m was signed by someone w/ SK.
  new protocol:
    authority server creates signed message { Bob, PK_bob }_(SK_as).
    anyone can verify that the authority signed this message, given PK_as.
    when Alice wants to connect to Bob, need signed message from authority.
  authority's signed message usually called a "certificate".
    certificate attests to a binding between the name (Bob) and key (PK_bob).
    authority is called a certificate authority (CA).
  certificates are more scalable.
    doesn't matter where certificate comes from, as long as signature is ok.
    easy scalability solution: Bob sends his certificate to Alice.
    (similarly, Alice sends her certificate to Bob.)
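
  sketch (illustrative, not part of the original lecture): Sign, Verify, and a
  certificate { Bob, PK_bob }_(SK_as). the lecture does not name a signature
  scheme; this uses textbook RSA over a SHA-256 digest with toy primes and no
  padding (an assumption, not secure). the "name|key" encoding and the value 19
  standing in for g^b mod p are made up. needs Python 3.8+ for pow(E, -1, ...).

```python
import hashlib

# Toy RSA key pair for the authority (assumption): textbook RSA, no padding.
P, Q = 2**61 - 1, 2**89 - 1            # two Mersenne primes, chosen for convenience
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent

PK_as = (N, E)   # public: every client is assumed to have this already
SK_as = (N, D)   # secret: held only by the authority


def digest(m: bytes, n: int) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n


def sign(sk, m: bytes) -> int:
    n, d = sk
    return pow(digest(m, n), d, n)


def verify(pk, m: bytes, sig: int) -> bool:
    n, e = pk
    return pow(sig, e, n) == digest(m, n)


def make_cert(name: str, pk_subject: int):
    """Authority binds a name to a public key: returns (m, sig)."""
    m = f"{name}|{pk_subject}".encode()
    return m, sign(SK_as, m)


# Computed once by the authority, then handed out by Bob himself.
cert_m, cert_sig = make_cert("Bob", 19)

valid = verify(PK_as, cert_m, cert_sig)       # Alice checks with PK_as only
forged = verify(PK_as, b"Bob|21", cert_sig)   # Eve swaps in her own key
```

  verification needs only PK_as, so the certificate can travel over the untrusted
  network and come from anyone, including Bob.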

who runs this certificate authority?
  today, a large number of certificate authorities.
  [ demo: browser CA list in firefox.
          edit -> prefs -> adv -> enc -> view certs -> authorities ]
  if any of them sign a certificate, browser will believe it.
  somewhat problematic.
    lots of CAs, controlled by many companies & governments.
    if any are compromised or malicious, mostly game over.

where does this list of CAs come from?
  most of these CAs come with the browser.
    web browser developers carefully vet the list of default CAs.
    downloading list of CAs: need to already know someone's public key.
  bootstrapping / chicken-and-egg problem, as before.
    computer came with some initial browser from the manufacturer.
    manufacturer physically got a copy of Windows, including IE and its CAs.
  MIT CA: download from an unprotected web page (not authenticated).
    a better plan: put it on server that has a certificate from well-known CA.

how does the CA build its table of names <-> public keys?
  first: how do we name principals?
    everyone must agree on what names will be used.
    depends on what's meaningful to the application.
    would having certificates for an IP address help a web browser?
      probably not: actually want to know if we're talking to the right server.
      since DNS untrusted, don't know what IP we want.
      knowing key belongs to IP is not useful.
    for web servers, certificate contains server's host name (e.g., google.com).
  second: how to check if a key corresponds to name?
    whatever mechanism CA decides is sufficient proof.
    some CAs send an email to root@domain asking whether the recipient approves a cert for the domain.
    some CAs used to require faxed signed documents on company letterhead.

what if a CA makes a mistake?
  [ slide: CA mistakes ]
  whoever controls the corresponding secret keys can now impersonate sites.
  similarly problematic: attacker breaks into server, steals secret key.
  need to revoke certificates that should no longer be accepted.
  note this wasn't a problem in idea 2, when we queried the authority server for every connection.

technique 1: include an expiration time in certificate.
  certificate: { Bob, 10-Aug-2011, PK_bob }_(SK_as).
  clients will not accept expired certificates.
  when certificate is compromised, wait until expiration time.
  useful in the long term, but not so useful for immediate problems.

technique 2: publish a certificate revocation list (CRL).
  can work in theory.
  clients need to periodically download the CRL from each CA.
  msft 2001 problem: verisign mis-issued certs in Microsoft's name, then realized those certs did not include a CRL address.
  principle: economy of mechanism, avoid rarely-used (untested) mechanisms.
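
  sketch (illustrative, not part of the original lecture): a client-side check
  combining techniques 1 and 2. the dict layout, the serial-number field used to
  identify revoked certificates, and the example values are assumptions; the
  signature on the certificate is assumed to have been verified already.

```python
from datetime import date


def cert_acceptable(cert: dict, today: date, crl: set) -> bool:
    """Reject certificates that are expired or listed in the CA's CRL."""
    if today > cert["expires"]:
        return False          # technique 1: expiration time
    if cert["serial"] in crl:
        return False          # technique 2: revocation list
    return True


# Hypothetical certificate fields; "serial" is an assumed identifier.
cert_bob = {"name": "Bob", "expires": date(2011, 8, 10), "serial": 1001}

ok_fresh = cert_acceptable(cert_bob, date(2011, 5, 1), crl=set())
ok_expired = cert_acceptable(cert_bob, date(2011, 9, 1), crl=set())
ok_revoked = cert_acceptable(cert_bob, date(2011, 5, 1), crl={1001})
```

  the CRL check is only as good as the client's copy of the list: a client that
  never downloads it behaves like the first call with an empty set.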

[ material below was skipped in lecture ]

technique 3: query an online server to check certificate freshness.
  no need to download long CRL.
  checking status might be less costly than obtaining certificate.

integrating secure channels into an application: web browser.
  1. network sniffing: use a secure channel (SSL) to connect to web server.
  2. server authentication: verify name in certificate matches URL host name.
  3. javascript: include protocol (https) as part of the origin.
  4. client authentication: client certs (rare), cookies (https-only flag).
  5. user enters password: ask user to watch for lock icon + origin (URL).
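
  sketch (illustrative, not part of the original lecture): items 2 and 3 in code.
  the function names are hypothetical, and the name check is exact-match only
  (real browsers also handle wildcard names, which this ignores).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Origin as (protocol, host, port), so http and https pages differ."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url1: str, url2: str) -> bool:
    return origin(url1) == origin(url2)


def cert_matches_url(cert_name: str, url: str) -> bool:
    """Item 2: name in the certificate must equal the URL's host name."""
    return urlsplit(url).hostname == cert_name.lower()


secure_vs_secure = same_origin("https://google.com/a", "https://google.com/b")
plain_vs_secure = same_origin("http://google.com/a", "https://google.com/a")
name_ok = cert_matches_url("google.com", "https://google.com/login")
name_bad = cert_matches_url("google.com", "https://evil.example/login")
```

  keeping the protocol in the origin stops script loaded over plain http, which a
  network adversary could have modified, from touching pages loaded over https.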

idea 4 (for looking up public keys): use public keys as names.  [ SPKI/SDSI ]
  trivially solves the problem of finding the public key for a "name".
  avoids the need for certificate authorities altogether.
  might not work for names that users enter directly.
  can work well for names that users don't have to remember/enter.
    application referring to a file.
    web page referring to a link.
  additional attributes of a name can be verified by checking signatures.
    suppose each user in a system is named by a public key.
    can check user's email address by verifying a message signed by that key.