6.02 - Routing with Failures
Lecture #21
Katrina LaCurts, lacurts@mit.edu

======================================================================
Introduction
======================================================================

Recall that we have three unanswered questions regarding
packet-switched network:
1. How do nodes compute routes?
2. How do nodes route around failures?
3. How do nodes communicate reliably?

On Monday, we answered question 1: Nodes compute routes by running a
distributed routing protocol such as distance-vector or link-state.

Today, we're going to answer question 2: What happens when parts of
the network fail, and how do nodes deal with that?

======================================================================
First, more comparison between distance-vector and link-state
======================================================================

One thing we avoided on Monday was an explicit decision on which
routing protocol to use when.  We'll be able to do that today, after
we analyze them under failure, but let's start with one concrete
comparison: advertisement overhead.

How many advertisements get sent in each of these protocols?

Distance-vector:
- Each advertisement is sent only to neighbors
=> 2L advertisements

Link-state:
- Each node's advertisement is forwarded by every node on all of its
  outgoing links
=> 2L copies of each advertisement are sent
=> N*2L advertisements

So distance-vector is looking much more enticing in terms of the
overhead it consumes.

   Aside: A proper argument should consider the size of the
   advertisements as well.  In distance-vector, they contain N
   entries, and in link-state they often contain far fewer.  Even
   taking this into account, you will still see that distance-vector
   consumes less bandwidth than link-state.

With just this knowledge, it's possible that distance-vector will be
better for large networks than link-state.  We shall see.

======================================================================
How do networks fail?
======================================================================

Now, onto failures.  Lots of bad things can happen in a network that
will lead to a failure:
- Advertisements can get lost
- Links can go down
- Nodes themselves can go down

Today we're going to analyze the routing protocols from Monday in the
context of these types of failures (particularly in the context of
link failures).  To do that, we need to be more precise about what we
mean for a routing protocol to be correct, and what it means for a
routing protocol to have converged.

----------------------------------------------------------------------
Correctness and convergence, formally
----------------------------------------------------------------------

In an ideal, correctly-working protocol, we want the routing tables at
every node to reflect the actual network topology.  We state that with
two properties:

1. Route validity: For any node N, if dst exists in that node's route
   table, and route_table[dst] = K, then there is a path from N to dst
   whose first hop is K.

If route validity is satisfied, it means that if a destination appears
in the routing table, then there is indeed a usable path to that
destination, and the routing table reflects a usable path.

Here's an example of an invalid routing table: imagine a network that
has been disconnected, but this information has not yet propagated.

2. Path visibility: Every router that has a usable path to a
   destination learns at least one valid route to that destination

Here's an example of a routing table that violates path visibility:
one where the entire network is connected, but where a router hasn't
yet learned a route (say because advertisements are lost).  This
doesn't violate route validity, because it's not a statement about a
route, but caused by the absence of a route in a switch.

A network with path visibility but not route visibility would be,
e.g., a network where each node's table contains all destinations, but
a routing loop exists (see my later example of a routing loop).

If both route validity and path visibility hold, the routing protocol
is said to have converged.

However, it's impossible for us to guarantee that a routing protocol
is converged at all times; it takes non-zero time for topology changes
to propagate through the network.  So we're going to settle for a
slightly weaker property: eventual convergence.

Eventual convergence means more or less exactly what you think: if
eventual convergence holds, then then routing protocol will eventually
converge.  But, because of failures, we want to be a bit more precise:

  Eventual convergence: Given some initial state of the network at
  time 0, and given a time t after which no changes occur to the
  topology and no routing advertisements or HELLO packets are lost, if
  the routing protocol converges in some finite amount of time
  following t, we say the routing protocol has eventually converged.

Basically, the state of the network is allowed to change between times
0 and t; links may fail, etc.  But if, at some finite point after t,
the network converges, then we're happy.

Last week we talked about convergence time.  This notion still makes
sense in the context of failures: the convergence time is the time it
takes for a protocol to converge after a sequence of changes have
occurred in the network.

----------------------------------------------------------------------
The first step: periodicity
----------------------------------------------------------------------

We've already discussed one method for dealing with failures and cost
changes in networks: doing everything periodically.  Nodes run the
HELLO protocol periodically, nodes send routing advertisements
periodically, and nodes perform the integrate step periodically.
Doing these things periodically lets nodes discover and react to
changes or failures in the network.

The first place a failure will be detected is in the HELLO protocol.
Suppose A and B are neighbors.  This means they're sending HELLO
messages between each other periodically; say every s seconds.  This
means that A should expect to hear from B roughly every s seconds, and
vice versa.  If that stops happening, then A will detect a failure;
either A->B is dead, or B itself is dead.

   Aside: Because packet loss can happen for reasons other than link
   failure, A will not declare B dead after missing just one HELLO
   packet; A might wait for roughly k*s seconds before deciding that B
   is dead, where k is some small integer > 1.

Now you understand how nodes detect failures, and you should also why
performing all the steps of routing periodically is necessary.  Let's
see whether our routing protocols do indeed eventually converge.
We'll analyze link-state and distance-vector separately.

======================================================================
Failures in link-state routing
======================================================================

Since we're talking about eventual convergence, we'll assume that
changes/failures have happened in the network between time 0 and time
t.  We're trying to see if there is a time s after t such that the
protocol has converged by time s.

----------------------------------------------------------------------
Part 1: no lost advertisements or HELLO messages
----------------------------------------------------------------------

My claim: If no link-state advertisements or HELLO message are lost
after time t, the protocol will eventually converge.  This is actually
very easy to prove: works and that Dijkstra's algorithm 

1. There is a time t1 > t such that the HELLO protocol will detect any
   failures that occurred in (0, t) by time t1.  This is a property of
   the HELLO protocol; you can read more in the course notes about
   this if you don't understand why this statement is true.

2. At time t2 > t1, all nodes will have received an advertisement from
   every other node.  This is based on my assumption that no
   advertisements are lost, and that flooding works correctly.

3. At time t3 > t2, all nodes will have the correct routing table
   because they will all have finished running Dijkstra's algorithm,
   and Dijkstra itself is correct.

Note: if you are thinking that it was silly of us to prove something
that seemed so trivial -- of *course* a routing protocol should
eventually converge if no advertisements are lost! -- just wait until
we get to distance-vector.

----------------------------------------------------------------------
Part 2: lost advertisements
----------------------------------------------------------------------

My assumption at the beginning was, of course, ridiculous: we can't
claim that advertisements are never lost.

In link-state, we have some built-in resiliency.  For instance, if the
advertisement that A sent to B was lost, B would likely get another
copy from some other neighbor.  But let's imagine the worst case: that
a node misses enough advertisements that it has an incorrect view of
the network.  Clearly this might result in a node using a higher-cost
path than is necessary.  But can anything *that* bad really happen?

Consider the following graph:

   B --- C
  /      |
A        |
  \      |
   E --- D

Suppose that D doesn't get a link-state advertisement from either A or
E.  This means that D's view of the network doesn't include the link
between A and E:

D's view:   B --- C    Given this view, D's route to A will be
           /      |    via C, i.e.:
         A        |      route_table[A] = D->C
                  |
            E --- D

Suppose also that C doesn't get a link-state advertisement from either
A or B.  This mean that C's view is:

C's view:   B --- C    Given this view, C's route to A will be
                  |    via D, i.e.:
         A        |      route_table[A] = C->D
          \       |
            E --- D

So let's suppose C gets a packet destined for A.  It will send it on
the link C->D.  When D receives that packet, it will look up into its
routing table, and send the packet on D->C.  C will then send the
packet on C->D, D will send the packet on D->C, etc.

This is an instance of a routing loop.  A routing loop happens
whenever nodes routing tables are set up such that a packet traverses
a path like N1, N2, ... N_k, N1, N2, ..., N_k, etc.  (importantly,
routing loops need not happen between just two nodes, like my example
above).

   Aside: If there is a routing loop, then at least one switch has an
   invalid route table.

There is always *some* chance that G and D will never learn the
correct view of the network, given that packets can be lost.  However,
given link-state flooding, there is a very (*very*) good chance that
both nodes will eventually learn of the links between A and E and A
and B, and then the protocol can converge.

But in the meantime, could we help things along?  At the very least,
it's futile for D and G to bounce packets back and forth between
them.  Not only is it futile, but it wastes resources.  So we do the
following: to each packet header, we add a hop-limit, or time-to-live
field.  If a packet's header has a hop-limit of X, it means that that
packet is allowed to travel no more than X hops within the network;
once it has traveled X hops, if the packet hasn't reached its
destination, then it is dropped.

The way this is done is by setting a packet's hop-limit initially, and
then decrementing it at each switch it travels through.  Decrementing
the hop-limit is part of the forwarding process.

Note that hop-limits are *not* a mechanism to make the routing
protocol converge faster.  They are a mechanism to make sure we don't
waste resources or congest the network unnecessarily while the
protocol is trying to converge.

----------------------------------------------------------------------
Summary
----------------------------------------------------------------------

As long as every node in the network receives a link-state
advertisement from every other node (via flooding), link-state
eventually converges.  Given the nature of flooding, it's very, very
rare to see routing loops in link-state routing (unless some switch is
misconfigured -- which can happen!).  Link-state routing is a good way
to achieve fast convergence.

======================================================================
Failures in distance-vector routing
======================================================================

Now we come to distance-vector.  Nodes running distance-vector don't
get a complete view of the topology the way nodes running link-state
do.  For this reason, it's more difficult to reason about its
convergence properties.

----------------------------------------------------------------------
INFINITY
----------------------------------------------------------------------

First we need to know what distance-vector does when a node is deemed
unreachable in the protocol.  Remember that in distance-vector, a node
A's advertisements contain information about all the nodes in the
network (that it knows about), not just its neighbors.

When a node A has no route to destination B, it will advertise a cost
of INFINITY to B.  A cost of INFINITY B is interpreted as there being
no route to B.

   Aside: Typically INFINITY is a number like 16.

   Aside 2: An alternate implementation of distance-vector involves A
   not including B in its advertisements when it has no route to them.

----------------------------------------------------------------------
Counting to INFINITY
----------------------------------------------------------------------

The reason it's difficult to reason about the convergence of distance
vector is because the order in which advertisements are received by
nodes matters.

Consider this network:   A ---- B ---- C  (each link has cost 1)

Initially, A's routing table is:
  A: Self, 0
  B: A->B, 1
  C: A->B, 2

And B's is:
  A: B->A, 1
  B: Self, 0
  C: A->B, 1

Suppose the link from B to C fails:  A ----B --x-- C
The HELLO protocol at B will detect this:

B's routing table:  A: B->A, 1
                    B: Self, 0
                    C: None, infinity

If B sends an advertisement immediately to A, it will look like
[(A,1),(B,0),(C,inf)]

But that may not happen; B's advertisements happen periodically, but
not necessarily right after a change.

   Aside: Even if nodes send advertisements immediately after changes
   occur, the scenario I depict below can occur (concrete example:
   imagine that an advertisement is sent immediately, but lost).
   Moreover, doing so could result in high overhead from sending so
   many advertisements.

So imagine that before B sends an advertisement to A, it receives one
from A, which will be: [(A,0),(B,1),(C,2)]

In integrating this advertisement, B will find that A has a route to C
of cost 2, which means that if B uses A to get to C, that will be a
cost of 3 (2 + 1).  3 is better than infinity, so B will update its
table:

B's routing table:  A: B->A, 1
                    B: Self, 0
                    C: B->A, 3

Eventually, B will send an advertisement to A with this information.
A will see that B's route to C has changed cost, and update its table

A's routing table:  A: Self, 0
                    B: B->A, 1
                    C: B->A, 4

Then B will get an advertisement from A, see that A's cost to C has
changed, and update *its* table:

B's routing table:  A: B->A, 1
                    B: Self, 0
                    C: B->A, 5

This process will continue until the costs to C are equal to INFINITY,
at which point the nodes will know there is no route to C (because
that's how we interpret a cost of INFINITY).  This scenario is known
as "counting to infinity".

This is a problem: distance-vector can take INFINITY steps to
converge.  So we don't want INFINITY to be too large.  At the same
time, it must be larger than the cost of the highest-cost path in the
network.

Bad news!  We wanted to use distance-vector for large networks since
it used less bandwidth than link-state.  But its convergence time can
be proportional to INFINITY, and INFINITY will grow as the network
grows.

Let's try to solve the count-to-infinity problem

======================================================================
Attempt #1: Split horizon
======================================================================

One of the issues with our earlier example was that A was using B to
get to C, and yet still advertising to B about its route to C.

Split horizon avoids that.  If a node A learns about the best route to
a destination C from neighbor B, then A will not advertise its route
for C back to B.

Let's see how that plays out in our earlier example.  When the link
between B and C failed, we had the following:

A's table:  A: Self, 0    B's table: A: B->A, 1
            B: A->B, 1               B: Self, 0
            C: A->B, 2               C: None, inf

Before, the next step was for A to send an advertisement to B.  But in
this network, A learns about all of its routes from B; it won't
advertise any of them back to B under split horizon.

Thus, the next thing that will happen is that B sends an advertisement
to A, and the routing tables become:

A's table:  A: Self, 0    B's table: A: B->A, 1
            B: A->B, 1               B: Self, 0
            C: A->B, inf             C: None, inf

Huzzah!  The problem is solved.  Or is it?  Now imagine this network:

D -- A -- B -- C   (all links are cost 1)
|         |
-----------

We are only going to care about destination C here, so I'll just
include that in the routing table.  The starting tables are:

@A:   C: A->B, 2
@B:   C: B->C, 1
@D:   C: D->B, 2

Suppose the link between B and C fails, just like before

D -- A -- B -x- C        @A:   C: A->B, 2
|         |              @B:   C: None, inf
-----------              @D:   C: D->B, 2

Suppose B sends an advertisement out to A and D, but the advertisement
to A gets lost:

D -- A -- B -x- C        @A:   C: A->B, 2
|         |              @B:   C: None, inf
-----------              @D:   C: None, inf

Now suppose A sends an advertisement.  Because of split horizon, A
won't send its advertisement about C to B (it uses B to get to C),
solving the issue we encountered earlier.  However, A *will* send to
D.

D -- A -- B -x- C        @A:   C: A->B, 2
|         |              @B:   C: None, inf
-----------              @D:   C: D->A, 3

Now suppose D sends an advertisement.  Again, because of split
horizon, D won't advertise about C to A, but it will to B.

D -- A -- B -x- C        @A:   C: A->B, 2
|         |              @B:   C: B->D, 4
-----------              @D:   C: D->A, 3

Now B will send an advertisement about D to A, and A will update its
cost to C.

D -- A -- B -x- C        @A:   C: A->B, 5
|         |              @B:   C: B->D, 4
-----------              @D:   C: D->A, 3

And we will continue in this loop, counting until all nodes reach
infinity.

What is different here?  Split horizon avoided two-hop routing loops
by saying that node A wouldn't advertise to B about a route to C if B
was the one providing that route.  But it does nothing to solve the
problem with larger routes.  Above, A appears to have found a route to
C that goes through B, while B has one that goes through D.

The problem here is that B does not know that this route will
eventually also go through A, creating a routing loop.

======================================================================
Attempt #2: Path-vector routing
======================================================================

The real problem underlying both of these examples is that nodes don't
have a way to determine whether integrating an advertisement will
create a routing loop because they don't know the full path, only the
first hop.

Path-vector routing solves this.  Path-vector extends distance-vector
advertisements to include not just the cost but the full minimum-cost
path.  A node N will integrate an advertisement only if N does not
appear on the path.

We will not study this protocol in depth, but you should know that
indeed, this protocol eventually converges without counting to
infinity.  Its convergence time is proportional to the length of the
highest-cost minimum-cost path times the advertisement interval (the
advertisement interval is the amount of time between successive
advertisements).

What did we lose to gain this nice convergence property?  Clearly, the
size of advertisements in path-vector are much large than in
distance-vector.

======================================================================
Final comparison of distance-vector, link-state, and path-vector
======================================================================

Now we can finally compare all of these protocols.  In addition to our
comparison from last week, we've found:

Bandwidth: Distance vector uses less bandwidth than path-vector, which
uses less than link-state

Convergence time: Link-state converges faster than distance-vector and
path-vector.

So how do we choose?  In practice:
- Distance-vector for very small networks that can guarantee
  properties about routing loops (e.g., loop-free networks)
- Link-state for small networks, typically networks owned by a single
  company or enterprise.  Link-state is bad on larger networks because
  of the overhead from the advertisements
- Path-vector across larger networks, e.g., across the Internet

======================================================================
The real world
======================================================================

Some of you have asked me some good questions: how does all of this
really work on the Internet?  Are there, for instance, machines that
have routes to every single possible other destination in the
Internet?

In fact, no.  The Internet is broken into entities called Autonomous
Systems (ASes).  These ASes are things like universities (MIT is an
AS), ISPs (Comcast is an AS -- Comcast actually owns multiple ASes),
large labs or government branches, etc.

Within an AS, we use a link-state protocol known as OSPF for routing.
For us, this means that we use link-state routing to get from any
machine at MIT to any other machine at MIT.  But it doesn't give us a
way to send data to machines outside of MIT.

For that, we use a path-vector routing protocol called BGP to route
between ASes.  This protocol provides the routes between MIT and other
ASes.  And this makes sense.  Think about driving.  Imagine we're all
driving from our homes in Boston to New York City.  We'd all have
different routes initially, depending on our home addresses.  But
quickly, we'd all end up on the highway, and then we'd all take the
same route to New York.  We didn't need 200 individual routes to New
York: we needed 200 individual routes to get out of Boston, and then
just 1 route to get from there to New York.

The fact that there is this hierarchy of routing is what makes routing
on the Internet scalable.

Also, to put this in context of your lives: in your home, your laptop
sits behind a router.  That router is exchanging routing information
with other network routers.  When you send data from your laptop, it's
sent to that router, which then forwards it on the appropriate path.
So now you know: packets go from your laptop to your router, which
then uses OSPF to route within your AS, which then uses BGP to route
across ASes.