6.02 - Routing with Failures Lecture #21 Katrina LaCurts, lacurts@mit.edu ====================================================================== Introduction ====================================================================== Recall that we have three unanswered questions regarding packet-switched network: 1. How do nodes compute routes? 2. How do nodes route around failures? 3. How do nodes communicate reliably? On Monday, we answered question 1: Nodes compute routes by running a distributed routing protocol such as distance-vector or link-state. Today, we're going to answer question 2: What happens when parts of the network fail, and how do nodes deal with that? ====================================================================== First, more comparison between distance-vector and link-state ====================================================================== One thing we avoided on Monday was an explicit decision on which routing protocol to use when. We'll be able to do that today, after we analyze them under failure, but let's start with one concrete comparison: advertisement overhead. How many advertisements get sent in each of these protocols? Distance-vector: - Each advertisement is sent only to neighbors => 2L advertisements Link-state: - Each node's advertisement is forwarded by every node on all of its outgoing links => 2L copies of each advertisement are sent => N*2L advertisements So distance-vector is looking much more enticing in terms of the overhead it consumes. Aside: A proper argument should consider the size of the advertisements as well. In distance-vector, they contain N entries, and in link-state they often contain far fewer. Even taking this into account, you will still see that distance-vector consumes less bandwidth than link-state. With just this knowledge, it's possible that distance-vector will be better for large networks than link-state. We shall see. ====================================================================== How do networks fail? ====================================================================== Now, onto failures. Lots of bad things can happen in a network that will lead to a failure: - Advertisements can get lost - Links can go down - Nodes themselves can go down Today we're going to analyze the routing protocols from Monday in the context of these types of failures (particularly in the context of link failures). To do that, we need to be more precise about what we mean for a routing protocol to be correct, and what it means for a routing protocol to have converged. ---------------------------------------------------------------------- Correctness and convergence, formally ---------------------------------------------------------------------- In an ideal, correctly-working protocol, we want the routing tables at every node to reflect the actual network topology. We state that with two properties: 1. Route validity: For any node N, if dst exists in that node's route table, and route_table[dst] = K, then there is a path from N to dst whose first hop is K. If route validity is satisfied, it means that if a destination appears in the routing table, then there is indeed a usable path to that destination, and the routing table reflects a usable path. Here's an example of an invalid routing table: imagine a network that has been disconnected, but this information has not yet propagated. 2. Path visibility: Every router that has a usable path to a destination learns at least one valid route to that destination Here's an example of a routing table that violates path visibility: one where the entire network is connected, but where a router hasn't yet learned a route (say because advertisements are lost). This doesn't violate route validity, because it's not a statement about a route, but caused by the absence of a route in a switch. A network with path visibility but not route visibility would be, e.g., a network where each node's table contains all destinations, but a routing loop exists (see my later example of a routing loop). If both route validity and path visibility hold, the routing protocol is said to have converged. However, it's impossible for us to guarantee that a routing protocol is converged at all times; it takes non-zero time for topology changes to propagate through the network. So we're going to settle for a slightly weaker property: eventual convergence. Eventual convergence means more or less exactly what you think: if eventual convergence holds, then then routing protocol will eventually converge. But, because of failures, we want to be a bit more precise: Eventual convergence: Given some initial state of the network at time 0, and given a time t after which no changes occur to the topology and no routing advertisements or HELLO packets are lost, if the routing protocol converges in some finite amount of time following t, we say the routing protocol has eventually converged. Basically, the state of the network is allowed to change between times 0 and t; links may fail, etc. But if, at some finite point after t, the network converges, then we're happy. Last week we talked about convergence time. This notion still makes sense in the context of failures: the convergence time is the time it takes for a protocol to converge after a sequence of changes have occurred in the network. ---------------------------------------------------------------------- The first step: periodicity ---------------------------------------------------------------------- We've already discussed one method for dealing with failures and cost changes in networks: doing everything periodically. Nodes run the HELLO protocol periodically, nodes send routing advertisements periodically, and nodes perform the integrate step periodically. Doing these things periodically lets nodes discover and react to changes or failures in the network. The first place a failure will be detected is in the HELLO protocol. Suppose A and B are neighbors. This means they're sending HELLO messages between each other periodically; say every s seconds. This means that A should expect to hear from B roughly every s seconds, and vice versa. If that stops happening, then A will detect a failure; either A->B is dead, or B itself is dead. Aside: Because packet loss can happen for reasons other than link failure, A will not declare B dead after missing just one HELLO packet; A might wait for roughly k*s seconds before deciding that B is dead, where k is some small integer > 1. Now you understand how nodes detect failures, and you should also why performing all the steps of routing periodically is necessary. Let's see whether our routing protocols do indeed eventually converge. We'll analyze link-state and distance-vector separately. ====================================================================== Failures in link-state routing ====================================================================== Since we're talking about eventual convergence, we'll assume that changes/failures have happened in the network between time 0 and time t. We're trying to see if there is a time s after t such that the protocol has converged by time s. ---------------------------------------------------------------------- Part 1: no lost advertisements or HELLO messages ---------------------------------------------------------------------- My claim: If no link-state advertisements or HELLO message are lost after time t, the protocol will eventually converge. This is actually very easy to prove: works and that Dijkstra's algorithm 1. There is a time t1 > t such that the HELLO protocol will detect any failures that occurred in (0, t) by time t1. This is a property of the HELLO protocol; you can read more in the course notes about this if you don't understand why this statement is true. 2. At time t2 > t1, all nodes will have received an advertisement from every other node. This is based on my assumption that no advertisements are lost, and that flooding works correctly. 3. At time t3 > t2, all nodes will have the correct routing table because they will all have finished running Dijkstra's algorithm, and Dijkstra itself is correct. Note: if you are thinking that it was silly of us to prove something that seemed so trivial -- of *course* a routing protocol should eventually converge if no advertisements are lost! -- just wait until we get to distance-vector. ---------------------------------------------------------------------- Part 2: lost advertisements ---------------------------------------------------------------------- My assumption at the beginning was, of course, ridiculous: we can't claim that advertisements are never lost. In link-state, we have some built-in resiliency. For instance, if the advertisement that A sent to B was lost, B would likely get another copy from some other neighbor. But let's imagine the worst case: that a node misses enough advertisements that it has an incorrect view of the network. Clearly this might result in a node using a higher-cost path than is necessary. But can anything *that* bad really happen? Consider the following graph: B --- C / | A | \ | E --- D Suppose that D doesn't get a link-state advertisement from either A or E. This means that D's view of the network doesn't include the link between A and E: D's view: B --- C Given this view, D's route to A will be / | via C, i.e.: A | route_table[A] = D->C | E --- D Suppose also that C doesn't get a link-state advertisement from either A or B. This mean that C's view is: C's view: B --- C Given this view, C's route to A will be | via D, i.e.: A | route_table[A] = C->D \ | E --- D So let's suppose C gets a packet destined for A. It will send it on the link C->D. When D receives that packet, it will look up into its routing table, and send the packet on D->C. C will then send the packet on C->D, D will send the packet on D->C, etc. This is an instance of a routing loop. A routing loop happens whenever nodes routing tables are set up such that a packet traverses a path like N1, N2, ... N_k, N1, N2, ..., N_k, etc. (importantly, routing loops need not happen between just two nodes, like my example above). Aside: If there is a routing loop, then at least one switch has an invalid route table. There is always *some* chance that G and D will never learn the correct view of the network, given that packets can be lost. However, given link-state flooding, there is a very (*very*) good chance that both nodes will eventually learn of the links between A and E and A and B, and then the protocol can converge. But in the meantime, could we help things along? At the very least, it's futile for D and G to bounce packets back and forth between them. Not only is it futile, but it wastes resources. So we do the following: to each packet header, we add a hop-limit, or time-to-live field. If a packet's header has a hop-limit of X, it means that that packet is allowed to travel no more than X hops within the network; once it has traveled X hops, if the packet hasn't reached its destination, then it is dropped. The way this is done is by setting a packet's hop-limit initially, and then decrementing it at each switch it travels through. Decrementing the hop-limit is part of the forwarding process. Note that hop-limits are *not* a mechanism to make the routing protocol converge faster. They are a mechanism to make sure we don't waste resources or congest the network unnecessarily while the protocol is trying to converge. ---------------------------------------------------------------------- Summary ---------------------------------------------------------------------- As long as every node in the network receives a link-state advertisement from every other node (via flooding), link-state eventually converges. Given the nature of flooding, it's very, very rare to see routing loops in link-state routing (unless some switch is misconfigured -- which can happen!). Link-state routing is a good way to achieve fast convergence. ====================================================================== Failures in distance-vector routing ====================================================================== Now we come to distance-vector. Nodes running distance-vector don't get a complete view of the topology the way nodes running link-state do. For this reason, it's more difficult to reason about its convergence properties. ---------------------------------------------------------------------- INFINITY ---------------------------------------------------------------------- First we need to know what distance-vector does when a node is deemed unreachable in the protocol. Remember that in distance-vector, a node A's advertisements contain information about all the nodes in the network (that it knows about), not just its neighbors. When a node A has no route to destination B, it will advertise a cost of INFINITY to B. A cost of INFINITY B is interpreted as there being no route to B. Aside: Typically INFINITY is a number like 16. Aside 2: An alternate implementation of distance-vector involves A not including B in its advertisements when it has no route to them. ---------------------------------------------------------------------- Counting to INFINITY ---------------------------------------------------------------------- The reason it's difficult to reason about the convergence of distance vector is because the order in which advertisements are received by nodes matters. Consider this network: A ---- B ---- C (each link has cost 1) Initially, A's routing table is: A: Self, 0 B: A->B, 1 C: A->B, 2 And B's is: A: B->A, 1 B: Self, 0 C: A->B, 1 Suppose the link from B to C fails: A ----B --x-- C The HELLO protocol at B will detect this: B's routing table: A: B->A, 1 B: Self, 0 C: None, infinity If B sends an advertisement immediately to A, it will look like [(A,1),(B,0),(C,inf)] But that may not happen; B's advertisements happen periodically, but not necessarily right after a change. Aside: Even if nodes send advertisements immediately after changes occur, the scenario I depict below can occur (concrete example: imagine that an advertisement is sent immediately, but lost). Moreover, doing so could result in high overhead from sending so many advertisements. So imagine that before B sends an advertisement to A, it receives one from A, which will be: [(A,0),(B,1),(C,2)] In integrating this advertisement, B will find that A has a route to C of cost 2, which means that if B uses A to get to C, that will be a cost of 3 (2 + 1). 3 is better than infinity, so B will update its table: B's routing table: A: B->A, 1 B: Self, 0 C: B->A, 3 Eventually, B will send an advertisement to A with this information. A will see that B's route to C has changed cost, and update its table A's routing table: A: Self, 0 B: B->A, 1 C: B->A, 4 Then B will get an advertisement from A, see that A's cost to C has changed, and update *its* table: B's routing table: A: B->A, 1 B: Self, 0 C: B->A, 5 This process will continue until the costs to C are equal to INFINITY, at which point the nodes will know there is no route to C (because that's how we interpret a cost of INFINITY). This scenario is known as "counting to infinity". This is a problem: distance-vector can take INFINITY steps to converge. So we don't want INFINITY to be too large. At the same time, it must be larger than the cost of the highest-cost path in the network. Bad news! We wanted to use distance-vector for large networks since it used less bandwidth than link-state. But its convergence time can be proportional to INFINITY, and INFINITY will grow as the network grows. Let's try to solve the count-to-infinity problem ====================================================================== Attempt #1: Split horizon ====================================================================== One of the issues with our earlier example was that A was using B to get to C, and yet still advertising to B about its route to C. Split horizon avoids that. If a node A learns about the best route to a destination C from neighbor B, then A will not advertise its route for C back to B. Let's see how that plays out in our earlier example. When the link between B and C failed, we had the following: A's table: A: Self, 0 B's table: A: B->A, 1 B: A->B, 1 B: Self, 0 C: A->B, 2 C: None, inf Before, the next step was for A to send an advertisement to B. But in this network, A learns about all of its routes from B; it won't advertise any of them back to B under split horizon. Thus, the next thing that will happen is that B sends an advertisement to A, and the routing tables become: A's table: A: Self, 0 B's table: A: B->A, 1 B: A->B, 1 B: Self, 0 C: A->B, inf C: None, inf Huzzah! The problem is solved. Or is it? Now imagine this network: D -- A -- B -- C (all links are cost 1) | | ----------- We are only going to care about destination C here, so I'll just include that in the routing table. The starting tables are: @A: C: A->B, 2 @B: C: B->C, 1 @D: C: D->B, 2 Suppose the link between B and C fails, just like before D -- A -- B -x- C @A: C: A->B, 2 | | @B: C: None, inf ----------- @D: C: D->B, 2 Suppose B sends an advertisement out to A and D, but the advertisement to A gets lost: D -- A -- B -x- C @A: C: A->B, 2 | | @B: C: None, inf ----------- @D: C: None, inf Now suppose A sends an advertisement. Because of split horizon, A won't send its advertisement about C to B (it uses B to get to C), solving the issue we encountered earlier. However, A *will* send to D. D -- A -- B -x- C @A: C: A->B, 2 | | @B: C: None, inf ----------- @D: C: D->A, 3 Now suppose D sends an advertisement. Again, because of split horizon, D won't advertise about C to A, but it will to B. D -- A -- B -x- C @A: C: A->B, 2 | | @B: C: B->D, 4 ----------- @D: C: D->A, 3 Now B will send an advertisement about D to A, and A will update its cost to C. D -- A -- B -x- C @A: C: A->B, 5 | | @B: C: B->D, 4 ----------- @D: C: D->A, 3 And we will continue in this loop, counting until all nodes reach infinity. What is different here? Split horizon avoided two-hop routing loops by saying that node A wouldn't advertise to B about a route to C if B was the one providing that route. But it does nothing to solve the problem with larger routes. Above, A appears to have found a route to C that goes through B, while B has one that goes through D. The problem here is that B does not know that this route will eventually also go through A, creating a routing loop. ====================================================================== Attempt #2: Path-vector routing ====================================================================== The real problem underlying both of these examples is that nodes don't have a way to determine whether integrating an advertisement will create a routing loop because they don't know the full path, only the first hop. Path-vector routing solves this. Path-vector extends distance-vector advertisements to include not just the cost but the full minimum-cost path. A node N will integrate an advertisement only if N does not appear on the path. We will not study this protocol in depth, but you should know that indeed, this protocol eventually converges without counting to infinity. Its convergence time is proportional to the length of the highest-cost minimum-cost path times the advertisement interval (the advertisement interval is the amount of time between successive advertisements). What did we lose to gain this nice convergence property? Clearly, the size of advertisements in path-vector are much large than in distance-vector. ====================================================================== Final comparison of distance-vector, link-state, and path-vector ====================================================================== Now we can finally compare all of these protocols. In addition to our comparison from last week, we've found: Bandwidth: Distance vector uses less bandwidth than path-vector, which uses less than link-state Convergence time: Link-state converges faster than distance-vector and path-vector. So how do we choose? In practice: - Distance-vector for very small networks that can guarantee properties about routing loops (e.g., loop-free networks) - Link-state for small networks, typically networks owned by a single company or enterprise. Link-state is bad on larger networks because of the overhead from the advertisements - Path-vector across larger networks, e.g., across the Internet ====================================================================== The real world ====================================================================== Some of you have asked me some good questions: how does all of this really work on the Internet? Are there, for instance, machines that have routes to every single possible other destination in the Internet? In fact, no. The Internet is broken into entities called Autonomous Systems (ASes). These ASes are things like universities (MIT is an AS), ISPs (Comcast is an AS -- Comcast actually owns multiple ASes), large labs or government branches, etc. Within an AS, we use a link-state protocol known as OSPF for routing. For us, this means that we use link-state routing to get from any machine at MIT to any other machine at MIT. But it doesn't give us a way to send data to machines outside of MIT. For that, we use a path-vector routing protocol called BGP to route between ASes. This protocol provides the routes between MIT and other ASes. And this makes sense. Think about driving. Imagine we're all driving from our homes in Boston to New York City. We'd all have different routes initially, depending on our home addresses. But quickly, we'd all end up on the highway, and then we'd all take the same route to New York. We didn't need 200 individual routes to New York: we needed 200 individual routes to get out of Boston, and then just 1 route to get from there to New York. The fact that there is this hierarchy of routing is what makes routing on the Internet scalable. Also, to put this in context of your lives: in your home, your laptop sits behind a router. That router is exchanging routing information with other network routers. When you send data from your laptop, it's sent to that router, which then forwards it on the appropriate path. So now you know: packets go from your laptop to your router, which then uses OSPF to route within your AS, which then uses BGP to route across ASes.