6.02 Lab #9: Routing

Due date: Fri, 5/1, at 11:59p

Goal

Using NetSim, a simple network simulator, develop and experiment with distance-vector and link-state routing protocols.

Useful links

Instructions

See Lab #1 for a longer narrative about lab mechanics. The one-sentence version:

Complete the tasks below, submit your task files on-line before the deadline, and complete your check-off interview within a week of the file submission deadline.

As always, help is available in 32-083 (see the Lab Hours web page).

For this lab, you will need wxPython (2.8.x), which has already been installed on the athena machines. To install it on your computer, visit http://www.wxpython.org/

Introduction and preliminaries

This lab uses NetSim, a simple packet-level network simulator. You will write the code for the main components of distance-vector and link-state routing protocols.

NetSim executes a set of steps every time slot; time increments by 1 each slot. During each time slot a link can deliver one packet from source of the link to the destination of the link.

You can run the python programs for this lab using ipython or python (ipython602 and python602 on the athena machines). The lab will not work in IDLE.

To understand the different parameters one can set in NetSim, go to a shell and enter:

# python lab9_dv.py -h

(Or, in ipython, run lab9_dv.py -h)

This command prints out the various options:

-n NUMNODES, --numnodes=NUMNODES -t SIMTIME, --simtime=SIMTIME -r, --rand

The "-r" option generates a random topology with the specified number of nodes (the default number of nodes is 12 and the default simulation time is 2000 time slots).

This lab has two main tasks, each with a few sub-tasks. The first task is to implement a distance-vector (DV) routing protocol. The second task is to implement a link-state (LS) routing protocol. You will test these implementations on a few different topologies that will be generated when you run the corresponding task files.

Both routing protocols should construct the minimum cost path from the Router to all the other destination addresses that are currently reachable in the network. The "destination" itself is derived from the Router class and has an "address" field that will be used as an index into the routing table and the cost table. Unless explicitly mentioned otherwise, we will use the term "minimum-cost path" and "shortest path" interchangeably,

Useful classes and data structures

The Router class:

self.routes, the routing table: A dictionary that maps from a destination address to a link. This link is the link that the Router will use to forward any packet destined for the corresponding destination address.
self.spcost, the table of costs of the shortest (i.e., minimum cost) paths: A dictionary that maps from a destination address to the current estimate of the cost of the path to get there.

The Link class: This class is defined in lab9_net.py; you don't need to modify this class, and the main fact you should know for this lab is that you can obtain the cost of a link, l, using l.cost (a variable in class Link). The other place you'll use the Link class is to populate the routing table, which (by definition) stores the Link to be used to reach any destination. We have provided a useful function in the Router class, getlink(n), which takes a neighboring Router, n, and returns the Link connecting self to n. This function is useful in constructing the routing table.

The HELLO protocol and maintaining live neighbors

The HELLO protocol has already been implemented for you in lab9_router.py (you should not need to modify this file); each Router sends a HELLO packet every HELLO_INTERVAL time slots (10 by default in NetSim). Whenever a node hears a HELLO along a link, it adds the current time, the address of the Router at the other end of the link, and the cost of that link, to self.neighbors, which is a dictionary mapping a Link to a (timestamp, address, linkcost) tuple.

If a node does not hear a HELLO for more than 2*HELLO_INTERVAL time slots on a link, it removes that node from self.neighbors and calls the Router's link_failed(link) function, giving the "failed" link as argument. In response, the routing protocol may take suitable actions.

Debugging and Testing Procedures

Writing distributed protocols, even in simulation, can be a challenge. To help you a bit, we have some utilities that are accessible via the GUI. Clicking on any node while the simulation is running enables you to look at its routes and shortest path costs to every destination. For link-state routing, clicking on a node also prints out the last LSA information available at that node from each of the other nodes.

For simple topologies, eye-balling these routes and the printed state at several nodes in the network should convince you of the health of your routing protocol implementation.

Please note that clicking on any link toggles the state of the link between "working" and "failed" states, thereby letting you test your protocol under link failures. We will want to ensure that your protocol works properly in the face of failures and recoveries.

The "Step 1", "Step 10", and "Step 100" buttons are the way in which you should step through the operation of your protocol and see what is happening by clicking on various nodes. (The GUI also provides a "play" and "step all" buttons, which aren't particularly useful for debugging.)

In any given time-slot, colored squares may appear on a link. These are packets. The packets are color-coded: green ones are HELLO packets, red are advertisements (type ADVERT), and blue are data packets. (The data packets aren't relevant to this lab.)

For both the routing protocols you will implement in this lab, you should demonstrate the correctness of the routing tables at the various nodes in the following scenarios (on any random topology) during the checkoff interview.

No link failures. You must verify that all routes computed at the various nodes actually correspond to the shortest paths in the graph. In addition, every node must have a routing table entry for all other nodes in the topology.
One or more link failures, but a connected network. The routing protocol should be able to adjust its routes to route packets "around" the failed link. You should be able to give a rough estimate of how long link state and distance vector protocols take to find the new routes. You should also try bringing the failed link back up and verify that the routes go back to what they looked like in the base case.
Disconnected topology. Break enough links in the topology such that some subset of the nodes cannot reach the other subset. You must show that your routing protocol eventually converges to the correct routing in this case -- that is, a node must have a routing table entry to every node it is connected to by a path in the underlying topology, and must not have a routing table entry for any node it cannot reach. You should also be able to provide a rough estimate of how long this convergence process lasts for the two routing protocols. In the case of distance vector routing, you should also be able to demonstrate the "count to infinity" problem during convergence. Finally, when you "heal" one or more links to produce a connected toplogy, the protocol should eventually ensure that all nodes find the correct routes to all destinations.

Each task below is worth 5 points; in each case, 2 points each for the first two scenarios mentioned above (no failures and one or more failures), and 1 point for the third case (disconnection).

Task #1: Distance vector (DV) routing (5 points)

The file you will have to extend is lab9_dv.py. This file contains the DVRouter class, which is derived from the Router class (which in turn derives from the Node class defined in lab9_net.py. Your first task for this lab is to write the following three functions, which are the core of any DV protocol:

make_dv_advertisement(): Scan the self.routes and self.spcost tables and construct a list of [(dest1, cost1), (dest2, cost2) ...]. Return this list.
As explained in lecture and the lecture notes, each router in a DV protocol periodically exchanges routing advertisements with its neighbors in the network topology, containing the information about destinations and their shortest-path costs ([(dest1,cost1), (dest2,cost2), ...]). This function will be called every ADVERT_INTERVAL time slots (50 by default in NetSim) by send_advertisement(), which will take care of constructing packets with this list as its contents, and sending one such packet to each neighbor in the topology.
link_failed(link): Called when the HELLO protocol determines a failure. When called, your code needs to take suitable action to recognize that the link is now "dead". For example, depending on how you design your DV protocol, you may: set self.spcost for all destinations whose routing table entries currently use that link to infinity, delete that route from your table, or anything else.
We are intentionally not specifying the precise behavior, leaving it to you to design it. As long as what you do in this step is consistent with what you do in make_dv_advertisement(), your protocol will work correctly. Conversely, an inconsistency will likely cause the protocol to be incorrect.
integrate(fromnode, adv): This function is where the actual distributed computation occurs. It takes as input two arguments: the node from which the advertisement came (fromnode), and the advertisement itself (adv), which you constructed as a list in make_dv_advertisement. (You can ignore the marshalling of the advertisement into a packet and the corresponding unmarshalling back to the list, but if you're curious, you can see how send_advertisement() and process_advertisement() do these tasks.)
The result of integrate() is the current self.routes and self.spcost tables, which as mentioned before, are the routing table and table of shortest path costs. The underlying rule you should use is the Bellman-Ford update rule, as described in lecture: remember to add the link cost to the cost reported by the advertisement that comes from the neighbor at the other end of the link.
The integrate() step should take care to update the current route for a destination under the following conditions:
1. If the cost in an advertisement for the destination is smaller than the cost of the current route.
2. If the cost in an advertisement for the destination changes, and the advertisement comes on the link corresponding to the current-best route.
3. Depending on how you design your DV protocol, you may have to take care of one subtle (but important) issue in integrate(): if you find that a previous advertisement that came from fromnode contained a destination, and you are using the corresponding link as the route to the destination, and the current advertisement does not mention the destination, then you have to assume that the destination is no longer reachable via fromnode. Otherwise, it is likely that your protocol may not be correct. You should also note that not every design requires this check; it all boils down to how you send your routing advertisements.

You are not required to implement mechanisms to alleviate or eliminate "counting-to-infinity", but are welcome to do so if you like.

Task #2: Link-state (LS) routing (5 points)

The LS protocol uses the LSRouter class, which is derived from the Router class. The self.routes and self.spcost tables are identical to the DV case. The LSRouter class adds two new variables to Router:

self.LSA, a dictionary that maps a Router address to a list [seqnum, (n1,c1), (n2,c2) (n3,c3), ...], where the Router address is the originator of the link-state advertisement (LSA), seqnum is the current sequence number of the LSA from that node, and the (n_i,c_i) tuples are the currently live neighbor address and link cost from the LSRouter sending the LSA. Note that this LSA does not have a "origin_address" field; we simply get that from the source field in the packet, and you don't need to worry about it (in practice, implementations will explicitly include such a field in the LSA).
self.LSA.get(u) returns the last LSA update originating from Router u that this Router (self) knows about. It has the format [seqnum, (n1,c1), (n2,c2), ...] where seqnum is the sequence number at u when it originated the LSA and each n_i is a neighbor of u (that u's HELLO protocol considered to be "live" when it generated the LSA numbered seqnum), and c_i is the cost of the link from u to n_i.
self.LSA_seqnum, which is the sequence number for the LSA generated by the Router. It increments by 1 on each successive advertisement.

The details of how an LS protocol works were described in the lecture and are in the lecture notes. Each Router periodically sends its currenly live links (which we maintain in the self.neighbors dictionary, as explained earlier when we discussed the HELLO protocol). Each Router also re-broadcasts, along all its links, an LSA packet that it receives via a neighboring Router, containing an LSA originating at some other Router. This re-broadcast is done only once per LSA; to ensure this property, the Router checks the sequence number of the incoming LSA to make sure that it is larger than the last LSA heard from that originating Router. These periodic LSA broadcasts are done every ADVERT_INTERVAL time slots (50 by default in NetSim); the re-broadcasts are done when a Router receives a new LSA (using the sequence number check).

For this task, the file you will have to extend is lab9_ls.py -- your task is to write the following two functions, which are the core of any LS protocol:

make_ls_advertisement(): consult the list of neighbors, self.neighbors, that are currently "live" and return a list of [(neighbor1,linkcost1), (neighbor2,linkcost2), (neighbor3,linkcost3), ...]. This function is called by send_lsa, which marshalls this LSA into the packets, and then sends one packet each along each link.
Note that the self.neighbors dictionary mapping a Link to a (timestamp, address, linkcost) tuple will be useful in constructing the LSA; as mentioned above in the discussion of the HELLO protocol, neighbors keeps track of currently "live" neighboring Routers.
run_dijkstra(nodes): Use Dijkstra's algorithm to produce self.routes[dest] for all dest in nodes, as well as self.spcost[dest]. The topology information for the network (graph) is available in the self.LSA, whose format was described above. The set of nodes currently reachable from the Router is passed as the argument to run_dijkstra().
There is one important issue that you need to watch out for in the steps of run_dijkstra() that will set the routing table entries, self.routes, for various destinations. At the Router, as you go through the different destinations in non-decreasing order of shortest-path costs and set the route to a node to be that of its parent in the shortest path tree. If the parent is self.address (i.e., the Router running the algorithm), then you should remember to set the route to the link connecting the Router (self) to the destination. You can use the self.getlink() method for this purpose.