6.02 Lab #10: Reliable Packet Delivery

Due date: Fri, 5/8, at 11:59p

Goal

Using NetSim, a simple network simulator, develop and experiment with reliable data transport protocols over a multi-hop packet-switched network.

Useful links

Instructions

See Lab #1 for a longer narrative about lab mechanics. The one-sentence version:

Complete the tasks below, submit your task files online before the deadline. Since there isn't sufficient time for our normal checkoff interview process, please enter your answers to the interview questions into a separate file and submit the file using the same process as used to submit your task files. We'll review your submitted code and your answers and enter the appropriate score on-line.

As always, help is available in 32-083 (see the Lab Hours web page).

For this lab, you will need wxPython (2.8.x), which has already been installed on the athena machines. To install it on your computer, visit http://www.wxpython.org/


Introduction and preliminaries

NetSim is a network simulator that simulates a set of switches connected by links. NetSim executes a set of steps every time slot; time increments by 1 each slot. During each time slot a link can deliver one packet from one end of the link to the other end of the link. The nodes in the simulator will be derived from the LSRouter class that you implemented in Lab 9, and therefore will be capable of routing packets between themselves. In this lab, we will develop the core logic of the ReliableSenderNode and ReliableReceiverNode classes, both derived from the LSRouter class, to implement a reliable data transport protocol between a sender and receiver.

Every simulation will consist of a sender and receiver connected by a network of LS routers. The sender will send packets to the receiver using a reliable data transport protocol. You will write the code for the main components of two reliable data transport protocols: a stop-and-wait protocol and a sliding window protocol (with fixed window size).

You can run the python programs for this lab using ipython or python on your computer or on athena (note: on athena, use ipython602 and python602). The lab will not work in IDLE.

To understand the different parameters one can set in NetSim, go to a shell and enter:

# python lab10_stopwait.py -h
(Or, in ipython, run lab10_stopwait.py -h).

# python lab10_slidingwindow.py -h
(Or, in ipython, run lab10_slidingwindow.py -h).

The "-l" option causes data and ACK packets to be dropped on any link with the specified probability. The "-r" option generates a random topology with the specified number of nodes (the default number of nodes is 12 and the default simulation time is 2000 time slots). For lab10_slidingwindow.py, the "-w" option sets the window size to be used in the protocol. The maximum number of unacknowledged packets sent by the sender must not exceed this window size setting.

This lab has two main tasks that implement two different reliable transport protocols. Each task has a few sub-tasks. You will test these implementations on a few different topologies that will be generated when you run the corresponding task files.


Debugging and Testing Procedures

During any point in the simulation, you can click on the sender, node S, to see an estimate of the round trip time (RTT) to the receiver, node R (and back), and the current timeout being used. Clicking on R will show the current throughput at the receiver measured at the number of useful (i.e., in-sequence and non-spurious) packets passed up to the application per simulation timestep, as well as the total number of spurious packets received. We will judge the quality of your transport protocol by the throughput and RTT measurements displayed at the end of the simulation.

If your protocol works correctly, two things should happen:

  1. The receiver and sender should not "hang" -- i.e., the protocol should not deadlock with both sides waiting for something to happen.
  2. The receiver should not print an error message and terminate the program. That will happen if your protocol delivers an out-of-sequence packet to the receiving application (in the app_receive call, as explained below).
If the protocol is correct then the only potential problems that might remain are performance problems; the protocol may be unacceptably slow or unexpectedly sluggish. Obviously, we want your protocol to come as close as possible to the theoretically expected performance. The performance metric of interest is the receiver throughput, measured in packets per time slot.

To help debug your code, you can print debug statements whenever a significant event (e.g., data/ack reception) happens at any node. The template code already prints out a few statements; feel free to add more to help you understand what's going on. One can also see the total number of pending packets at various nodes on the bottom panel of the simulator; if this number is persistently in the hundreds, you know something is wrong or that you are transmitting packets faster than the network can clear.

You can use the "-l" option to test the correctness of your protocol at different link loss rates. The loss rate is a parameter between 0 (no loss) and 1 (everything on a link is lost). Note that both packets and ACKs will get lost. In these simulations, routing messages don't get lost.


Task #1: Stop-and-wait protocol (5 points)

The file you will have to extend is lab10_stopwait.py.

The stop-and-wait protocol works as follows:

In this task, you will implement the following functions in ReliableSenderNode:
reliable_send(self,time)
This function, invoked every time slot at the sender, decides if the sender should (1) do nothing, (2) retransmit the previous data packet due to a timeout, or (3) send a new data packet -- see the detailed description above. timeis the current network time measure in time slots. The send_pkt can be used to build and send a data packet -- see the definition of this function in lab10_stopwait.py for details about the calling sequence. send_pkt also return the packet it transmitted so it can be saved pending acknowledgement.

process_ack(self, time, acknum, timestamp)
This function is invoked whenever an ACK packet is received. time is the network time when the ACK was received, acknum and timestamp are the sender's sequence number timestamp that were echoed in the ACK. This function must call the calc_timeout function described below, among other things.

calc_timeout(self, time, timestamp)
This function should be called by your process_ack method to compute the most recent data packet's round trip time and then recompute the value of self.timeout. Add a print statement that reports the mean (self.srtt) and deviation (self.rttdev) you've calculated for the round-trip time.

In ReliableReceiverNode please implement:

reliable_recv(self, sender, time, seqnum, timestamp)
This function at the receiver is invoked upon receiving a data packet from the sender -- see the detailed description above. You can use the send_ack method to build and transmit the ACK packet.

If you introduce additional instance variables in the sender or receiver, you should add the appropriate initialization code to the reset method for the class.

Interview Questions

Please enter your answers to the following questions into a separate file (along with your answers to the questions for Task #2) and submit the file on-line by the due date.


Task #2: A sliding window protocol (5 points)

The file you will have to extend is lab10_slidingwindow.py.

The sliding window protocol extends the stop-and-wait protocol by allowing the sender to have multiple packets outstanding (i.e., unacknowledged) at any given time. The maximum number of unacknowledged packets at the sender cannot exceed its "window size", and is specified with the "-w" option. The window size is available as self.window.

Upon receiving a packet, the receiver sends an ACK for the packet's sequence number as before. The receiver then buffers the received packets and delivers them in sequence number order to the application. See section 22.3.2 of the lecture notes.

In this task, you will implement the same functions that you did for the previous task, with the necessary modifications to have multiple unacknowledged packets outstanding. You can copy over the calc_timeout function from the previous task; if implemented correctly, it won't have to change. When you run your code with a window size of 1 ("-w 1" option), you must get the same value throughput as you did in the previous task; this will serve as a necessary (but not sufficient) sanity check for the correctness of your sliding window protocol.

Then run the protocol for various values of the window size ranging from 1 to 10. Determine how the throughput depends on window size and think about out why.

Interview Questions

Please enter your answers to the following questions into a separate file (along with your answers to the questions for Task #1) and submit the file on-line by the due date.