6.02 - Improving the Throughput of Reliable Transport
Lecture #23
Katrina LaCurts, lacurts@mit.edu

======================================================================
Recap: The problem with stop-and-wait
======================================================================

Recall the stop-and-wait protocol from the previous lecture:

At the sender:
- Send one packet, and keep track of its sequence number and
  transmission time
- Once an ACK is received for that packet, delete the stored sequence
  number, and send a new packet (using the same strategy of saving its
  sequence number and waiting for an ACK)
- If an ACK hasn't been received after timeout seconds since the
  packet's transmission time, retransmit it to the receiver.

At the receiver:
- Upon receipt of packet k, send an ACK for packet k
- If k is greater than the last sequence number we received (or if we
  haven't received any packets yet), then deliver the packet to the
  application and keep track of k

This protocol uses sequence numbers and ACKs to provide basic
reliability and re-ordering capabilities. We added the timeout at the
sender to avoid deadlock, and the receiver keeps track of packets that
it previously delivered to the application in order to avoid
delivering duplicates.

Although this protocol is a correct reliable transport protocol, we
found a problem: its utilization can be quite low. We saw that in the
best case, the throughput was equal to (1-L)/RTT, where L is the
probability that a packet or its ACK is lost. Even with zero loss,
this can lead to incredibly low utilization (e.g., using stop-and-wait
on a link with an RTT of 100ms that is capable of 100 packets per
second will result in 10% utilization).
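In code, the two endpoints above might look like this. This is a
minimal sketch with hypothetical names (not the actual 6.02 lab code);
the channel and the timer loop are abstracted away:

```python
class StopAndWaitSender:
    """One outstanding (un-ACKed) packet at a time."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.next_seq = 1
        self.outstanding = None            # (seq, time_sent), or None

    def packet_to_send(self, now):
        """Return a sequence number to transmit now, or None."""
        if self.outstanding is None:       # nothing un-ACKed: send a new packet
            self.outstanding = (self.next_seq, now)
            self.next_seq += 1
            return self.outstanding[0]
        seq, sent_at = self.outstanding
        if now - sent_at >= self.timeout:  # no ACK after timeout seconds
            self.outstanding = (seq, now)  # retransmit (and reset the timer)
            return seq
        return None                        # still waiting for the ACK

    def on_ack(self, k):
        if self.outstanding and self.outstanding[0] == k:
            self.outstanding = None        # frees us to send the next packet


class StopAndWaitReceiver:
    def __init__(self):
        self.last_seq = 0                  # highest sequence number delivered
        self.delivered = []                # what the application has seen

    def on_packet(self, k):
        if k > self.last_seq:              # new packet: deliver; else: duplicate
            self.delivered.append(k)
            self.last_seq = k
        return k                           # always send an ACK for k
```

Notice that even with a lossless channel, each round trip moves exactly
one packet: that is the utilization problem described above.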
======================================================================
The solution: Sliding-window
======================================================================

The problem in the previous example is that we had a link that could
handle 10 packets every RTT (100 packets per second = 10 packets per
100ms, on average) and yet stop-and-wait only allows one packet to be
on a link at a time. We frequently say that stop-and-wait allows only
one "outstanding", or un-ACKed, packet at a time.

Our solution -- the sliding-window protocol -- will be to allow more
than one outstanding packet at a time.

----------------------------------------------------------------------
The basic idea
----------------------------------------------------------------------

The basic ideas are these:
- An end point will allow W outstanding packets (i.e., W un-ACKed
  packets) at once. W is known as the window size. In 6.02, for today,
  W is fixed and known by the sender (we will discuss later how to set
  it).
- The collection of packets that are outstanding is known as the
  window. Every time the sender receives an ACK, the sender "slides"
  its window by 1.

(Students: You should check the slides for diagrams of this process.
It was too hard to draw in ASCII.)

This idea of sending a new packet every time we receive an ACK is
actually really clever. When the sender receives an ACK for packet k,
it doesn't just mean that packet k has been received; it *also* means
that a packet (k) has left the network. This means that there is
"room" for a new packet in the network.

Aside: This same technique is used in more complicated reliable
transport protocols (e.g., TCP); it's known as "ACK-clocking".
ACK-clocking lets end points estimate how much data they can put into
the network on their own, without coordinating with any other nodes or
a centralized server. It's really quite cool from a
distributed-systems perspective.
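The "slide on each ACK" rule is tiny in code. Here is a hypothetical
sketch (not the 6.02 lab code), just to animate what the diagrams on
the slides show:

```python
W = 5
window = [1, 2, 3, 4, 5]      # the W outstanding (un-ACKed) sequence numbers
next_seq = 6                  # the next fresh packet to transmit

def on_ack(k):
    """An ACK for k slides the window: k leaves, one new packet enters."""
    global next_seq
    if k in window:
        window.remove(k)
        window.append(next_seq)   # there's "room" in the network: send one more
        next_seq += 1

on_ack(1)
print(window)   # [2, 3, 4, 5, 6]
on_ack(2)
print(window)   # [3, 4, 5, 6, 7]
```

Note that the number of outstanding packets stays at W the whole time:
each departure (an ACK) clocks exactly one new packet into the network.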
We can now put stop-and-wait in context with sliding-window:
stop-and-wait *is* a sliding-window protocol, just with a window size
of W = 1. Setting the correct window size -- the one that maximizes
network utilization -- is a challenge we will overcome later in this
lecture.

Now you have the basic idea of sliding-window, so I'll describe a few
details to you:
- What happens when a packet is lost
- What happens when packets are received out of order.

----------------------------------------------------------------------
At the sender: Handling retransmissions
----------------------------------------------------------------------

Sliding-window deals with lost packets just like stop-and-wait does:
it waits timeout seconds and then retransmits the packet.

SLIDE: Show this

Consider a window size of 5. The sender first sends packets 1 through
5, and then starts to receive ACKs.

- Start:
    Window = [1, 2, 3, 4, 5]
- Receive ACK for 1 => send a new packet
    Window = [2, 3, 4, 5, 6]
- Receive ACK for 2 => send a new packet
    Window = [3, 4, 5, 6, 7]
- Receive ACK for 3 => send a new packet
    Window = [4, 5, 6, 7, 8]
- Receive ACK for 4 => send a new packet
    Window = [5, 6, 7, 8, 9]
- Receive ACK for 5 => send a new packet
    Window = [6, 7, 8, 9, 10]
- Receive ACK for 6 => send a new packet
    Window = [7, 8, 9, 10, 11]
- Receive ACK for *8* => send a new packet (assume 7 was lost)
    Window = [7, 9, 10, 11, 12]
- Receive ACK for 9 => send a new packet
    Window = [7, 10, 11, 12, 13]

Eventually, a timeout will fire for packet 7, and the sender will
retransmit it:
- Timeout fires for packet 7 => retransmit
    Window = [7, 10, 11, 12, 13]

This example illuminates a few important issues:
1. The window of outstanding packets does not necessarily contain
   successive sequence numbers.
2. If the sender retransmits a packet, and its window is already full
   (i.e., it contains W outstanding packets), it does *not* also send
   a new packet.
For instance, in the last step in my example, we did not do this:
- Timeout fires for packet 7 => retransmit and send a new packet
    Window = [7, 10, 11, 12, 13, 14]
Why? Simple: the sender is only allowed to have W (=5 in this example)
outstanding packets at a time.
3. To handle retransmissions, the sender needs to keep track of the
   un-ACKed packets and the times they were sent.

At the sender, a sliding-window protocol looks something like this:
- Transmit a packet if len(un-ACKed list) < W
- Upon transmission of packet k, keep track of k in the un-ACKed list,
  along with the time that k was sent
- When an ACK for packet k is received, remove k from the un-ACKed
  list
- Periodically check the un-ACKed list to see if any packets were sent
  more than timeout seconds ago. If so, retransmit them.

In practice, the sender's "send()" function is called once every
timeslot, with a very fine-grained timer. So all of these checks
(whether we should send a new packet, whether we should retransmit a
packet) are happening frequently.

----------------------------------------------------------------------
At the receiver: Handling out-of-order packets
----------------------------------------------------------------------

Now, what's going on at the receiver? As before, the receiver ACKs
every packet it receives, without exception. The challenging part is
deciding which packets to deliver to the application.

The receiver will keep each packet it receives in a buffer, ignoring
duplicates. It will also keep track of the next packet the application
expects. When it delivers a packet to the application, it removes it
from this buffer, but the receiver *cannot* deliver a packet if the
application isn't expecting it.
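Before working through the receiver's side of the example, here is how
the sender-side rules from the previous section might look in code.
This is a hypothetical sketch (the class and method names are mine, not
the 6.02 lab code), with tick() standing in for the per-timeslot
send() call:

```python
class SlidingWindowSender:
    def __init__(self, W, timeout):
        self.W = W                      # window size (fixed, for today)
        self.timeout = timeout
        self.next_seq = 1
        self.unacked = {}               # seq -> time it was last sent

    def tick(self, now):
        """Called once per timeslot; returns the packets to (re)transmit."""
        to_send = []
        # Retransmit anything sent more than timeout seconds ago.
        for seq, sent_at in self.unacked.items():
            if now - sent_at >= self.timeout:
                self.unacked[seq] = now
                to_send.append(seq)
        # Transmit new packets while len(un-ACKed list) < W.  A
        # retransmission does NOT make room for a new packet.
        while len(self.unacked) < self.W:
            self.unacked[self.next_seq] = now
            to_send.append(self.next_seq)
            self.next_seq += 1
        return to_send

    def on_ack(self, k):
        self.unacked.pop(k, None)       # remove k from the un-ACKed list
```

Feeding this sketch the ACK sequence from the example reproduces the
window [7, 10, 11, 12, 13], and a later tick retransmits 7 without
sending a packet 14.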
Let's see how this plays out in our example:

- Start
    Sender window = [1, 2, 3, 4, 5]
    Application expects: 1
    Received buffer = []
- Receive ACK for 1 => send a new packet
    Sender window = [2, 3, 4, 5, 6]
    Application expects: 1
    Received buffer = [1] => okay to deliver 1
- Receive ACK for 2 => send a new packet
    Sender window = [3, 4, 5, 6, 7]
    Application expects: 2
    Received buffer = [2] => okay to deliver 2
- Receive ACK for 3 => send a new packet
    Sender window = [4, 5, 6, 7, 8]
    Application expects: 3
    Received buffer = [3] => okay to deliver 3
- Receive ACK for 4 => send a new packet
    Sender window = [5, 6, 7, 8, 9]
    Application expects: 4
    Received buffer = [4] => okay to deliver 4
- Receive ACK for 5 => send a new packet
    Sender window = [6, 7, 8, 9, 10]
    Application expects: 5
    Received buffer = [5] => okay to deliver 5
- Receive ACK for 6 => send a new packet
    Sender window = [7, 8, 9, 10, 11]
    Application expects: 6
    Received buffer = [6] => okay to deliver 6
- Receive ACK for *8* => send a new packet (assume 7 was lost)
    Sender window = [7, 9, 10, 11, 12]
    Application expects: 7
    Received buffer = [8] => NOT okay to deliver 8
- Receive ACK for 9 => send a new packet
    Sender window = [7, 10, 11, 12, 13]
    Application expects: 7
    Received buffer = [8, 9] => NOT okay to deliver 8 or 9

The receiver's buffer will keep growing until it receives the
retransmission of packet 7, after which it will be okay to deliver 7.
In fact, the receiver will be able to deliver 7, 8, and 9 all at once.

From this example, you can see that it is *not* okay for the receiver
to simply discard packets that arrive out of order. It needs to buffer
them and deliver them in order.

So at the receiver, our protocol looks like this:
- Send an ACK for every received packet
- Save received packets -- ignoring duplicates -- in a local buffer
- Keep track of the next packet the application expects. After each
  reception, deliver as many in-order packets as possible.
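The receiver-side rules can be sketched the same way (again with
hypothetical names, not the 6.02 lab code):

```python
class SlidingWindowReceiver:
    def __init__(self):
        self.buffer = set()        # received but not yet delivered
        self.expected = 1          # the next packet the application expects
        self.delivered = []        # what the application has seen, in order

    def on_packet(self, k):
        if k >= self.expected:     # ignore duplicates of delivered packets
            self.buffer.add(k)     # (adding to a set ignores buffered dups too)
        # Deliver as many in-order packets as possible.
        while self.expected in self.buffer:
            self.buffer.remove(self.expected)
            self.delivered.append(self.expected)
            self.expected += 1
        return k                   # ACK every received packet, no exceptions
```

Feeding it the arrivals from the example shows the buffering behavior:
8 and 9 wait in the buffer, and the retransmission of 7 releases all
three at once.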
SLIDE: Graph of packet re-transmissions and ACKs

======================================================================
Analysis of Sliding-window
======================================================================

The throughput of sliding-window depends, not surprisingly, on the
size of the window. Our goal is to set a window size that maximizes
throughput.

----------------------------------------------------------------------
Bandwidth-delay product
----------------------------------------------------------------------

Let's start off with W = 1, and we'll assume for now that there is no
packet loss. In the last lecture, we saw that the throughput will be
1/RTT.

SLIDE: Start graphing this

If we increase W to 2, the protocol will deliver two packets every RTT
seconds. So a throughput of 2/RTT. If W is 3, then 3/RTT, etc. It
seems like no matter what W is, the throughput will be W/RTT. So we
can just keep increasing W and increase our throughput?

No, of course not. Along the path from a sender S to a receiver R,
there is some bottleneck link: the slowest link on the path. Let's
call its rate B (in packets/second). We will not be able to exceed a
throughput of B, no matter what value we use for W.

Let's first figure out what value we need for W to achieve a
throughput of B packets per second. If the throughput is W/RTT, and we
want that equal to B, then we should use W = B * RTT.

This result can also be seen with an application of Little's Law.
Consider the path between S and R, and imagine that the path is just
one big queue. If W is the window size, and RTT is the mean delay
between the transmission of a data packet and the reception of its
ACK, packets are being drained from the queue at rate B (the
bottleneck link rate), so the mean number of packets in the queue is
B * RTT. Since the queue here is our path, that means that the mean
number of packets in the path is B * RTT. W is equivalent to the mean
number of packets in the path, hence W = B * RTT.
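Plugging in the numbers from the start of the lecture: a bottleneck of
100 packets per second with a 100ms RTT needs a window of 10 packets
to stay busy.

```python
B = 100        # bottleneck rate, packets/second
RTT = 0.1      # round-trip time, seconds (100ms)

W = B * RTT    # window needed for a throughput of B
print(W)       # 10.0 packets

# Check: with this window, throughput = W/RTT hits the bottleneck rate.
assert abs(W / RTT - B) < 1e-9
```

This is exactly the "10 packets every RTT" figure that stop-and-wait
(W = 1) fell short of by a factor of 10.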
The formula W = B * RTT has the same problem we discussed above:
B = W/RTT, which implies that as we increase W, B will also increase,
which we know is not true. So what's wrong?

Let's break down the RTT. As we know, the delay experienced by a
single packet comes from multiple things:
- Transmission, propagation, and processing delays. These are all
  (roughly) constant, per-packet, and they are never zero.
- Queueing delay, which is variable, and could in fact be zero (if
  there are no packets in any of the queues)

Let's call RTT_min the delay in the absence of queueing. It's the
absolute fastest a packet could travel through our network. If the RTT
is equal to RTT_min, then our equation is just W = B * RTT_min.

Well, how does that help us? We can still write B = W/RTT_min, which
still gives the impression that as we increase W, B increases. But
notice! If W is larger than B * RTT_min, then queues in the network
are going to start to build. The RTT won't be a quantity independent
of W; in fact, it will be very much dependent on W, and will no longer
be equal to RTT_min. So increasing W will actually increase the RTT;
B will not increase.

Intuitively, if W > B * RTT_min, then the throughput remains constant
at B. Mathematically: Let Q be the average number of packets in the
queue of the bottleneck link. By Little's Law, Q = B * tau, where tau
is the average delay in the queue. We also know:

    RTT = RTT_min + tau = RTT_min + Q/B

Since W is the number of un-ACKed packets, these packets are either in
the queue or "on the wire" (i.e., on the path between the sender and
receiver but not in the queue). So:

    W = Q + B*RTT_min

Now let's use Little's Law again:

    Throughput = W/RTT = (Q + B*RTT_min) / (RTT_min + Q/B) = B

SLIDE: Graph of throughput vs. window size

The quantity B * RTT_min is known as the bandwidth-delay product.
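You can check that last identity numerically: with W = Q + B*RTT_min
and RTT = RTT_min + Q/B, the ratio W/RTT is B no matter how large Q
gets. (The numbers below are made up.)

```python
B = 100           # bottleneck rate, packets/second
RTT_min = 0.1     # seconds: the RTT with no queueing

for Q in [0, 5, 50, 500]:              # average bottleneck-queue occupancy
    W = Q + B * RTT_min                # un-ACKed packets: queued or on the wire
    RTT = RTT_min + Q / B              # queueing inflates the RTT by Q/B
    throughput = W / RTT
    assert abs(throughput - B) < 1e-9  # pinned at B: growing W only grows Q
```

Every packet added beyond B*RTT_min just sits in the bottleneck queue,
inflating the RTT by exactly enough to cancel the larger W.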
----------------------------------------------------------------------
Calculating throughput
----------------------------------------------------------------------

Let's actually calculate throughput. Remember that the protocol
delivers W packets every RTT seconds, so the throughput is W / RTT
(the issue is that, with larger W, RTT will also grow).

Without packet losses:
- If W <= B*RTT_min, then Throughput = W / RTT_min
- If W > B*RTT_min, then Throughput = B
- Moreover, when W > B*RTT_min, then W = B*RTT_min + Q, where Q is the
  queue occupancy.

And what about packet losses? Let L be the probability that a packet
or its ACK is lost. We know that our utilization is at most 1 - L: the
expected number of transmissions, T, for successful delivery of a
packet and its ACK satisfies T = (1-L)*1 + L*(1 + T), so T = 1/(1-L),
and utilization = 1 / (1/(1-L)) = 1-L.

In this case, throughput = B * (1-L). To achieve this, we pick
W > B*RTT_min to ensure that the bottleneck link is busy even when
there are packet losses. If W >> B*RTT_min, though, the delays will be
too large, the timeout will be too big, and other connections may
suffer.

======================================================================
A Summary, and a Lingering Question
======================================================================

This is a summary of the reliable transport protocol we've ended up
with:
- Use sequence numbers and ACKs to give basic packet-ordering
  capabilities and reliability.
- The sender keeps W outstanding packets at a time, and re-transmits
  packets if it has not received an ACK after timeout seconds.
- The receiver keeps a buffer of received packets, ignores duplicates,
  and delivers them in-order to the receiving application.
- Setting W = B * RTT_min (or slightly larger in the case of packet
  loss) allows us to achieve the maximum throughput of B (or (1-L)B in
  the case of packet loss).
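The (1-L)B figure in that last bullet is easy to sanity-check
numerically (the values of L and B below are made up):

```python
L = 0.2                       # probability a packet or its ACK is lost
B = 100                       # bottleneck rate, packets/second

# Expected transmissions per successful delivery:
# T = (1-L)*1 + L*(1 + T)  =>  T = 1/(1-L).
T = 1 / (1 - L)
assert abs(T - 1.25) < 1e-9   # one extra try every four packets, on average

# T really does satisfy the recurrence above:
assert abs(((1 - L) * 1 + L * (1 + T)) - T) < 1e-9

utilization = 1 / T           # = 1 - L
throughput = B * utilization
assert abs(throughput - 80.0) < 1e-9   # B*(1-L) packets/second
```

So at 20% loss, even a perfectly sized window caps out at 80 of the
bottleneck's 100 packets per second.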
Before I continue, remember weeks ago when we were comparing
circuit-switching and packet-switching? Circuit-switching seemed like
a bad idea: it only worked for a particular workload. And indeed,
packet-switching is much more flexible. But you can see now how
complicated things got! We have also added a lot of overhead to the
network.

Now, here's a question for you: In all of our models today, we dealt
with an abstraction where there was just a single connection. What
happens when there are multiple connections using the network?
Clearly they can't all achieve a throughput of B.

Let's imagine a network with two connections. If we want them to share
the network equally, they should each get a throughput of B/2. This
means their window sizes should be B/2 * RTT_min. With three
connections, B/3 * RTT_min, etc.

How can we get this to happen? First, the senders don't know how many
other connections exist in the network, and moreover, that number is
constantly changing. Second, sometimes a sender might not have a lot
of data to send. For instance, in my example above, imagine that one
connection only has enough data to send B/4 packets per second. Then
the other connection should certainly be allowed to send 3B/4 packets
per second, if it can. How will the end points figure this out?

(We'll answer this on Wednesday)