6.02 - Congestion Control
Lecture #24
Katrina LaCurts, lacurts@mit.edu

======================================================================
The Theme of this Lecture
======================================================================

Today's lecture is about a fairly advanced networking topic known as
congestion control. My goal in this lecture is *not* to turn you into
congestion-control experts. I'd like you to learn something about
congestion control, but more importantly, I'd like you to start
thinking about networking from a distributed-systems perspective.
This means that I'd like you to start considering how the protocols
we've discussed so far work (or don't work) when there are millions
of users, each with different demands and requirements, who are
joining and leaving the network constantly.

To get at this broader perspective, we will focus on one single
question: How can a single reliable sender, using a sliding-window
protocol, set its window size to maximize utilization, given that
there could be many other (unknown) connections using the network?

======================================================================
Introduction
======================================================================

Let's back up. Last time, we discussed the sliding-window protocol.
The throughput that this protocol achieves depends on the window
size, W. Through some analysis, we found that we want W to be equal
to (or slightly larger than) the bandwidth-delay product:
W = B * RTT_min. And in the previous lecture, we assumed that W was
fixed and known at the sender.

----------------------------------------------------------------------
A fixed, unknown window size
----------------------------------------------------------------------

Let's first imagine that the sender *doesn't* know the
bandwidth-delay product of the path it's using. Its goal is to
determine what the bandwidth-delay product is and then set its window
size accordingly. For this section of the notes, I'm going to refer
to the bandwidth-delay product as the "correct" window size; our
sender's goal is to determine the correct window size for this path.

One thing the sender might do is simply try different window sizes.
Different choices for W will result in different throughputs: if we
plot throughput against W, throughput rises as W increases, and then
levels off at B once W reaches the bandwidth-delay product. Is there
a way to determine where we are on this graph? Yes. I'll break it
down into two cases:

1. W < B*RTT_min. When W < B*RTT_min, the throughput will be less
   than B packets per second. So if W < B*RTT_min and we increase W
   slightly (for now, let's say by 1), we should see an increase in
   throughput.

2. W > B*RTT_min. When W > B*RTT_min, the throughput will be B
   packets per second. Increasing W by 1 will still result in B
   packets per second, so throughput won't change. In fact,
   decreasing W by 1 will (likely) not change the throughput either,
   though if we decrease W enough, we will fall back on the "left
   side" of the graph.

So an algorithm for setting W might be:

- Set W to an initial value of 1. Calculate throughput with this W
  value (call it tput_1).
- Increase W slightly. Calculate throughput with this W (call it
  tput_2).
- Keep increasing W slightly until the throughput stops increasing
  (tput_2 = tput_1), which means we've hit the correct window size.

This algorithm has some issues, which I'll come to later, but let's
stick with it for now.
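
To make this concrete, here is a minimal sketch of the probing
algorithm in Python. This is my own illustration, not code from the
course; the helper measure_throughput(w) is hypothetical -- assume it
runs the sliding-window protocol with window size w for a while and
returns the measured throughput in packets per second.

    def find_window_size(measure_throughput, w=1, step=1):
        """Increase W until throughput stops increasing."""
        tput_1 = measure_throughput(w)
        while True:
            w += step                    # increase W slightly
            tput_2 = measure_throughput(w)
            if tput_2 <= tput_1:         # throughput leveled off...
                return w - step          # ...previous W ~= B*RTT_min
            tput_1 = tput_2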
----------------------------------------------------------------------
Multiple Senders
----------------------------------------------------------------------

Now imagine multiple senders using the same network path (or, really,
the same bottleneck link). Let's assume they've all got their windows
set correctly. For instance:

  S1 -- 2 Mb/s -- A ---- 2 Mb/s ---- B -- 2 Mb/s -- D1
                  |                  |
  S2 -- 2 Mb/s ---                    --- 2 Mb/s -- D2

What will happen? The queue at A will build up, which will lead to
large delays in the network and eventually packet drops. This
scenario is what we mean when we say congestion.

Note that both S1 and S2 were sending at the "correct" rate: the
bottleneck link (A-B) could handle 2 Mb/s, and each of them was
sending at 2 Mb/s. The problem here is obvious: combined, S1 and S2
were sending at 4 Mb/s, twice the bottleneck rate.

What *should* have happened? Well, that's actually debatable (you'll
see in a second), but one reasonable alternative is to have S1 and S2
send at half the bottleneck link rate:

  S1 -- 1 Mb/s -- A ---- 2 Mb/s ---- B -- 1 Mb/s -- D1
                  |                  |
  S2 -- 1 Mb/s ---                    --- 1 Mb/s -- D2

But how would S1 and S2 know to do this? Remember that S1 only knows
about itself; it doesn't know that there is a second sender in the
network. Moreover, a third sender could come in, or S2 could
disappear, or S2 could start sending less than 1 Mb/s, etc. How is S1
going to be able to achieve the "correct" rate, assuming we can even
decide on what rate is correct?

======================================================================
Congestion Control
======================================================================

The mechanism that we use to solve this problem is known as
congestion control. Congestion control means controlling the source
rate to achieve high performance (among other things, this means we'd
like to avoid packet drops).

Fact: The Internet was first built without any congestion control in
place. It eventually experienced "congestion collapse": so many
packets were sent by endpoints that no usable data got through (all
of the packets were retransmits of lost packets). So in a very real
sense, congestion control is (one of) the reason(s) the Internet
works at all.

----------------------------------------------------------------------
Goals
----------------------------------------------------------------------

There are two objectives of any congestion control mechanism:
efficiency and fairness.

1. Use the network efficiently
   - Minimize drops, minimize delay
   - Maximize bottleneck utilization

2. Share the network bandwidth fairly among all connections that are
   using it.

Before we get to an actual congestion control protocol, let me
describe what we mean by sharing bandwidth fairly. There are many
notions of fairness in networking. The one we use for congestion
control is very intuitive; it's known as "max-min" fairness. It's
best described with examples. And it is *NOT* the same as the
fairness we studied when we studied MAC protocols.

Example 1: Suppose my bottleneck link rate is 2 Mb/s, and both S1 and
S2 have a lot of data to send. We'll say they have "infinite demand".
Then the bottleneck link should be split equally: S1 should get
1 Mb/s, S2 should get 1 Mb/s. If all sources have infinite demand,
then max-min fairness just boils down to splitting the bottleneck
link equally.

The interesting parts come when some sources don't have infinite
demand:

Example 2: Suppose the same bottleneck link rate (2 Mb/s), but now
suppose S1 only has .5 Mb/s of data to send (S2 still has infinite
demand). Then S1 should get .5 Mb/s, and S2 should get 1.5 Mb/s.

Example 3: Now imagine a third source enters the network, also with
infinite demand (S1 still has .5 Mb/s to send, S2 still has infinite
demand). Then S1 should get .5 Mb/s, and S2 and S3 should each get
.75 Mb/s.

Basically: If there are N connections, any connection that requires
less than 1/N^th of the bandwidth will get whatever bandwidth it
needs. We then divide the remaining bandwidth among the remaining M
connections in the same manner; once every remaining connection
demands more than a 1/M^th share of what's left, we divide that
bandwidth equally.

Aside: This is called "max-min" fairness because we're maximizing the
minimum amount of resources allocated to any connection. We will not
prove this in 6.02.

Max-min fairness is nice because it's intuitive. But just for
context, other definitions of fairness might bring the cost/price
paid for a service into the calculation, or they might optimize for
equalizing resources over the sum of all the links, not just the
bottleneck links. Again, we won't worry about those alternative
definitions of fairness; you should just know that other definitions
exist.
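
To make the allocation rule precise, here is a short Python sketch of
this "waterfilling" procedure for a single bottleneck link. It is my
own illustration, not part of the lecture; the function name and
interface are made up for the example.

    def max_min_allocation(link_rate, demands):
        """Max-min fair split of one bottleneck link. demands maps
        connection -> demand; use float('inf') for infinite demand."""
        allocation = {}
        remaining = link_rate
        unsatisfied = dict(demands)
        while unsatisfied:
            share = remaining / len(unsatisfied)
            # Connections demanding no more than an equal share get
            # their full demand; their leftovers are redistributed.
            small = {c: d for c, d in unsatisfied.items() if d <= share}
            if not small:
                # Everyone wants more than an equal share: split it.
                for c in unsatisfied:
                    allocation[c] = share
                return allocation
            for c, d in small.items():
                allocation[c] = d
                remaining -= d
                del unsatisfied[c]
        return allocation

    # Reproducing Example 3:
    print(max_min_allocation(2.0, {"S1": 0.5,
                                   "S2": float("inf"),
                                   "S3": float("inf")}))
    # {'S1': 0.5, 'S2': 0.75, 'S3': 0.75}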
----------------------------------------------------------------------
Types of congestion control
----------------------------------------------------------------------

The particular type of congestion control we're going to study today
is an end-to-end congestion control protocol, sometimes known as a
window-based congestion control protocol. The basic idea behind these
protocols is that the switches in the middle of the network are dumb;
all they can do is drop packets. The sources, however, are smart;
they can adjust their rate to converge to the optimal rate.

An alternative approach is to allow the switches in the middle to
help compute the optimal rate. We will briefly compare these
approaches at the end of class.

======================================================================
Window-based Congestion Control
======================================================================

Let's go back to the algorithm we developed at the beginning of
class. The basic idea of this algorithm was to increase W until
throughput leveled off. This idea of increasing W is going to stick
around. But our algorithm has a few problems:

1. It's slow: we increased W slowly.

2. Once we found the correct value for W, we never re-adjusted.

3. It requires measuring throughput, which takes time (say, seconds),
   and our measurements might not be perfectly accurate. Moreover, if
   there are multiple connections using the network, we don't know
   what the "correct" throughput should be -- and the correct
   throughput is constantly changing.

Our biggest problem right now is the third one. We need a better
signal to tell us what to do with our window.

----------------------------------------------------------------------
Remember MAC protocols?
----------------------------------------------------------------------

Remember Stabilized ALOHA? In Stabilized ALOHA, nodes increase or
decrease their sending probability depending on whether they
experience a collision. In effect, a collision signals that there is
congestion in the network. If a node experiences a collision, it
interprets that as congestion and decreases its sending probability
(which, in turn, decreases the overall congestion).

We'd like a similar signal for a reliable-transport protocol. What
reliably signals to an endpoint that there is congestion in the
network? Packet drops!

----------------------------------------------------------------------
AIMD
----------------------------------------------------------------------

The protocol I'm going to describe here is the congestion control
protocol used by TCP. TCP stands for Transmission Control Protocol;
it is the reliable transport protocol that the Internet uses. It is a
sliding-window protocol that behaves just like the one we have been
studying, except with a few additional optimizations.

The basic idea of TCP congestion control: keep increasing your window
until you see packet drops, then back off and try again.

Aside: The paper that originally described this congestion control
mechanism says of packet drops that "this [congestion] signal is
delivered automatically by all existing networks, without special
modification". This was in contrast to some other proposals, which
required modifications to packets and/or the switches in the network.

The specific way that TCP adjusts its window is:

Every RTT:
  - If there is no loss, W = W + 1
  - If there is loss, W = W/2

This technique is known, for obvious reasons, as "additive increase,
multiplicative decrease", or AIMD. Essentially, senders are
conservative about increasing their windows, but scale back
dramatically when they experience congestion.

Aside: There are also more technical reasons for doing additive
increase, multiplicative decrease rather than, say, multiplicative
increase, multiplicative decrease. Essentially, the network remains
most stable when AIMD is used.

This protocol is in fact both efficient and fair. (See the diagram I
drew on the board in lecture.)
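
The per-RTT update above is easy to express in code. Here is a small
Python sketch -- my own simplification, not TCP's actual
implementation (real TCP counts bytes rather than packets and has
many more details):

    def aimd_update(w, loss_this_rtt):
        """Apply AIMD to the window once per RTT."""
        if loss_this_rtt:
            return max(w / 2, 1)   # multiplicative decrease (floor at 1)
        else:
            return w + 1           # additive increase

    # Tracing a few RTTs shows the familiar sawtooth: the window
    # climbs additively, halves on a loss, and climbs again.
    w = 8
    for rtt in range(10):
        w = aimd_update(w, loss_this_rtt=(w >= 12))
        print(rtt, w)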
======================================================================
Slow Start
======================================================================

We have already solved two of the problems with our original
algorithm. TCP's congestion control doesn't use throughput as a
congestion signal; instead it uses packet drops. It is also
constantly adjusting the window, and so will adapt to changes in the
network. But what about its slowness? After all, we're still
increasing the window size additively.

First of all, it's not quite as slow as it might seem. Before, we
were starting with a window size of 1 and increasing additively. Now,
except for the very beginning, we're starting from a window size of
V/2, where V is the correct size. And, as I mentioned in the aside
above, if we increase more quickly than additively, we risk an
unstable network.

However, additively increasing at the beginning of the connection is
still a concern from a performance standpoint. To handle that, TCP
does something different at the beginning of a connection: it
increases its window exponentially, doubling it every round-trip time
until it sees loss. This is done with the following algorithm
(sketched in code below):

- Start with W = 1
- On every ACK, W = W + 1

(Why does this double the window? With a window of W packets, one
round trip brings back W ACKs, each of which adds 1 to the window.)

This algorithm is, somewhat ironically, known as "slow start". Don't
let that fool you. When TCP was first developed, connections started
by sending as much data as they could, and then backed off. Compared
to that, an exponential window increase is indeed slow.
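
Here is a tiny Python simulation of that per-ACK rule, just to show
the doubling -- my own illustration, idealized into lossless RTT
"rounds" (real TCP is event-driven, not round-based):

    w = 1
    for rtt in range(5):
        acks = w              # a full window's worth of ACKs arrives
        for _ in range(acks):
            w += 1            # W = W + 1 on every ACK
        print("after RTT", rtt + 1, "W =", w)
    # after RTT 1 W = 2
    # after RTT 2 W = 4
    # ...the window doubles every round-trip time.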
======================================================================
Additional Details
======================================================================

There are a few additional details of TCP that are worth mentioning:

1. It uses cumulative ACKs. Some of you may have seen a problem on
   cumulative ACKs in recitation. With cumulative ACKs, when a
   receiver sends an ACK with sequence number k, it means "I have
   received all packets up to and including k", rather than just "I
   have received packet k".

   Aside: TCP's ACKs actually reflect the next packet the receiver
   *expects* to receive. This is not important for us.

   What you should have seen from that problem -- and what you should
   be able to figure out yourself -- is that cumulative ACKs have
   less overhead when there is packet loss.

2. It has a heuristic for distinguishing between lost packets and
   re-ordered packets, known as "fast retransmit". In most cases, TCP
   is able to quickly retransmit a lost packet instead of waiting for
   a full timeout.

3. Despite fast retransmit, in the case of severe congestion, TCP's
   window will drop back down to 1. This is why it is so important to
   avoid congestion in a network: your TCP connections become
   "throttled" (they get very low throughput) in the face of
   congestion.

4. TCP also delays ACKs, sending a cumulative ACK for (roughly) every
   other packet rather than for every packet. So, for instance, in
   the case of no loss, we will see a sequence of ACKs numbered
   "2, 4, 6, 8, ..." rather than "1, 2, 3, 4, 5, 6, 7, 8". This makes
   the statistic I showed last week about Netflix's ACK traffic that
   much more astounding.

======================================================================
An Alternate Approach
======================================================================

There is an alternative to end-to-end congestion control. End-to-end
congestion control is (relatively) simple and requires no changes to
the infrastructure. However, it's not as efficient nor as fair as it
could be (in part because endpoints are constantly readjusting their
windows).

The alternative is to involve the switches in the middle of the
network, and have them "help" endpoints compute the optimal rate.
These approaches are more powerful, more efficient, and more fair,
but they are complex and require changes to the infrastructure.

For instance, here is one idea. TCP reacts to congestion only when
there are packet drops, which means it reacts only when the queue is
already full. That *also* means that TCP doesn't react until there
are already large delays in the network. Imagine that the router
started dropping an occasional packet when the queue was, say, 50%
full. Some TCP senders would back off, and perhaps the queue could be
prevented from ever getting completely full. This would keep delay
low and utilization high (remember: we can utilize a network fully
without the queues being full; that's what we get from the W vs.
throughput graph). Mechanisms such as RED, DCTCP, and ECN work this
way.
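
As a rough illustration of that early-drop idea, here is a toy Python
sketch in the spirit of RED, but much simplified. The particular
drop-probability ramp (zero below half-full, rising to 10% as the
queue approaches capacity) is my own illustrative choice, not what
any real router does:

    import random

    QUEUE_CAPACITY = 100        # packets
    MAX_EARLY_DROP_PROB = 0.1

    def should_drop(queue_len):
        """Decide whether to drop an arriving packet."""
        if queue_len >= QUEUE_CAPACITY:
            return True         # queue full: no choice but to drop
        if queue_len < QUEUE_CAPACITY / 2:
            return False        # plenty of room: never drop early
        # Between 50% and 100% full, drop with increasing probability,
        # nudging senders to back off before the queue fills.
        fraction = (queue_len - QUEUE_CAPACITY / 2) / (QUEUE_CAPACITY / 2)
        return random.random() < fraction * MAX_EARLY_DROP_PROB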
======================================================================
Why This is All Cool
======================================================================

Like I said at the beginning, you aren't supposed to come away from
this lecture a congestion control expert. I didn't even give you all
of the details of TCP's congestion control algorithm (though I did
give you the most important ones). So what was the point?

The point was to get you to think about all of the problems we have
to deal with when multiple connections are sharing a network (e.g.,
the Internet). Assumptions that we often make in class -- a single
sender and receiver -- don't hold. The approaches we developed --
setting W to a fixed bandwidth-delay product -- are still relevant,
but aren't enough.

Moreover, whenever we design protocols for the Internet, we can't
rely on a centralized authority. Congestion control would be simple
if there were an oracle that could tell every sender exactly what
rate to send at, but we don't have that. The senders need to figure
out the missing information all by themselves.

Basically: the Internet is an extraordinarily complex environment to
design for, and you all should be amazed that it works at all.