6.02 - Improving the Throughput of Reliable Transport
Lecture #23
Katrina LaCurts, lacurts@mit.edu

======================================================================
Recap: The problem with stop-and-wait
======================================================================

Recall the stop-and-wait protocol from the previous lecture:

At the sender:
- Send one packet, and keep track of its sequence number and
  transmission time
- Once an ACK is received for that packet, delete the stored sequence
  number, and send a new packet (using the same strategy of saving its
  sequence number and waiting for an ACK)
- If an ACK hasn't been received after timeout seconds since the
  packet's transmission time, retransmit it to the receiver.

At the receiver:
- Upon receipt of packet k, send an ACK for packet k
- If k is greater than the last sequence number we received (or if we
  haven't received any packets yet), then deliver the packet to the
  application and keep track of k

This protocol uses sequence numbers and ACKs to provide basic
reliability and re-ordering capabilities. We added the timeout at the
sender to avoid deadlock, and the receiver keeps track of packets that
it previously delivered to the application in order to avoid
delivering duplicates.

Although this protocol is a correct reliable transport protocol, we
found a problem: its utilization can be quite low. We saw that in the
best case, the throughput was equal to (1-L)/RTT, where L is the
probability that a packet or its ACK is lost. Even with zero loss,
this can lead to incredibly low utilization (e.g., using stop-and-wait
on a link with an RTT of 100ms that is capable of 100 packets per
second will result in 10% utilization).
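In code, the two endpoints above might look like this. This is a
minimal sketch with hypothetical names (not the actual 6.02 lab code);
the channel and the timer loop are abstracted away:

```python
class StopAndWaitSender:
    """One outstanding (un-ACKed) packet at a time."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.next_seq = 1
        self.outstanding = None            # (seq, time_sent), or None

    def packet_to_send(self, now):
        """Return a sequence number to transmit now, or None."""
        if self.outstanding is None:       # nothing un-ACKed: send a new packet
            self.outstanding = (self.next_seq, now)
            self.next_seq += 1
            return self.outstanding[0]
        seq, sent_at = self.outstanding
        if now - sent_at >= self.timeout:  # no ACK after timeout seconds
            self.outstanding = (seq, now)  # retransmit (and reset the timer)
            return seq
        return None                        # still waiting for the ACK

    def on_ack(self, k):
        if self.outstanding and self.outstanding[0] == k:
            self.outstanding = None        # frees us to send the next packet


class StopAndWaitReceiver:
    def __init__(self):
        self.last_seq = 0                  # highest sequence number delivered
        self.delivered = []                # what the application has seen

    def on_packet(self, k):
        if k > self.last_seq:              # new packet: deliver; else: duplicate
            self.delivered.append(k)
            self.last_seq = k
        return k                           # always send an ACK for k
```

Notice that even with a lossless channel, each round trip moves exactly
one packet: that is the utilization problem described above.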
======================================================================
The solution: Sliding-window
======================================================================

The problem in the previous example is that we had a link that could
handle 10 packets every RTT (100 packets per second = 10 packets per
100ms, on average) and yet stop-and-wait only allows one packet to be
on a link at a time. We frequently say that stop-and-wait allows only
one "outstanding", or un-ACKed, packet at a time.

Our solution -- the sliding-window protocol -- will be to allow more
than one outstanding packet at a time.

----------------------------------------------------------------------
The basic idea
----------------------------------------------------------------------

The basic ideas are these:
- An end point will allow W outstanding packets (i.e., W un-ACKed
  packets) at once. W is known as the window size. In 6.02, for today,
  W is fixed and known by the sender (we will discuss later how to set
  it).
- The collection of packets that are outstanding is known as the
  window. Every time the sender receives an ACK, the sender "slides"
  its window by 1.

(Students: You should check the slides for diagrams of this process.
It was too hard to draw in ASCII.)

This idea of sending a new packet every time we receive an ACK is
actually really clever. When the sender receives an ACK for packet k,
it doesn't just mean that packet k has been received; it *also* means
that a packet (k) has left the network. This means that there is
"room" for a new packet in the network.

Aside: This same technique is used in more complicated reliable
transport protocols (e.g., TCP); it's known as "ACK-clocking".
ACK-clocking lets end points estimate how much data they can put into
the network on their own, without coordinating with any other nodes or
a centralized server. It's really quite cool from a
distributed-systems perspective.
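The "slide on each ACK" rule is tiny in code. Here is a hypothetical
sketch (not the 6.02 lab code), just to animate what the diagrams on
the slides show:

```python
W = 5
window = [1, 2, 3, 4, 5]      # the W outstanding (un-ACKed) sequence numbers
next_seq = 6                  # the next fresh packet to transmit

def on_ack(k):
    """An ACK for k slides the window: k leaves, one new packet enters."""
    global next_seq
    if k in window:
        window.remove(k)
        window.append(next_seq)   # there's "room" in the network: send one more
        next_seq += 1

on_ack(1)
print(window)   # [2, 3, 4, 5, 6]
on_ack(2)
print(window)   # [3, 4, 5, 6, 7]
```

Note that the number of outstanding packets stays at W the whole time:
each departure (an ACK) clocks exactly one new packet into the network.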
We can now put stop-and-wait in context with sliding-window:
stop-and-wait *is* a sliding-window protocol, just with a window size
of W = 1. Setting the correct window size -- the one that maximizes
network utilization -- is a challenge we will overcome later in this
lecture.

Now you have the basic idea of sliding-window, so I'll describe a few
details to you:
- What happens when a packet is lost
- What happens when packets are received out of order.

----------------------------------------------------------------------
At the sender: Handling retransmissions
----------------------------------------------------------------------

Sliding-window deals with lost packets just like stop-and-wait does:
it waits timeout seconds and then retransmits the packet.

SLIDE: Show this

Consider a window size of 5. The sender first sends packets 1 through
5, and then starts to receive ACKs.

- Start:
    Window = [1, 2, 3, 4, 5]
- Receive ACK for 1 => send a new packet
    Window = [2, 3, 4, 5, 6]
- Receive ACK for 2 => send a new packet
    Window = [3, 4, 5, 6, 7]
- Receive ACK for 3 => send a new packet
    Window = [4, 5, 6, 7, 8]
- Receive ACK for 4 => send a new packet
    Window = [5, 6, 7, 8, 9]
- Receive ACK for 5 => send a new packet
    Window = [6, 7, 8, 9, 10]
- Receive ACK for 6 => send a new packet
    Window = [7, 8, 9, 10, 11]
- Receive ACK for *8* => send a new packet (assume 7 was lost)
    Window = [7, 9, 10, 11, 12]
- Receive ACK for 9 => send a new packet
    Window = [7, 10, 11, 12, 13]

Eventually, a timeout will fire for packet 7, and the sender will
retransmit it:
- Timeout fires for packet 7 => retransmit
    Window = [7, 10, 11, 12, 13]

This example illuminates a few important issues:
1. The window of outstanding packets does not necessarily contain
   successive sequence numbers.
2. If the sender retransmits a packet, and its window is already full
   (i.e., it contains W outstanding packets), it does *not* also send
   a new packet.
For instance, in the last step in my example, we did not do this:
- Timeout fires for packet 7 => retransmit and send a new packet
    Window = [7, 10, 11, 12, 13, 14]
Why? Simple: the sender is only allowed to have W (=5 in this example)
outstanding packets at a time.
3. To handle retransmissions, the sender needs to keep track of the
   un-ACKed packets and the times they were sent.

At the sender, a sliding-window protocol looks something like this:
- Transmit a packet if len(un-ACKed list) < W
- Upon transmission of packet k, keep track of k in the un-ACKed list,
  along with the time that k was sent
- When an ACK for packet k is received, remove k from the un-ACKed
  list
- Periodically check the un-ACKed list to see if any packets were sent
  more than timeout seconds ago. If so, retransmit them.

In practice, the sender's "send()" function is called once every
timeslot, with a very fine-grained timer. So all of these checks
(whether we should send a new packet, whether we should retransmit a
packet) are happening frequently.

----------------------------------------------------------------------
At the receiver: Handling out-of-order packets
----------------------------------------------------------------------

Now, what's going on at the receiver? As before, the receiver ACKs
every packet it receives, without exception. The challenging part is
deciding which packets to deliver to the application.

The receiver will keep each packet it receives in a buffer, ignoring
duplicates. It will also keep track of the next packet the application
expects. When it delivers a packet to the application, it removes it
from this buffer, but the receiver *cannot* deliver a packet if the
application isn't expecting it.
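Before working through the receiver's side of the example, here is how
the sender-side rules from the previous section might look in code.
This is a hypothetical sketch (the class and method names are mine, not
the 6.02 lab code), with tick() standing in for the per-timeslot
send() call:

```python
class SlidingWindowSender:
    def __init__(self, W, timeout):
        self.W = W                      # window size (fixed, for today)
        self.timeout = timeout
        self.next_seq = 1
        self.unacked = {}               # seq -> time it was last sent

    def tick(self, now):
        """Called once per timeslot; returns the packets to (re)transmit."""
        to_send = []
        # Retransmit anything sent more than timeout seconds ago.
        for seq, sent_at in self.unacked.items():
            if now - sent_at >= self.timeout:
                self.unacked[seq] = now
                to_send.append(seq)
        # Transmit new packets while len(un-ACKed list) < W.  A
        # retransmission does NOT make room for a new packet.
        while len(self.unacked) < self.W:
            self.unacked[self.next_seq] = now
            to_send.append(self.next_seq)
            self.next_seq += 1
        return to_send

    def on_ack(self, k):
        self.unacked.pop(k, None)       # remove k from the un-ACKed list
```

Feeding this sketch the ACK sequence from the example reproduces the
window [7, 10, 11, 12, 13], and a later tick retransmits 7 without
sending a packet 14.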
Let's see how this plays out in our example:

- Start
    Sender window = [1, 2, 3, 4, 5]
    Application expects: 1
    Received buffer = []
- Receive ACK for 1 => send a new packet
    Sender window = [2, 3, 4, 5, 6]
    Application expects: 1
    Received buffer = [1] => okay to deliver 1
- Receive ACK for 2 => send a new packet
    Sender window = [3, 4, 5, 6, 7]
    Application expects: 2
    Received buffer = [2] => okay to deliver 2
- Receive ACK for 3 => send a new packet
    Sender window = [4, 5, 6, 7, 8]
    Application expects: 3
    Received buffer = [3] => okay to deliver 3
- Receive ACK for 4 => send a new packet
    Sender window = [5, 6, 7, 8, 9]
    Application expects: 4
    Received buffer = [4] => okay to deliver 4
- Receive ACK for 5 => send a new packet
    Sender window = [6, 7, 8, 9, 10]
    Application expects: 5
    Received buffer = [5] => okay to deliver 5
- Receive ACK for 6 => send a new packet
    Sender window = [7, 8, 9, 10, 11]
    Application expects: 6
    Received buffer = [6] => okay to deliver 6
- Receive ACK for *8* => send a new packet (assume 7 was lost)
    Sender window = [7, 9, 10, 11, 12]
    Application expects: 7
    Received buffer = [8] => NOT okay to deliver 8
- Receive ACK for 9 => send a new packet
    Sender window = [7, 10, 11, 12, 13]
    Application expects: 7
    Received buffer = [8, 9] => NOT okay to deliver 8 or 9

The receiver's buffer will keep growing until it receives the
retransmission of packet 7, after which it will be okay to deliver 7.
In fact, the receiver will be able to deliver 7, 8, and 9 all at once.

From this example, you can see that it is *not* okay for the receiver
to simply discard packets that arrive out of order. It needs to buffer
them and deliver them in order.

So at the receiver, our protocol looks like this:
- Send an ACK for every received packet
- Save received packets -- ignoring duplicates -- in a local buffer
- Keep track of the next packet the application expects. After each
  reception, deliver as many in-order packets as possible.
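The receiver-side rules can be sketched the same way (again with
hypothetical names, not the 6.02 lab code):

```python
class SlidingWindowReceiver:
    def __init__(self):
        self.buffer = set()        # received but not yet delivered
        self.expected = 1          # the next packet the application expects
        self.delivered = []        # what the application has seen, in order

    def on_packet(self, k):
        if k >= self.expected:     # ignore duplicates of delivered packets
            self.buffer.add(k)     # (adding to a set ignores buffered dups too)
        # Deliver as many in-order packets as possible.
        while self.expected in self.buffer:
            self.buffer.remove(self.expected)
            self.delivered.append(self.expected)
            self.expected += 1
        return k                   # ACK every received packet, no exceptions
```

Feeding it the arrivals from the example shows the buffering behavior:
8 and 9 wait in the buffer, and the retransmission of 7 releases all
three at once.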
SLIDE: Graph of packet re-transmissions and ACKs

======================================================================
Analysis of Sliding-window
======================================================================

The throughput of sliding-window depends, not surprisingly, on the
size of the window. Our goal is to set a window size that maximizes
throughput.

----------------------------------------------------------------------
Bandwidth-delay product
----------------------------------------------------------------------

Let's start off with W = 1, and we'll assume for now that there is no
packet loss. In the last lecture, we saw that the throughput will be
1/RTT.

SLIDE: Start graphing this

If we increase W to 2, the protocol will deliver two packets every RTT
seconds. So a throughput of 2/RTT. If W is 3, then 3/RTT, etc. It
seems like no matter what W is, the throughput will be W/RTT. So we
can just keep increasing W and increase our throughput?

No, of course not. Along the path from a sender S to a receiver R,
there is some bottleneck link: the slowest link on the path. Let's
call its rate B (in packets/second). We will not be able to exceed a
throughput of B, no matter what value we use for W.

Let's first figure out what value we need for W to achieve a
throughput of B packets per second. If the throughput is W/RTT, and we
want that equal to B, then we should use W = B * RTT.

This result can also be seen with an application of Little's Law.
Consider the path between S and R, and imagine that the path is just
one big queue. If W is the window size, and RTT is the mean delay
between the transmission of a data packet and the reception of its
ACK, packets are being drained from the queue at rate B (the
bottleneck link rate), so the mean number of packets in the queue is
B * RTT. Since the queue here is our path, that means that the mean
number of packets in the path is B * RTT. W is equivalent to the mean
number of packets in the path, hence W = B * RTT.
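Plugging in the numbers from the start of the lecture: a bottleneck of
100 packets per second with a 100ms RTT needs a window of 10 packets
to stay busy.

```python
B = 100        # bottleneck rate, packets/second
RTT = 0.1      # round-trip time, seconds (100ms)

W = B * RTT    # window needed for a throughput of B
print(W)       # 10.0 packets

# Check: with this window, throughput = W/RTT hits the bottleneck rate.
assert abs(W / RTT - B) < 1e-9
```

This is exactly the "10 packets every RTT" figure that stop-and-wait
(W = 1) fell short of by a factor of 10.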
The formula W = B * RTT has the same problem we discussed above:
B = W/RTT, which implies that as we increase W, B will also increase,
which we know is not true. So what's wrong?

Let's break down the RTT. As we know, the delay experienced by a
single packet comes from multiple things:
- Transmission, propagation, and processing delays. These are all
  (roughly) constant, per-packet, and they are never zero.
- Queueing delay, which is variable, and could in fact be zero (if
  there are no packets in any of the queues)

Let's call RTT_min the delay in the absence of queueing. It's the
absolute fastest a packet could travel through our network. If the RTT
is equal to RTT_min, then our equation is just W = B * RTT_min.

Well, how does that help us? We can still write B = W/RTT_min, which
still gives the impression that as we increase W, B increases. But
notice! If W is larger than B * RTT_min, then queues in the network
are going to start to build. The RTT won't be a quantity independent
of W; in fact, it will be very much dependent on W, and will no longer
be equal to RTT_min. So increasing W will actually increase the RTT;
B will not increase.

Intuitively, if W > B * RTT_min, then the throughput remains constant
at B. Mathematically: Let Q be the average number of packets in the
queue of the bottleneck link. By Little's Law, Q = B * tau, where tau
is the average delay in the queue. We also know:

    RTT = RTT_min + tau = RTT_min + Q/B

Since W is the number of un-ACKed packets, these packets are either in
the queue or "on the wire" (i.e., on the path between the sender and
receiver but not in the queue). So:

    W = Q + B*RTT_min

Now let's use Little's Law again:

    Throughput = W/RTT = (Q + B*RTT_min) / (RTT_min + Q/B) = B

SLIDE: Graph of throughput vs. window size

The quantity B * RTT_min is known as the bandwidth-delay product.
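You can check that last identity numerically: with W = Q + B*RTT_min
and RTT = RTT_min + Q/B, the ratio W/RTT is B no matter how large Q
gets. (The numbers below are made up.)

```python
B = 100           # bottleneck rate, packets/second
RTT_min = 0.1     # seconds: the RTT with no queueing

for Q in [0, 5, 50, 500]:              # average bottleneck-queue occupancy
    W = Q + B * RTT_min                # un-ACKed packets: queued or on the wire
    RTT = RTT_min + Q / B              # queueing inflates the RTT by Q/B
    throughput = W / RTT
    assert abs(throughput - B) < 1e-9  # pinned at B: growing W only grows Q
```

Every packet added beyond B*RTT_min just sits in the bottleneck queue,
inflating the RTT by exactly enough to cancel the larger W.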
----------------------------------------------------------------------
Calculating throughput
----------------------------------------------------------------------

Let's actually calculate throughput. Remember that the protocol
delivers W packets every RTT seconds, so the throughput is W / RTT
(the issue is that, with larger W, RTT will also grow).

Without packet losses:
- If W <= B*RTT_min, then Throughput = W / RTT_min
- If W > B*RTT_min, then Throughput = B
- Moreover, when W > B*RTT_min, then W = B*RTT_min + Q, where Q is the
  queue occupancy.

And what about packet losses? Let L be the probability that a packet
or its ACK is lost. We know that our utilization is at most 1 - L: the
expected number of transmissions, T, for successful delivery of a
packet and its ACK satisfies T = (1-L)*1 + L*(1 + T), so T = 1/(1-L),
and utilization = 1 / (1/(1-L)) = 1-L.

In this case, throughput = B * (1-L). To achieve this, we pick
W > B*RTT_min to ensure that the bottleneck link is busy even when
there are packet losses. If W >> B*RTT_min, though, the delays will be
too large, the timeout will be too big, and other connections may
suffer.

======================================================================
A Summary, and a Lingering Question
======================================================================

This is a summary of the reliable transport protocol we've ended up
with:
- Use sequence numbers and ACKs to give basic packet-ordering
  capabilities and reliability.
- The sender keeps W outstanding packets at a time, and re-transmits
  packets if it has not received an ACK after timeout seconds.
- The receiver keeps a buffer of received packets, ignores duplicates,
  and delivers them in-order to the receiving application.
- Setting W = B * RTT_min (or slightly larger in the case of packet
  loss) allows us to achieve the maximum throughput of B (or (1-L)B in
  the case of packet loss).
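The (1-L)B figure in that last bullet is easy to sanity-check
numerically (the values of L and B below are made up):

```python
L = 0.2                       # probability a packet or its ACK is lost
B = 100                       # bottleneck rate, packets/second

# Expected transmissions per successful delivery:
# T = (1-L)*1 + L*(1 + T)  =>  T = 1/(1-L).
T = 1 / (1 - L)
assert abs(T - 1.25) < 1e-9   # one extra try every four packets, on average

# T really does satisfy the recurrence above:
assert abs(((1 - L) * 1 + L * (1 + T)) - T) < 1e-9

utilization = 1 / T           # = 1 - L
throughput = B * utilization
assert abs(throughput - 80.0) < 1e-9   # B*(1-L) packets/second
```

So at 20% loss, even a perfectly sized window caps out at 80 of the
bottleneck's 100 packets per second.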
Before I continue, remember weeks ago when we were comparing
circuit-switching and packet-switching? Circuit-switching seemed like
a bad idea: it only worked for a particular workload. And indeed,
packet-switching is much more flexible. But you can see now how
complicated things got! We have also added a lot of overhead to the
network.

Now, here's a question for you: In all of our models today, we dealt
with an abstraction where there was just a single connection. What
happens when there are multiple connections using the network?
Clearly they can't all achieve a throughput of B.

Let's imagine a network with two connections. If we want them to share
the network equally, they should each get a throughput of B/2. This
means their window sizes should be B/2 * RTT_min. With three
connections, B/3 * RTT_min, etc.

How can we get this to happen? First, the senders don't know how many
other connections exist in the network, and moreover, that number is
constantly changing. Second, sometimes a sender might not have a lot
of data to send. For instance, in my example above, imagine that one
connection only has enough data to send B/4 packets per second. Then
the other connection should certainly be allowed to send 3B/4 packets
per second, if it can. How will the end points figure this out?

(We'll answer this on Wednesday)