6.02 - MAC Protocols
Lecture #18
Katrina LaCurts, lacurts@mit.edu

======================================================================
6.02 Part 3 - Networks
======================================================================

So far, 6.02 has given you a fairly complete picture of how to get data from one machine to another machine that is directly connected to it:

- We start by encoding the data in a particular way (Huffman, LZW), typically with the goal of being efficient in terms of the number of bits we use.

- Then, we might add error correction to that data stream (Hamming codes, convolutional codes), to make the transmission more robust to noise.

- That data is modulated and sent across the channel, where it is affected by noise.

- On the receiving end, data is demodulated and filtered, and an approximation of the original *signal* is recovered.

- Then, we use threshold detection to determine what constitutes a 0 and what constitutes a 1, and decode the signal back into a bitstream.

- Finally, those bits are decoded with whatever method is appropriate (syndrome decoding, Viterbi).

The third part of 6.02 will deal with how we handle communication between multiple machines, i.e., networking. The next lecture will give a more complete overview of networking and the questions therein. Today's lecture is more of a bridge between Parts 2 and 3. Along the way I will introduce some networking terms, which you'll see again in the next lecture. I've also included a dictionary along with these notes.

======================================================================
Introduction to Media Access Control (MAC)
======================================================================

----------------------------------------------------------------------
Shared Media
----------------------------------------------------------------------

"Shared media" refers to communication channels where multiple "nodes" can all be connected and hear each other's transmissions. This should feel intuitive: we're all sitting in lecture sharing this (802.11) wireless network. There are lots of other examples:

- Satellite networks: many nodes on land can send their data to a satellite, which will then send the data back down to the destination.

- Cellular networks: your phones connect to a cell tower in your vicinity, and the tower is then connected to the rest of the phone network (we don't care about the details of cell networks in 6.02); multiple phones can connect to the same tower.

- PS6: PS6 presented a shared medium. You sent two messages on two different frequencies across the same channel. It's a little different because the two messages came from the same node, but you could imagine extending PS6 to work across machines, and allowing multiple machines to transmit at once, on different frequencies (in fact we'll come back to this idea).

We can even talk about shared media with things other than communication networks. For instance, roads form a network, and those roads act as a shared medium (cars are sharing the channel).

----------------------------------------------------------------------
Necessary Abstractions
----------------------------------------------------------------------

We're going to talk about shared media in general, not any one particular medium. The ideas we'll talk about today are applicable to any of the above examples (except, perhaps, the road network). Here is our abstraction:

1. Nodes transmit units of data called packets.
   A lot of times, this is about 1.5KB of data (that number is not important in 6.02 and can change; I'm giving it to you strictly to make the notion of a packet more concrete). If we wanted to transmit more data than that, we'd just break it into multiple packets.

2. Nodes interface with the channel in some way. This is what connects the queue of packets to the shared channel.

3. The channel is shared. When two or more nodes are transmitting at the same time, the packets will collide and be lost. This is a collision. Even if only part of a packet is involved in a collision, the entire packet will be lost (you will see later in these notes how it is possible for "part" of a packet to collide).

4. Nodes can tell (can "detect") when a collision has occurred, and re-transmit the packets later.

5. On the shared channel, nodes may be able to hear each other perfectly, partially, or not at all. If node A hears node B sending a packet, then A will wait to send its packet (and thus avoid a collision with B). But this won't be possible if A can't hear B, or if A and B start sending at the same time.

----------------------------------------------------------------------
Goals
----------------------------------------------------------------------

To deal with this shared medium, we're going to design a communications protocol. The goals of the protocol are two-fold:

1. To avoid collisions
2. To ensure good performance

(Without the second goal, it's very easy to satisfy the first: don't allow any communication whatsoever.)

What do we mean by "good performance"? There are many facets of performance. Here is the first:

---------------------------- Utilization ----------------------------

1. Utilization: the channel is a shared resource, so we should use it efficiently. Ideally, we'd use 100% of its capacity (if we don't, we could be sending more data, and aren't). We calculate utilization the following way:

                total throughput over all nodes
   U_channel = ----------------------------------
               maximum data rate of the channel

Obviously, at this point I need to tell you what throughput is. Throughput is the amount of data a node transfers over a certain period of time (PRO TIP: throughput is a *unit of data* over a *unit of time*, e.g., megabits-per-second, bytes-per-second, etc. If you are calculating throughput, please check your work; if you end up with something like seconds-per-byte, bytes-per-megabit, seconds-per-minute, etc., something has gone horribly wrong). We will do a lot more with throughput in a few lectures.

Example: Suppose we have a channel capable of sending 10 megabits-per-second (Mbps). Four nodes are connected to it, and they are sending with throughputs 1Mbps, 2Mbps, 2Mbps, and 3Mbps. What is the utilization?

   (1 + 2 + 2 + 3) / 10 = 0.8

Note: 0 <= U <= 1:

- U cannot be greater than 1; the channel cannot send more data than it is capable of sending.
- U cannot be less than 0: that doesn't even make sense.
- U = 1 is perfect utilization (the channel is 100% utilized).
- U can be less than 1 for a variety of reasons:
  - The nodes have packets to transmit, but the protocol is inefficient.
  - Nodes don't have enough packets to transmit to use the full capacity of the channel.

(Some more terms: nodes that have packets in their queue are backlogged. The packets available for transmission constitute the offered load; when there is insufficient offered load, U is less than 1.)
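(If you want to sanity-check a utilization calculation like the example above, a few lines of Python will do. This is just an illustrative sketch; the function name and numbers are mine, not from any problem set.)

   def utilization(throughputs_mbps, channel_rate_mbps):
       # Fraction of the channel's capacity actually in use:
       # total throughput over all nodes / maximum data rate of the channel.
       return sum(throughputs_mbps) / channel_rate_mbps

   # The example above: a 10 Mbps channel shared by four nodes.
   print(utilization([1, 2, 2, 3], 10.0))   # prints 0.8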
----------------------------- Fairness ------------------------------

We could achieve 100% utilization by always letting node A use the channel (assuming it has infinite data to send; if not, maybe we let node B jump in until node A has data). But that certainly seems like an undesirable outcome. So we'll add a second goal:

2. Fairness: ideally, we'd divide the capacity equally among the nodes requesting to use it. But there are all sorts of problems here: not all nodes need to send data all of the time, we need a way to determine *who* has data to send, some data is more important than other data (e.g., maybe calls to 911 should get priority on the cell network), ... (really I could go on for days). In 6.02 we're going to concern ourselves mostly with the first one, but just know that fairness is an issue in networking.

How do we calculate fairness? That's a complicated question; there are actually multiple definitions of fairness. But we have a particular one we'll use in 6.02. Suppose we measure the throughput x_i that node i achieves under our protocol, and assume there are N nodes. If things are perfectly fair, each x_i will be equal. The less fair the protocol, the more spread out this distribution is. We capture that notion with the following fairness index:

   F = (sum_{i=1}^{N} x_i)^2 / (N * sum_{i=1}^{N} x_i^2)

(The N in the denominator lets us normalize as the number of nodes varies.)

From this, you can see that 1/N <= F <= 1. If F = 1/N, then a single node is getting all of the throughput. If F = 1, the protocol is perfectly fair (Work these out yourself! You should understand how I came to that conclusion).

One tricky aspect of this is: over what period should we calculate the throughput used to make these calculations? A few seconds? Hours? We'll consider both.
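(To make the endpoints concrete, here is the fairness index as a short Python sketch; the function name and the example throughputs are mine, purely for illustration.)

   def fairness_index(throughputs):
       # F = (sum of x_i)^2 / (N * sum of x_i^2), so 1/N <= F <= 1.
       n = len(throughputs)
       return sum(throughputs) ** 2 / (n * sum(x * x for x in throughputs))

   print(fairness_index([2.0, 2.0, 2.0, 2.0]))   # 1.0: perfectly fair
   print(fairness_index([8.0, 0.0, 0.0, 0.0]))   # 0.25 = 1/N: one node gets everything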
There are other goals that we care about, and I'm going to talk about those at the end of class today, but for now, utilization and fairness are the two big ones. They are often at odds with each other.

======================================================================
Time Division
======================================================================

It's time to analyze some specific Media Access Control (MAC) protocols. The utilization/fairness calculations will come in during this analysis; they're how we'll determine whether a protocol achieves "good" performance.

(Note: If you have experience in cryptography, you may know MAC as an acronym for Message Authentication Code. Media Access Control has nothing to do with that; it's an unfortunate collision of acronyms. If you have experience in computer systems, you may have heard of a MAC Address. Media Access Control *does* have something to do with that -- a MAC Address is a Media Access Control Address -- but we are not going to talk about MAC Addresses in 6.02.)

----------------------------------------------------------------------
Frequency Division vs. Time Division
----------------------------------------------------------------------

You've already shared a channel in PS6. There, multiple senders used *frequency division* to share the channel. Each picked a different frequency on which to send, and the receiver was able to recover both transmissions. You saw that this required some finesse: the channels needed a gap between them, smart filtering had to be used, etc. And it only got us so far; we couldn't handle much more than six messages transmitting at once.

Today, we're going to share the channel in a different way: instead of dividing up frequency, we're going to divide up time.

----------------------------------------------------------------------
Necessary Abstractions for Time Division
----------------------------------------------------------------------

1. Time is divided into slots of equal length.

2. Each node can start a transmission of a packet only at the beginning of a time slot.

3. All packets are of the same size and hence take the same amount of time to transmit, equal to some integer number of time slots (e.g., 1 time slot, 2 time slots, ...).

Now recall our channel abstractions:

- Collisions happen when two senders send at the same time.
- The channel is idle when no one sends.
- A transmission is successful when exactly one sender sends in a time slot.

Remember that even if a collision involves only part of a packet, the entire packet is assumed to be lost. Also remember that nodes can detect when a collision happened (and then will retransmit the packet).

To calculate the utilization of these protocols, we need to be precise about how we calculate throughput. It is the number of *uncollided* packets per time interval.

======================================================================
Protocol 1: Time Division Multiple Access (TDMA)
======================================================================

----------------------------------------------------------------------
Protocol
----------------------------------------------------------------------

1. For N nodes, give each node a unique index in the range [0, N-1]. Assume each slot is numbered starting at 0.

2. Node i gets to transmit in time slot t iff t mod N = i. So a particular node transmits once every N time slots.

The way that we would make this work technically would be to have a centralized "resource allocator" and a way to ensure time synchronization between nodes. We'll just assume this exists for now, and not worry about the details in 6.02.

----------------------------------------------------------------------
Analysis
----------------------------------------------------------------------

Let's see how well this protocol meets our goals: eliminating collisions, providing high utilization and fairness.

Are there collisions? No! Assuming everything works correctly, there are no collisions.

Is it fair? Seems pretty fair; every node gets an equal share of the channel. In fact, if we assume all nodes always have data to send, and the throughput of the channel is X_c, then each individual node has throughput x_i = X_c / N. And then:

   F = (sum_{i=0}^{N-1} x_i)^2 / (N * sum_{i=0}^{N-1} x_i^2)
     = (N * X_c/N)^2 / (N * N * (X_c/N)^2)
     = X_c^2 / X_c^2
     = 1

So it's perfectly fair by our definition, and under this assumption about nodes having infinite offered load.

But what about utilization? Well, if nodes have infinite offered load, then utilization is 100%. But if they don't, what happens? There are wasted time slots! If node 3 doesn't have very much data to send, its slots will usually be wasted. It seems silly not to allow another node to send in those wasted slots.

So TDMA is particularly poor when nodes have different offered loads, or when data is sent in bursts (i.e., nodes have a lot of data to send at once, and then a period of no data to send, rather than having a constant stream of data to send).
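(To make the round-robin rule above concrete, here it is in a couple of lines of Python; the function name is mine, purely for illustration.)

   def tdma_owner(slot, n_nodes):
       # Node i may transmit in slot t iff t mod N = i.
       return slot % n_nodes

   # With N = 4, slots 0 through 7 belong to nodes 0, 1, 2, 3, 0, 1, 2, 3.
   print([tdma_owner(t, 4) for t in range(8)])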
======================================================================
ALOHA
======================================================================

We are going to improve on TDMA with a protocol where allocation is *not* pre-determined (this type of protocol is known as a contention protocol). That will allow us to improve performance when the offered load of the nodes differs.

----------------------------------------------------------------------
ALOHA History
----------------------------------------------------------------------

This protocol -- ALOHA -- comes from ALOHAnet, designed by Norm Abramson. ALOHAnet was a satellite network that connected computers on the Hawaiian islands. One frequency was used to send data from the island stations to the satellite, and another frequency was used to send data from the satellite to the island stations. The stations could only hear the satellite, and so had to independently decide when it was their turn to transmit.

(Sidenote: ALOHA should be in all-caps, but it is not an acronym. We will not take points off from any assignments if you typeset it incorrectly :) )

----------------------------------------------------------------------
Protocol 2: Slotted ALOHA
----------------------------------------------------------------------

We will first study "Slotted ALOHA", which assumes that each packet takes exactly one time slot to transmit. The protocol is dead simple:

   If a node is backlogged (i.e., has packets to send), it sends a packet in the next time slot with probability p.

Since slots aren't pre-allocated, we don't have the problem where a node is meant to send but has no data. But this leads to an obvious question: what should p be? If p is very large, we will see lots of collisions. If p is very small, the network will be under-utilized.

So let's figure out the utilization given a particular probability p. Assume that N nodes are backlogged. In this case, the utilization when N nodes are backlogged is equivalent to the probability that exactly one node sends a packet.

   P(send a packet) = p
   P(don't send a packet) = 1-p

   P(successful transmission)
     = P(there is only one sender)
     = P(one node sends and all others *don't* send)
     = p * (1-p)^(N-1)

How many ways can we have exactly one sender? (N choose 1) = N. So

   U_{slotted ALOHA} = N * p * (1-p)^(N-1)

Aside: What we've calculated here is, in some sense, the theoretical utilization of ALOHA. Any particular run of the ALOHA protocol could have a different utilization, because the run may not reflect our assumptions (e.g., that every node transmits with exactly the value p). We'll come back to this.

Now, what should p be? We would like to maximize our utilization, so by taking the derivative and setting it equal to zero, we'll find a maximum at p = 1/N. When p = 1/N, utilization is:

   N * (1/N) * (1 - 1/N)^(N-1) = (1 - 1/N)^(N-1)

What happens as N grows large? As N -> infinity, (1 - 1/N)^(N-1) -> 1/e, or approximately 37%.

37% is not that great! But note that Slotted ALOHA is extremely simple. A lot of times we're willing to trade a bit of performance for simplicity.
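(You can see the 1/e limit numerically with a few lines of Python; this is just a sketch of the utilization formula above, not part of any assignment.)

   def slotted_aloha_utilization(n, p):
       # Probability that exactly one of n backlogged nodes transmits in a slot:
       # N * p * (1-p)^(N-1).
       return n * p * (1 - p) ** (n - 1)

   # With the best-case p = 1/N, utilization heads toward 1/e (about 0.37) as N grows.
   for n in (2, 5, 10, 100):
       print(n, slotted_aloha_utilization(n, 1.0 / n))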
Regardless, we have another problem: how do the nodes know the value of 1/N? Even if the satellite communicates the value of N (and thus 1/N) to them, what if N changes? Remember that N is the number of backlogged nodes; that value will almost certainly change over time (even if N were the total number of nodes, we would like a protocol that allowed for nodes to join and leave the network). The fact that the nodes can't independently set p to the optimal value is the real problem with Slotted ALOHA, not its low utilization.

----------------------------------------------------------------------
Protocol 3: Stabilized ALOHA
----------------------------------------------------------------------

To deal with this, we will allow nodes to dynamically adjust p.

Let's say that a node sends a packet, and it collides with another node's transmission. What should happen to p? It should probably decrease; collisions are indicative of there being too much traffic in the network.

What about if the node's transmission is successful? We could try keeping p the same; after all, it seems to be working. But now imagine that in the next slot we experience a collision; p will decrease. There might be a few successful transmissions (and a few idle slots), and then another collision; p will decrease again. p is going to get smaller and smaller, and never increase. So when a node's transmission is successful, we will *increase* p.

What we're doing here is trying to stabilize the system; we're looking to make sure it's operating at or near a desired operating point (in this case, at or near p = 1/N). We will start out with the following decrease/increase rules:

1. When a transmission collides, set p = p/2. This is known as multiplicative decrease. If there are k collisions in a row, p will be multiplied by 2^{-k}; this is known as binary (2) exponential (k) back-off (smaller p).

2. When a transmission is successful, set p = min(2*p, 1).

An alternate idea is to just set p = 1 on success. This might seem dramatic, but it will ensure that no slots go idle. We're not going to analyze this case, though.

Trying these rules, we will find that for some nodes, a series of successive failures will result in a very small p, which results in very few transmission attempts. Effectively, nodes that experience this are starved out of the system. We'd like to prevent p from getting too small. This is easy to enforce; we'll set some limit p_min, and then our rules become:

1. On collision, p = max(p_min, p/2)
2. On success, p = min(2p, 1)

With these rules, we get high fairness and reasonable utilization (the smaller p_min, the higher utilization you'll get; in fact, you can get much higher than our "theoretical maximum". As I mentioned before, the utilization that we calculated was based on all nodes sending at exactly the same p. That is not the case here!).

But we see something concerning: some nodes "capture" the channel for a period, starving out the others. This is known as the capture effect. It leads to significant short-term unfairness, which is undesirable for a few reasons:

- At a high level, it makes it more difficult for a node to communicate steadily; it ends up communicating in bursts. Think about how that might affect interactive communications such as phone calls.

- This type of short-term unfairness is also bad for particular Internet protocols that we won't discuss in 6.02 (basically: if there is too much time between successive transmissions, these protocols will "time out"). It also causes problems with queues, which we *will* talk about in 6.02, but not for a few lectures.

The capture effect occurs when one node maintains a particularly high p (near 1). We'll deal with this the same way we dealt with the starvation issue, by capping p:

1. On collision, p = max(p_min, p/2)
2. On success, p = min(2p, p_max)

The analysis we've done for this protocol has been done via simulation. You'll see more of this in PS7.
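(The update rules themselves are tiny. Here they are as a Python sketch; the specific p_min and p_max values are illustrative choices of mine, not prescribed values.)

   P_MIN, P_MAX = 1.0/32, 0.8   # illustrative bounds; choosing them well is part of the design

   def on_collision(p):
       # Multiplicative decrease (binary exponential back-off), floored at p_min.
       return max(P_MIN, p / 2)

   def on_success(p):
       # Increase p again, but cap it at p_max to blunt the capture effect.
       return min(2 * p, P_MAX)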
To do a more formal analysis, take 6.041 or 6.008 and learn about Markov chains.

======================================================================
Protocol 4: CSMA
======================================================================

Earlier we mentioned that nodes using a shared channel could hear each other completely, partially, or not at all. Our discussion of ALOHA was built upon the premise that each node could hear the satellite, and the satellite could hear the nodes, but in networks in general, it's not true that all nodes can hear each other all of the time.

However, if nodes *can* hear each other, we can take advantage of that: if a node first listens on the channel before attempting to transmit, it can hold off if it hears another node transmitting (and thus avoid a collision). This is known as carrier sense. We can modify Stabilized ALOHA to use what is called CSMA: Carrier Sense Multiple Access. You will investigate this process in PS7.

Notice that just because we use CSMA doesn't mean that we will achieve 100% utilization: there could still be idle slots, or two nodes could start transmitting at the same time.

======================================================================
The rest of networking
======================================================================

----------------------------------------------------------------------
Goals
----------------------------------------------------------------------

In today's lecture, we dealt a lot with fairness and utilization. Those aren't the only relevant goals for good performance in networks. Some applications, like phone calls, are particularly sensitive to delay in the network. E.g., you don't want a 3-second delay on a phone call (it would become unusable), and the same goes for most video (the video will be jittery, and if it's live TV, you'll be three seconds behind the rest of the world!). So, a third goal:

3. Bounded wait: We'd like an upper bound on the wait before a successful transmission.

There are also some high-level systems goals we care about:

4. Dynamism: Our protocol should be able to handle variability in user behavior. For instance, it should not rest on an assumption like "all nodes have data to send at all times"; that's not a reasonable assumption.

5. Scalability: Our protocol should be suitable for a large number of users; it should "scale well". (With the advent of Big Data, talking about scale has become somewhat buzzword-y, but it's actually incredibly important; see 6.033 for more details.)

----------------------------------------------------------------------
Things we don't care about today
----------------------------------------------------------------------

There were a few things we did not care about today:

- How nodes in a network know where to send (e.g., if I want to send a message to George, how does my computer know where George's computer is?)

- How data moves within a network. In ALOHA, it's up to the satellite, and then back down, but what about more complicated topologies?

- How we can guarantee that data makes it from the transmitter to the receiver.

These are all things that matter in a network, and we'll deal with some of them in future lectures.