6.02 - Linear Block Codes
Lecture 4
Katrina LaCurts, lacurts@mit.edu

======================================================================
Channel Coding
======================================================================

Recap from last week: George mentioned channel codes, which take k
message bits and convert them into an n-bit codeword, n > k. We will
refer to this as a "block code".

Example: 0 -> 000
         1 -> 111

[ k-bit message block ] -> channel coder -> [ n-bit codeword ]

There are 2^n possible codewords, but only 2^k possible messages.
Hence we will not be using all possible codewords.

The rate of this code is k/n, and we often refer to these as (n, k)
codes.

Example: 0 -> 000    this is a (3, 1) code
         1 -> 111

These codewords then go through a channel:

... [ n-bit codeword ] -> channel -> ...

In our model, the channel is a memoryless binary symmetric channel,
where each bit gets flipped with probability p, p < .5.

Aside: if we had a channel with p = .5, we wouldn't be able to get
information through. If we had a channel with p > .5, we'd just flip
all the bits again before decoding them, and then effectively we'd
have p < .5. So it's safe to assume in this lecture that p < .5
always.

... channel -> [ n-bit codeword ] -> channel decoder -> [ k-bit message block ]

Suppose a codeword c_recv is received. How do we decode it? One of
two things must've happened:

1. c_recv is a valid codeword; we do nothing clever.
2. c_recv is *not* a valid codeword. In that case, we will decode it
   as whatever the nearest valid codeword is, where "nearness" is
   based on Hamming distance.

Example: Receive 010. Decode as if it were 000.

Will this always be correct? No. Imagine that the received codeword
above actually resulted from sending 111 and having two bits flipped.

----------------------------------------------------------------------
Choosing our codewords
----------------------------------------------------------------------

We have 2^n codewords to choose from, and only need to select 2^k.
How should we do this? For example, in my k=1, n=3 example, I could
just as easily have done this:

  0 -> 000
  1 -> 100

Instead of:

  0 -> 000
  1 -> 111

What's bad about the first? Well, if I send "000", and there is just
one bit flip, I'll potentially have another valid codeword ("100").

I'd like to be able to make a statement such as "as long as there are
no more than x bit flips, this code will be decoded correctly". I
can't make that statement for the first set of codewords (well,
technically I can: "as long as there are no more than 0 bit flips,
this code will be decoded correctly"). I can for the second
(intuitively: I can correct errors from one bit flip), and we'll see
why shortly.

To choose codewords, we aim for structural separation: encode the
codewords so that they're far from each other in space; likely error
patterns shouldn't transform one codeword into another. How much
"space" (or volume) we put around each codeword depends on the likely
level of noise on the channel.

George gave this spatial representation last week:

      101-----111
      /|      /|
   100-----110 |
    |  |    |  |
    | 001---|-011
    |/      |/
   000-----010

"000" and "111" is the best choice for our code because they are the
farthest apart in space, where distance is measured by the Hamming
distance (I will justify this statement shortly, though you should
see intuitively why this is the best).

We will be particularly interested in the minimum Hamming distance,
d, for a code, and we will often talk about (n, k, d) codes (not just
(n, k) codes).

For 0 -> 000, 1 -> 111, d = 3.
For 0 -> 000, 1 -> 100, d = 1. (Those codewords are not far apart in
space.)
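
To make the distance calculations concrete, here is a small Python
sketch (the function names are my own, not part of the notes) that
computes the Hamming distance between two codewords and the minimum
distance of a code:

  def hamming_distance(a, b):
      # Number of positions in which two equal-length bit strings differ.
      return sum(x != y for x, y in zip(a, b))

  def minimum_distance(codewords):
      # Smallest pairwise Hamming distance over all distinct codewords.
      return min(hamming_distance(c1, c2)
                 for i, c1 in enumerate(codewords)
                 for c2 in codewords[i+1:])

  print(minimum_distance(["000", "111"]))  # 3
  print(minimum_distance(["000", "100"]))  # 1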
----------------------------------------------------------------------
Minimum Hamming distance and error detection/correction properties
----------------------------------------------------------------------

Knowing the minimum Hamming distance of our code lets us make
statements about how many bit errors can be detected or corrected:

1. We can detect all patterns of up to t bit errors iff d >= t+1.

   For my code above, this means we can detect any 2-bit error.

2. We can correct all patterns of up to t bit errors iff d >= 2t+1.

   For my code above, this means we can correct any 1-bit error.

Now note: we can't simultaneously detect 2-bit errors and correct
1-bit errors, because we would (wrongly) "correct" a 2-bit error from
one codeword as though it were a 1-bit error from the other codeword,
and there would be no detection of the 2-bit error. This leads us to:

3. We can detect all patterns of up to t_D bit errors *while
   simultaneously correcting* all patterns of up to t_C bit errors
   (t_C <= t_D) iff d >= t_C + t_D + 1.

   For my code above, this means we can detect any 1-bit error and at
   the same time correct any 1-bit error. But we cannot detect any
   2-bit error while also correcting 1-bit errors.

We desire good error detection and correction properties. Our goal
for today is to create codes that allow us to have a large value of
d; the larger d is, the more errors we can detect and correct. As we
go on today, we will see what tradeoffs come with using a large value
of d.

======================================================================
Linear Block Codes
======================================================================

All of the codes we discuss in this class will be linear codes. These
codes are valued for their power and efficiency, and they also give
some nice structure to our codes, which makes them easier to reason
about.

The "linear" in "linear block code" means that we're going to obtain
the codewords by doing a linear transformation of the message bits
(basically: any additional bits will be a sum in GF(2) of some set of
the message bits).

Key property: the sum of any two codewords is also a codeword. This
is in fact necessary and sufficient for a code to be linear. This
leads to two additional properties:

1. The codeword made of all zeroes is always a codeword.
2. We can easily calculate the minimum Hamming distance for such a
   code: it's the smallest weight (number of 1's) among nonzero
   codewords.

To explain linear codes in detail, we'll use George's example from
last week:

----------------------------------------------------------------------
Parity Check
----------------------------------------------------------------------

George gave us a simple parity code last week: add a parity bit P to
a message of k data bits to make the number of 1 bits even.

Example: 0000 -> 00000
         0010 -> 00101
         1010 -> 10100

(There are more codewords; I won't write them all.)

----------------------------------------------------------------------
Systematic Codes
----------------------------------------------------------------------

This code is already in what we call "systematic form": the codewords
consist of k data bits, with n-k bits added to the end (to form
codewords of n bits total):

       k bits          n-k bits
   --------------    -------------
  [ message bits ][ parity bits ]
  -------------------------------
              n bits

Every linear code can be represented by an equivalent systematic
form.

Note that this corresponds to the same n and k we had before; these
codes will result in (n, k, d) codes, for some d.

** For the parity code, what is k? What is n? (k=4, n=5)
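
As a quick sanity check, here is a minimal Python sketch (the
function name is my own) of the systematic even-parity encoder,
verifying the key linearity property that the GF(2) sum of two
codewords is again a codeword:

  def parity_encode(data_bits):
      # Systematic even-parity code: append one bit so the total
      # number of 1s in the codeword is even.
      return data_bits + [sum(data_bits) % 2]

  c1 = parity_encode([0, 0, 1, 0])                 # [0, 0, 1, 0, 1]
  c2 = parity_encode([1, 0, 1, 0])                 # [1, 0, 1, 0, 0]
  c_sum = [(a + b) % 2 for a, b in zip(c1, c2)]    # bitwise sum in GF(2)
  print(c_sum == parity_encode([1, 0, 0, 0]))      # True: the sum is a codeword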
----------------------------------------------------------------------
Generator Matrix
----------------------------------------------------------------------

Because these codes deal with linear transformations, we can describe
the operation of generating the codewords with matrices. This will
become very useful to us in Wednesday's lecture.

c = codeword (an n-element row vector)
D = data/message (a k-element row vector)

I claim that there exists a kxn matrix G (the "generator matrix")
such that c = DG. For example, in the parity code, we have D = 0100
and c = 01001. We're looking for G such that:

      D        G                  c

  [0 1 0 0][_ _ _ _ _]   =   [0 1 0 0 1]
           [_ _ _ _ _]
           [_ _ _ _ _]
           [_ _ _ _ _]

c_j is a linear combination of the message bits in D, weighted by the
corresponding entries in the j^th column of G.

Because our codes are in systematic form, the first k bits of c need
to equal the first k bits of D:

      D        G                  c

  [0 1 0 0][1 0 0 0 _]   =   [0 1 0 0 1]
           [0 1 0 0 _]
           [0 0 1 0 _]
           [0 0 0 1 _]

This means that G = [I | A]; our new problem is to determine A.

What is A for the parity check code? We're adding up all of the bits
of D in GF(2) to determine the final bit of c (i.e., we're not adding
up only some of the bits). So:

      D        G                  c

  [0 1 0 0][1 0 0 0 1]   =   [0 1 0 0 1]
           [0 1 0 0 1]
           [0 0 1 0 1]
           [0 0 0 1 1]

----------------------------------------------------------------------
Performance of Parity Check Codes
----------------------------------------------------------------------

Now that we've put our parity check code into this fancy new form,
let's see how well it works. What is d? 2. Two messages that differ
by 1 bit will have their codewords differ by 2 bits, and any messages
that differ by more than 1 bit will necessarily have their codewords
differ by at least 2 bits.

This means we can detect all single-bit errors, but cannot correct
errors. This is disappointing; all that work and we can't even
correct any errors.

======================================================================
Rectangular Codes
======================================================================

Let's get a little bit fancier with our parity bits.

  D1 D2 | P1    <-- parity bit for row 1
  D3 D4 | P2    <-- parity bit for row 2
  ------
  P3 P4         <-- parity bits for col 1 and 2, respectively

This is a "rectangular code". We put the data bits into a rectangle,
and added parity bits for each row and each column.

** What is k? 4 (4 data bits)
   What is n? 8 (4 data bits + 4 parity bits)
   What is d? We'll find out soon.

Let's see if we can correct any errors:

Example 1:  0 1 | 1    All parity bits check: no errors.
            1 1 | 0
            -----
            1 0

Example 2:  0 1 | 1    Parity bit for row 2 is incorrect; something
            1 0 | 0    must have gone wrong. And the parity bit for
            -----      column 2 is incorrect. So we know which bit
            1 0        was in error! (D4)

Example 3:  0 1 | 1    Parity bit for row 2 is incorrect, but all
            1 1 | 1    parity bits for the columns are correct. So P2
            -----      is incorrect.
            1 0

We have been able to correct some single-bit errors.
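
Here is a rough Python sketch of that decoding logic (the function
name and the block layout are assumptions of mine, not from the
notes): a failing row check plus a failing column check locates a
flipped data bit; a single failing check means the parity bit itself
was flipped.

  def decode_rect(block):
      # block = (D, row_p, col_p): 2x2 data bits D, row parities
      # [P1, P2], column parities [P3, P4].
      D, row_p, col_p = block
      bad_rows = [r for r in range(2) if (sum(D[r]) + row_p[r]) % 2 != 0]
      bad_cols = [c for c in range(2) if (D[0][c] + D[1][c] + col_p[c]) % 2 != 0]
      if bad_rows and bad_cols:
          # A data bit at the intersection was flipped; correct it.
          r, c = bad_rows[0], bad_cols[0]
          D[r][c] ^= 1
      # If only a row (or only a column) check fails, a parity bit itself
      # was flipped; the data bits are already correct.
      return D

  # Example 2 from above: D4 was flipped.
  print(decode_rect(([[0, 1], [1, 0]], [1, 0], [1, 0])))  # [[0, 1], [1, 1]]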
What is d for this code? I claim d = 3.

Pf: If two messages have Hamming distance 1, then some data bit d_i
is different. Their codewords will differ in three places: d_i, the
parity bit for d_i's row, and the parity bit for d_i's column. So the
Hamming distance of the codewords is 3.

E.g.:
  0 0 | 0      0 1 | 1
  0 0 | 0      0 0 | 0
  -----        -----
  0 0          0 1

If two messages have Hamming distance 2, then two data bits d_i and
d_j are different. Their codewords will differ in d_i, d_j, and
either two row parity bits or two column parity bits (or possibly
both). So their codewords differ in at least 4 places.

E.g.:
  0 0 | 0      1 1 | 0
  0 0 | 0      0 0 | 0
  -----        -----
  0 0          1 1

If two messages have Hamming distance 3 or more, we'll also find that
their codewords differ in at least 4 places.

Since d = 3, we can correct all single-bit errors. This makes this
code an "SEC" code: single error correction code. Hooray! That's what
we were looking for.

What is the rate for this code? For this example, n=8, k=4. What
about in general, with r rows and c columns of data?

  number of data bits   = r * c
  number of parity bits = r + c
  => total number of bits = rc + r + c

So the rate is rc / (rc + r + c).

----------------------------------------------------------------------
Systematic form
----------------------------------------------------------------------

Before we think more about this code, let's put it in systematic
form. Our code blocks would be

  [d1 d2 d3 d4][p1 p2 p3 p4]

What is G for this code? The first parity bit uses, or "covers", d1
and d2; the second covers d3 and d4; the third covers d1 and d3; and
the fourth covers d2 and d4. So:

  [d1 d2 d3 d4][1 0 0 0 1 0 1 0]
               [0 1 0 0 1 0 0 1]
               [0 0 1 0 0 1 1 0]
               [0 0 0 1 0 1 0 1]

----------------------------------------------------------------------
Overhead
----------------------------------------------------------------------

So we have gotten better: we went from being able to correct no
errors to being able to correct single-bit errors. Let's think about
what price we paid for doing this.

If we had just sent our data bits, we would've sent 4 bits per block
for this code. Now, we are sending 8 bits. That's a lot of extra
bits; our message is twice as long! Do we really need so many?

Let's see if we can get some sort of bound on n, to give us guidance.
For SEC, our parity bits need to distinguish at least n+1 distinct
cases:

- The case where there are no errors
- An error in the i^th bit of the n-bit codeword (n possibilities)

Aside: this calculation is going to be different if we want to
correct more than single-bit errors; we'll need to deal with more
than just n+1 combinations.

We have n-k parity bits, so we can represent 2^{n-k} possibilities.
This means that we need:

  n+1 <= 2^{n-k}   ===>   n <= 2^{n-k} - 1

For our rectangular code example, n=8, k=4: 8 <= 2^4 - 1 = 15. This
is a pretty large gap. Is it possible to use fewer than 4 parity bits
for 4 data bits? For instance, what if we tried to use only 2 parity
bits?

  n <= 2^{n-k} - 1   ===>   6 <= 2^2 - 1   ===>   6 <= 3

That's not going to work. What about 3 parity bits?

  n <= 2^{n-k} - 1   ===>   7 <= 2^3 - 1   ===>   7 <= 7

Well! This is evidence that we should be able to do SEC with fewer
parity bits.

======================================================================
Example 3: Hamming Codes
======================================================================

Hamming codes correct single errors with the minimum number of parity
bits:

  n = 2^{n-k} - 1

This will give us codes with parameters like:

  (7, 4, 3), (15, 11, 3), ..., (2^m - 1, 2^m - 1 - m, 3)

For us, right now, this is great because of efficiency. But that's
not the only criterion for a code: the ability of a code's rate k/n
to approach channel capacity, and various other factors, are also
important.

For the last bit of today, we will focus on setting up Hamming codes.
Wednesday we will learn how to encode and decode them efficiently.
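
Just to see the family concretely, here is a tiny Python sketch (the
loop bound is an arbitrary choice of mine) listing the first few
Hamming code parameters and their rates:

  # For m parity bits, a Hamming code has n = 2^m - 1 total bits and
  # k = n - m data bits, so the SEC bound n <= 2^{n-k} - 1 is met with
  # equality.
  for m in range(3, 7):
      n = 2**m - 1
      k = n - m
      print(f"({n}, {k}, 3) code, rate k/n = {k/n:.3f}")
  # (7, 4, 3) code, rate k/n = 0.571
  # (15, 11, 3) code, rate k/n = 0.733
  # (31, 26, 3) code, rate k/n = 0.839
  # (63, 57, 3) code, rate k/n = 0.905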
----------------------------------------------------------------------
Parity bits for (7,4,3)
----------------------------------------------------------------------

In general, Hamming codes use the minimum number of parity bits, each
covering a subset of the data bits. No two message bits belong to
exactly the same subsets, so a single-bit error will generate a
unique set of parity check errors.

We'll start out with an example: the (7,4,3) Hamming code.

  P1 = D1 + D2 + D4
  P2 = D1 + D3 + D4
  P3 = D2 + D3 + D4

Q. What if we check, and P1 and P3 indicate an error?
   - Something is wrong in P1's bits
   - Something is wrong in P3's bits
   - Nothing is wrong in P2's bits
   => D2 was flipped.

Q. What if we check, and only P2 indicates an error?
   - Nothing is wrong in P1's bits
   - Something is wrong in P2's bits
   - Nothing is wrong in P3's bits
   => P2 itself was flipped. If it had been one of the data bits,
      then one of the other parity bits also would've been incorrect.

Recalling back to our rectangular codes, there we had:

  P1 = D1 + D2
  P2 = D3 + D4
  P3 = D1 + D3
  P4 = D2 + D4

We were doing the same type of operations, but we didn't set the
parity bits up as efficiently.

----------------------------------------------------------------------
Parity bits in general
----------------------------------------------------------------------

The way we set up the parity bits might seem arbitrary; if that were
the case, Hamming codes would be difficult to generalize past the
(7,4,3) case. Luckily, there is structure to follow:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111

All bit positions that are powers of two are going to correspond to
parity bits:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2      p3

With whatever is left, we fill in data bits:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4

p1 covers all bits whose position has the least significant bit set:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4
  ----------------------------------------------------
  covered by p1:          X       X       X       X

p2 covers all bits whose position has the second-least significant
bit set:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4
  ----------------------------------------------------
  covered by p1:          X       X       X       X
  covered by p2:              X   X           X   X

p3 covers all bits whose position has the third-least significant bit
set:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4
  ----------------------------------------------------
  covered by p1:          X       X       X       X
  covered by p2:              X   X           X   X
  covered by p3:                      X   X   X   X

So you see:

  P1 = D1 + D2 + D4
  P2 = D1 + D3 + D4
  P3 = D2 + D3 + D4

This chart gets a little more fun if we take it out to longer Hamming
codes:

  Bit position:   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
  Encoded bits:   p1  p2  d1  p3  d2  d3  d4  p4  d5  d6  d7  d8  d9  d10 d11
  ---------------------------------------------------------------------------
  covered by p1:  X       X       X       X       X       X       X       X
  covered by p2:      X   X           X   X           X   X           X   X
  covered by p3:              X   X   X   X                   X   X   X   X
  covered by p4:                              X   X   X   X   X   X   X   X

What you should be asking yourself now is:
- How can we do this encoding with matrix operations, i.e., what is
  "G" for this code?
- How do we do the decoding efficiently?
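
As a preview of Wednesday, here is a minimal Python sketch of the
(7,4,3) code using the bit layout above (the function names are mine;
this is one way to write it, not necessarily how we'll do it in
lecture). Reading the failed parity checks as a binary number gives
the position of the flipped bit:

  def hamming_encode(d1, d2, d3, d4):
      # Codeword layout from the chart: positions 1..7 = p1 p2 d1 p3 d2 d3 d4.
      p1 = (d1 + d2 + d4) % 2
      p2 = (d1 + d3 + d4) % 2
      p3 = (d2 + d3 + d4) % 2
      return [p1, p2, d1, p3, d2, d3, d4]

  def hamming_decode(c):
      # Recompute each parity check; the failing checks, read as a binary
      # number, give the position (1..7) of the single flipped bit
      # (0 means no error).
      p1, p2, d1, p3, d2, d3, d4 = c
      s1 = (p1 + d1 + d2 + d4) % 2
      s2 = (p2 + d1 + d3 + d4) % 2
      s3 = (p3 + d2 + d3 + d4) % 2
      syndrome = s1 + 2 * s2 + 4 * s3
      if syndrome:
          c[syndrome - 1] ^= 1          # correct the single-bit error
      return [c[2], c[4], c[5], c[6]]   # return d1 d2 d3 d4

  cw = hamming_encode(1, 0, 1, 1)
  cw[4] ^= 1                            # flip d2 (position 5)
  print(hamming_decode(cw))             # [1, 0, 1, 1]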
======================================================================
Summary
======================================================================

Today:

- We aim to choose codes with a particular minimum Hamming distance
  d, because d tells us how many errors we can detect and correct.
- We use linear codes for their power and efficiency (note: you
  haven't quite seen the efficiency of linear codes yet).
- The simple parity check code has d=2, which lets us detect
  single-bit errors but gives us no SEC. Rectangular codes have d=3,
  which lets us do SEC, but they are wasteful.
- Hamming codes are the most efficient (in terms of overhead) for
  doing SEC.