6.02 - Syndrome Decoding
Lecture 5
Katrina LaCurts, lacurts@mit.edu

======================================================================
Recap from last week
======================================================================

We left off last week with Hamming Codes, which give us SEC codes
using the fewest parity bits possible. We can prove that they use the
fewest, because they satisfy n = 2^{n-k} - 1, and we proved on Monday
that n <= 2^{n-k} - 1 is necessary for SEC. This bound -- or at
least, this style of bound -- will remain important for us today.

We are going to talk in general about how to efficiently encode and
decode linear block codes. I am going to start off with the specific
example of how to encode/decode the (7,4,3) Hamming code.

======================================================================
Decoding (7,4,3) the tough way
======================================================================

Recall the (7,4,3) Hamming code:

  P1 = D1 + D2 + D4
  P2 = D1 + D3 + D4
  P3 = D2 + D3 + D4

The receiver is going to receive codewords in systematic form. So:

  [data][parity]

But remember that errors can be introduced. So instead of talking
about that received codeword as:

  [D1 D2 D3 D4][P1 P2 P3]

we'll use:

  [D1' D2' D3' D4'][P1' P2' P3']

(i.e., the received value D1' may differ from the sent value D1 due
to errors).

For the (7,4,3) Hamming code, here are all the possible codewords we
could send:

  0000|000   0100|101   1000|110   1100|011
  0001|111   0101|010   1001|001   1101|100
  0010|011   0110|110   1010|101   1110|000
  0011|100   0111|001   1011|010   1111|111

Decoding approach #1:

  1. Calculate the Hamming distance between D1'D2'D3'D4'P1'P2'P3' and
     every single possible codeword.
  2. Choose the codeword with the minimum distance, and decode as
     such.

Example: If you receive 1001|100, 1101|100 has the minimum Hamming
distance to that. If you receive 0111|011, 0111|001 has the minimum
Hamming distance.

In fact, since we're only doing SEC, we could be even more clever:
compare the received codeword to each possible codeword and stop when
we find one that is either 0 or 1 away.

Now. This is a perfectly valid decoding procedure. It will work. But
remember that our Hamming codes had such nice structure. They're
linear! They have a generator matrix! The parity bits are actually
*linear functions* of the data bits, not just arbitrary choices!
Plus, we're doing up to 2^k comparisons of n-bit words here.
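Concretely, approach #1 looks something like the following -- a
minimal Python sketch (the bit-tuple representation and the names are
mine, not from these notes), including the early-stop trick:

  # Decoding approach #1: brute-force minimum Hamming distance.
  # `codewords` is a list of all 2^k valid codewords, as bit-tuples.

  def hamming_distance(a, b):
      # Number of positions in which two equal-length words differ.
      return sum(x != y for x, y in zip(a, b))

  def brute_force_decode(received, codewords, k=4):
      best, best_dist = None, None
      for c in codewords:
          dist = hamming_distance(received, c)
          if best_dist is None or dist < best_dist:
              best, best_dist = c, dist
          if best_dist <= 1:       # SEC shortcut: nothing can be closer
              break
      return best[:k]              # systematic code: first k bits are data

For the (7,4,3) code this is at most 16 comparisons of 7-bit words;
the 2^k factor is what syndrome decoding is about to eliminate.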
We are going to exploit that structure for a more efficient decoding
procedure, called "syndrome decoding". Syndrome decoding is an
efficient way to decode any linear block code (and we'll see that
later today); right now, let's see how it works for our (7,4,3) code.

======================================================================
Decoding (7,4,3): Intuition for syndrome decoding
======================================================================

Here's the intuition behind syndrome decoding. If there are no bit
errors, then in our received codeword, the following should be true:

  P1' = D1' + D2' + D4'
  P2' = D1' + D3' + D4'
  P3' = D2' + D3' + D4'

So we could do a check like: "if D1' + D2' + D4' == P1' and
D1' + D3' + D4' == P2' and D2' + D3' + D4' == P3', then no errors".

However, in the case of errors, we'd have to do a little bit of work.
We'd do things like:

  if D1' + D2' + D4' != P1' and D1' + D3' + D4' != P2'
     and D2' + D3' + D4' == P3'
  then D1' is in error

This is better than what we were doing before (computing the Hamming
distance to every codeword), but it's still not as good as what we
could do. Instead, I'm going to calculate three "syndrome" bits:

  S1 = D1' + D2' + D4' + P1'
  S2 = D1' + D3' + D4' + P2'
  S3 = D2' + D3' + D4' + P3'

If there are no bit errors, what should these three bits be? Zero!
Having the syndrome bits equal to zero is equivalent to having
P1' = D1' + D2' + D4', etc.

Things get exciting when there are errors. If D1' has an error, what
will these syndrome bits be? S1 = 1, S2 = 1, S3 = 0. So we could do:
"if S1S2S3 == 110 then D1' is in error". In fact, for each possible
error, the syndrome bits will have a unique combination:

  S1S2S3 | error pattern | S3S2S1
  000    | no errors     | 000
  100    | P1' in error  | 001
  010    | P2' in error  | 010
  110    | D1' in error  | 011
  001    | P3' in error  | 100
  101    | D2' in error  | 101
  011    | D3' in error  | 110
  111    | D4' in error  | 111

Two facts should immediately come to mind:

1. When we proved the bound for the number of parity bits, we said
   that *for Hamming codes* the parity bits needed at least n+1
   combinations to recognize all possible single-bit errors (and the
   no-error case). You're seeing that in action here. Think about
   what this table would look like if we were going to use a linear
   code that corrected more than single-bit errors.

2. Remember how I constructed Hamming codes? The binary index for a
   particular codeword bit is related to the syndrome bits (it's
   equal to S3S2S1).

   Explanation: For instance, D1's binary index is 011. That means it
   was covered by P2 and P1. If it's in error, P3 will be correct --
   D1 isn't covered by that -- but P1 and P2 will be incorrect. This
   results in our syndrome bits being 011.

You already know that the (7,4,3) code can't correct 2-bit errors,
but let's see how syndrome decoding will behave here. Say D2' and D3'
are in error. The syndrome bits will work out to 110 (P1's check
fails, P2's check fails, and P3's check sees *two* flipped bits,
which cancel out). This is interpreted as an error in D1'. So we will
wrongly "correct" D1', and leave D2' and D3' alone.
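Before we formalize this with matrices, here's the single-error logic
above as code -- a minimal sketch specific to the (7,4,3) code (the
table layout and names are mine; the equations are exactly the ones
above):

  # (7,4,3) syndrome check. The received word r is a sequence of bits
  # (d1, d2, d3, d4, p1, p2, p3).

  # Map each syndrome (S1, S2, S3) to the index of the bit to flip.
  SYNDROME_TABLE = {
      (1, 1, 0): 0,  # D1' in error
      (1, 0, 1): 1,  # D2' in error
      (0, 1, 1): 2,  # D3' in error
      (1, 1, 1): 3,  # D4' in error
      (1, 0, 0): 4,  # P1' in error
      (0, 1, 0): 5,  # P2' in error
      (0, 0, 1): 6,  # P3' in error
  }

  def decode_743(r):
      d1, d2, d3, d4, p1, p2, p3 = r
      s = ((d1 + d2 + d4 + p1) % 2,   # S1
           (d1 + d3 + d4 + p2) % 2,   # S2
           (d2 + d3 + d4 + p3) % 2)   # S3
      r = list(r)
      if s != (0, 0, 0):              # nonzero syndrome: flip that bit
          r[SYNDROME_TABLE[s]] ^= 1
      return r[:4]                    # the (corrected) data bits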
======================================================================
Syndrome Decoding
======================================================================

The way we will implement this type of logic is with matrices.
Remember, there exists a generator matrix G such that dG = c; this is
our encoding process. We saw on Monday that G = [I | A].

Ex: For the (7,4,3) code:

  G = [1 0 0 0 | 1 1 0]
      [0 1 0 0 | 1 0 1]
      [0 0 1 0 | 0 1 1]
      [0 0 0 1 | 1 1 1]

G is used to encode, i.e., it describes the operations to take
messages to codewords (as you can see from the equation dG = c). To
decode, we need another matrix to help with the reverse process.
We're going to call this matrix H, and you will see shortly that it's
related to G, though they're not equal.

What we are going to find is H, such that Hr^T = S, where S contains
the syndrome bits as above. Thus, in the case of no errors, Hr^T =
all zeros.

Ex: In the (7,4,3) code, that means if there is an error in D1,
Hr^T = [1 1 0]^T.

----------------------------------------------------------------------
Finding H
----------------------------------------------------------------------

Before I start this explanation, know that determining H is actually
very easy, especially if you've already determined G. But to explain
H to you, I'm going to build it up from first principles. So what I'm
about to do is for pedagogy; it's not something you need to do every
single time you determine H.

With the (7,4,3) code, we want H such that Hr^T = [S1 S2 S3]^T, i.e.:

  [_ _ _ _ _ _ _] [d1']   [S1]
  [_ _ _ _ _ _ _] [d2'] = [S2]
  [_ _ _ _ _ _ _] [d3']   [S3]
                  [d4']
                  [p1']
                  [p2']
                  [p3']

S1 = D1' + D2' + D4' + P1'. So:

  [1 1 0 1 1 0 0] [d1']   [S1]
  [_ _ _ _ _ _ _] [d2'] = [S2]
  [_ _ _ _ _ _ _] [d3']   [S3]
                  [d4']
                  [p1']
                  [p2']
                  [p3']

S2 = D1' + D3' + D4' + P2', and S3 = D2' + D3' + D4' + P3', so:

  [1 1 0 1 1 0 0] [d1']   [S1]
  [1 0 1 1 0 1 0] [d2'] = [S2]
  [0 1 1 1 0 0 1] [d3']   [S3]
                  [d4']
                  [p1']
                  [p2']
                  [p3']

If you recall G = [I | A], you'll see that H = [A^T | I], where I is
(n-k)x(n-k) this time. This is not a coincidence. We designed A such
that the ij^{th} entry in A specifies whether data bit D_i was used
in constructing parity bit P_j.

======================================================================
Logic behind syndrome decoding
======================================================================

One question you should have now is: in general, how do I compute the
syndromes? E.g., we figured out the syndromes for the (7,4,3) code,
but how do they work for linear block codes in general? To figure
this out, let's make syndrome decoding a bit more formal.

----------------------------------------------------------------------
Formal statement
----------------------------------------------------------------------

Any received codeword r can be interpreted as:

  r = c + e

where c is an actual codeword, and e is an error vector (indicating
which bits, if any, flipped).

With our matrix H (= [A^T | I]) we compute the syndromes on the
received word:

  Hr^T = S

and use S to figure out which bit is in error. To figure out the
relationship between syndromes and errors in general:

  Hr^T = H(c + e)^T = Hc^T + He^T = S

Since Hc^T = 0 (a valid codeword satisfies every parity check), we
have He^T = S.

(For those of you into linear algebra, what we're doing formally is
checking whether the received codeword is in the nullspace of H,
i.e., whether Hr^T = 0.)

What does this mean for us? Knowing the error patterns we want to
correct for, we can compute the syndrome vectors offline, and then,
once the syndrome is calculated from a received word, do a lookup to
find the error that occurred.

Ex: For SEC, the error vectors are:

  [1 0 0 0 0 0 0], [0 1 0 0 0 0 0], [0 0 1 0 0 0 0], [0 0 0 1 0 0 0],
  [0 0 0 0 1 0 0], [0 0 0 0 0 1 0], [0 0 0 0 0 0 1]

Note: if the parity bits are in error, we don't really care. So in
practice, for SEC, we only need to calculate k+1 of the n+1 syndromes
(one per data-bit error, plus the no-error case).

** How will this change if we want to correct multiple bit errors?

----------------------------------------------------------------------
Algorithm
----------------------------------------------------------------------

This brings us to the formal algorithm for syndrome decoding (a code
sketch follows these steps):

1. For error patterns {E_i}, precompute the syndromes HE_i^T = S_i,
   and store them in a "syndrome table".

   Note: for an error pattern with just a single 1 in the i^{th}
   position, this is trivial: HE_i^T is the i^{th} column of H.

2. For each received word, compute Hr^T = S.

3. Find l such that S_l = S, and correct the error: c = r + E_l.

4. Decode the codeword (for systematic codes: take the first k bits).
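And here is that sketch -- a generic implementation of steps 1-4 for
any systematic SEC code, assuming A is the k x (n-k) matrix from
G = [I | A]; build_H, syndrome, and decode are my names, not the
notes':

  # Generic syndrome decoding for a systematic SEC code.
  # All arithmetic is mod 2.

  def build_H(A, k, m):
      # H = [A^T | I]: m = n-k rows, one per parity check.
      return [[A[j][i] for j in range(k)] +
              [1 if c == i else 0 for c in range(m)]
              for i in range(m)]

  def syndrome(H, r):
      # S = Hr^T (mod 2): one bit per parity check.
      return tuple(sum(h * b for h, b in zip(row, r)) % 2 for row in H)

  def decode(H, r, k):
      # Step 1: syndrome table for single-bit errors. The syndrome of
      # the error with a single 1 in position i is just column i of H.
      n = len(r)
      table = {}
      for i in range(n):
          e = [0] * n
          e[i] = 1
          table[syndrome(H, e)] = i
      # Steps 2-4: compute S, look up the error, correct, take data.
      s = syndrome(H, r)
      r = list(r)
      if any(s):
          r[table[s]] ^= 1
      return r[:k]

With the (7,4,3) code's A, decode(build_H(A, 4, 3),
[0, 1, 1, 0, 1, 0, 1], 4) returns [0, 1, 0, 0], matching the worked
example below. (In practice you'd build the table once, offline,
exactly as step 1 says, rather than on every call.)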
So for our (7,4,3) example:

  H = [1 1 0 1 1 0 0]
      [1 0 1 1 0 1 0]
      [0 1 1 1 0 0 1]

Syndrome table:

  E_i     | S_i
  1000000 | 110
  0100000 | 101
  0010000 | 011
  0001000 | 111

Suppose you receive [0 1 1 0 | 1 0 1]:

  [1 1 0 1 1 0 0] [0]   [0]
  [1 0 1 1 0 1 0] [1] = [1]
  [0 1 1 1 0 0 1] [1]   [1]
                  [0]
                  [1]
                  [0]
                  [1]

[0 1 1] corresponds to E_3 = [0 0 1 0 0 0 0], so:

  [0 1 1 0 1 0 1] + [0 0 1 0 0 0 0] = [0 1 0 0 1 0 1] => [0 1 0 0]

Now, is this actually more efficient than our Hamming-distance-based
approach? There, we were doing 2^k comparisons of n-bit words (2^k/2
on average, if you're being clever and stop when you find a Hamming
distance of 0 or 1). Now we're doing k comparisons of (n-k)-bit words
(for SEC; to correct more errors, we'd do more).

----------------------------------------------------------------------
Additional Facts
----------------------------------------------------------------------

Think about the matrix A. If A had two identical rows, what would
that mean? It would mean that two data bits were covered by the same
set of parity bits (if rows i and j are identical, then d_i and d_j
are covered by the same parity bits). This will prevent us from
correcting any errors. Here's why:

1. Claim: The minimum Hamming distance of a linear code is the
   minimum number of columns of H that are linearly dependent, i.e.,
   that can be combined to give the zero vector. (Remember: syndrome
   decoding works in general, not just for Hamming codes, so d is not
   necessarily 3.)

   Why? We know that d = the smallest weight among nonzero codewords;
   let's call that codeword w. We also know that Hw^T = 0. In this
   multiplication, the locations of the 1's in w tell us which
   columns of H to combine. In this case, we're combining d columns,
   and they sum to 0. So the minimum number of linearly dependent
   columns is no *more* than d.

   Can it be less? Say that l of the columns are linearly dependent,
   with 0 < l < d. Let's call the positions of those columns
   p_1, .., p_l. Then multiply H by the vector with 1's in positions
   p_1, .., p_l, and zeros everywhere else. H times that vector is
   zero, because those columns are linearly dependent. Hence that
   vector is a valid codeword, and has weight < d. Contradiction!

2. That's nice, but we already knew d. What more does this buy us?
   Well, if A has two identical rows, then A^T has two identical
   columns, so d is no greater than 2, so error correction is not
   possible.

   This claim also lets us figure out error-correcting properties
   without calculating all the codewords. For instance, consider an
   (8,4) code, where I take the (7,4,3) code and add an overall
   parity bit. (Note: this parity bit should be calculated over *all*
   of the bits, i.e., P4 = D1 + D2 + D3 + D4 + P1 + P2 + P3.
   Re-writing that in terms of just the data bits, you get
   P4 = D1 + D2 + D3.) Then:

     G = [1 0 0 0 | 1 1 0 1]        H = [1 1 0 1 | 1 0 0 0]
         [0 1 0 0 | 1 0 1 1]   =>       [1 0 1 1 | 0 1 0 0]
         [0 0 1 0 | 0 1 1 1]            [0 1 1 1 | 0 0 1 0]
         [0 0 0 1 | 1 1 1 0]            [1 1 1 0 | 0 0 0 1]

   The smallest number of linearly dependent columns in H is 4 (my
   intuition: you can pick 4 columns that all have zeros in the same
   dimension; you can't pick 3 columns that all have zeros in the
   same two dimensions). So d = 4 for this code, allowing us to
   detect 3-bit errors, but to still only correct 1-bit errors.
   Although, note that we could correct 1-bit errors *while
   simultaneously* detecting up to 2-bit errors (i.e., we won't
   "wrongly correct" any 2-bit errors; instead, we will recognize
   them as such).
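Claim 1 also gives us a mechanical way to find d without enumerating
codewords: search for the smallest set of columns of H that sums to
zero mod 2. A brute-force sketch (my names; fine for small n, since
it tries every column subset):

  from itertools import combinations

  # Find d as the smallest number of columns of H that sum to the
  # zero vector mod 2 (claim 1 above).

  def min_distance(H):
      m, n = len(H), len(H[0])
      cols = [tuple(H[i][j] for i in range(m)) for j in range(n)]
      for size in range(1, n + 1):
          for subset in combinations(cols, size):
              if all(sum(col[i] for col in subset) % 2 == 0
                     for i in range(m)):
                  return size   # found `size` linearly dependent columns

This returns 3 for the (7,4,3) H and 4 for the (8,4) H above,
matching the argument.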
======================================================================
Multiple Errors
======================================================================

Let's briefly consider doing more than single-error correction, and
how this impacts syndrome decoding.

** What needs to increase if we want to correct more than single-bit
errors? d, and so, one imagines, the number of parity bits will need
to increase.

To correct 1-bit errors, we already know n + 1 <= 2^{n-k}. That's
really:

  [n choose 1] + [n choose 0] <= 2^{n-k}

To correct 2-bit errors:

  [n choose 2] + [n choose 1] + [n choose 0] <= 2^{n-k}

To correct t-bit errors:

  [n choose t] + [n choose t-1] + .. + [n choose 0] <= 2^{n-k}

This makes sense: to correct more than 1-bit errors, we'd have to
have more error vectors in our syndrome table, and this is exactly
how we would construct them -- one syndrome per correctable error
pattern (plus the no-error case), and there are only 2^{n-k} distinct
(n-k)-bit syndromes to go around.

======================================================================
Burst errors
======================================================================

Before we finish linear block codes, let me give you one quick trick.
We've spent a lot of time on linear block codes that correct
single-bit errors, and we can extend all of these ideas to correct
independent multiple-bit errors. Both of these types of errors are
captured by the BSC model, which assumes that each bit is flipped
independently with probability p.

But the real world is not a BSC! Many real-world errors occur in
bursts, which results in correlated multi-bit errors (e.g., fading or
a burst of interference on a wireless channel (like a microwave
turning on), damage to storage media, etc.). This means we'll have
something like:

  [dddddddd][pppp][ddxxxxxx][xxxx][dddddddd][pppp]

So the burst of errors caused multiple errors per block -- probably
more than we can correct.

Solution: Interleave codewords! A burst of errors is now spread
across multiple blocks. We have to wait a longer time before we can
decode a particular block, but we can correct many more errors.

======================================================================
Summary
======================================================================

Today we saw syndrome decoding, which is an efficient way to decode
linear block codes (NOT just Hamming codes). Between this week and
last week, we've learned linear block codes:

1. (n,k,d) codes have rate k/n, can detect up to d-1 errors, and
   correct up to floor((d-1)/2) errors.

2. Parity bits are linear combinations of the message bits.
   - Parity check code: (n+1, n, 2). Good rate; can't correct errors.
   - Replication: (c, 1, c). Great error correction; terrible rate.
   - Hamming codes: (2^m - 1, 2^m - 1 - m, 3). Fewer parity bits than
     rectangular codes, same EC properties.

3. Syndrome decoding: a general, efficient approach for linear block
   codes.

4. Interleaving can help us adapt these techniques to burst errors
   (which aren't modeled in our BSC).

Question for you: Think about the rates that we got for linear block
codes. Can you think of ways to improve the code rates while still
correcting errors? (E.g., parity check codes have a great rate, but
we can't make corrections.)