6.02 - Syndrome Decoding
Lecture 5
Katrina LaCurts, lacurts@mit.edu

======================================================================
Recap from last week
======================================================================

We left off last week with Hamming Codes, which give us SEC codes
using the fewest parity bits possible. We can prove that they use the
fewest, because they satisfy n = 2^{n-k} - 1, and we proved on Monday
that n <= 2^{n-k} - 1 is necessary for SEC. This bound -- or at
least, this style of bound -- will remain important for us today.

We are going to talk in general about how to efficiently encode and
decode linear block codes. I am going to start off with the specific
example of how to encode/decode the (7,4,3) Hamming code.

======================================================================
Decoding (7,4,3) the tough way
======================================================================

Recall the (7,4,3) Hamming code:

  P1 = D1 + D2 + D4
  P2 = D1 + D3 + D4
  P3 = D2 + D3 + D4

The receiver is going to receive codewords in systematic form. So:

  [data][parity]

But remember that errors can be introduced. So instead of talking
about that received codeword as:

  [D1 D2 D3 D4][P1 P2 P3]

we'll use:

  [D1' D2' D3' D4'][P1' P2' P3']

(i.e., the received value D1' may differ from the sent value D1 due
to errors).

For the (7,4,3) Hamming code, here are all the possible codewords we
could send:

  0000|000   0100|101   1000|110   1100|011
  0001|111   0101|010   1001|001   1101|100
  0010|011   0110|110   1010|101   1110|000
  0011|100   0111|001   1011|010   1111|111

Decoding approach #1:

  1. Calculate the Hamming distance between D1'D2'D3'D4'P1'P2'P3' and
     every single possible codeword.
  2. Choose the codeword with the minimum distance, and decode as
     such.

Example: If you receive 1001|100, 1101|100 has the minimum Hamming
distance to that. If you receive 0111|011, 0111|001 has the minimum
Hamming distance.

In fact, since we're only doing SEC, we could be even more clever:
compare the received codeword to each possible codeword and stop when
we find one that is either 0 or 1 away.

Now. This is a perfectly valid decoding procedure. It will work. But
remember that our Hamming codes had such nice structure. They're
linear! They have a generator matrix! The parity bits are actually
*linear functions* of the data bits, not just arbitrary choices!
Plus, we're doing up to 2^k comparisons of n-bit words here.
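Concretely, approach #1 looks something like the following -- a
minimal Python sketch (the bit-tuple representation and the names are
mine, not from these notes), including the early-stop trick:

  # Decoding approach #1: brute-force minimum Hamming distance.
  # `codewords` is a list of all 2^k valid codewords, as bit-tuples.

  def hamming_distance(a, b):
      # Number of positions in which two equal-length words differ.
      return sum(x != y for x, y in zip(a, b))

  def brute_force_decode(received, codewords, k=4):
      best, best_dist = None, None
      for c in codewords:
          dist = hamming_distance(received, c)
          if best_dist is None or dist < best_dist:
              best, best_dist = c, dist
          if best_dist <= 1:       # SEC shortcut: nothing can be closer
              break
      return best[:k]              # systematic code: first k bits are data

For the (7,4,3) code this is at most 16 comparisons of 7-bit words;
the 2^k factor is what syndrome decoding is about to eliminate.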
We are going to exploit that structure for a more efficient decoding
procedure, called "syndrome decoding". Syndrome decoding is an
efficient way to decode any linear block code (and we'll see that
later today); right now, let's see how it works for our (7,4,3) code.

======================================================================
Decoding (7,4,3): Intuition for syndrome decoding
======================================================================

Here's the intuition behind syndrome decoding. If there are no bit
errors, then in our received codeword, the following should be true:

  P1' = D1' + D2' + D4'
  P2' = D1' + D3' + D4'
  P3' = D2' + D3' + D4'

So we could do a check like: "if D1' + D2' + D4' == P1' and
D1' + D3' + D4' == P2' and D2' + D3' + D4' == P3', then no errors".

However, in the case of errors, we'd have to do a little bit of work.
We'd do things like:

  if D1' + D2' + D4' != P1' and D1' + D3' + D4' != P2'
     and D2' + D3' + D4' == P3'
  then D1' is in error

This is better than what we were doing before (computing the Hamming
distance to every codeword), but it's still not as good as what we
could do. Instead, I'm going to calculate three "syndrome" bits:

  S1 = D1' + D2' + D4' + P1'
  S2 = D1' + D3' + D4' + P2'
  S3 = D2' + D3' + D4' + P3'

If there are no bit errors, what should these three bits be? Zero!
Having the syndrome bits equal to zero is equivalent to having
P1' = D1' + D2' + D4', etc.

Things get exciting when there are errors. If D1' has an error, what
will these syndrome bits be? S1 = 1, S2 = 1, S3 = 0. So we could do:
"if S1S2S3 == 110 then D1' is in error". In fact, for each possible
error, the syndrome bits will have a unique combination:

  S1S2S3 | error pattern | S3S2S1
  000    | no errors     | 000
  100    | P1' in error  | 001
  010    | P2' in error  | 010
  110    | D1' in error  | 011
  001    | P3' in error  | 100
  101    | D2' in error  | 101
  011    | D3' in error  | 110
  111    | D4' in error  | 111

Two facts should immediately come to mind:

1. When we proved the bound for the number of parity bits, we said
   that *for Hamming codes* the parity bits needed at least n+1
   combinations to recognize all possible single-bit errors (and the
   no-error case). You're seeing that in action here. Think about
   what this table would look like if we were going to use a linear
   code that corrected more than single-bit errors.

2. Remember how I constructed Hamming codes? The binary index for a
   particular codeword bit is related to the syndrome bits (it's
   equal to S3S2S1).

   Explanation: For instance, D1's binary index is 011. That means it
   was covered by P2 and P1. If it's in error, P3 will be correct --
   D1 isn't covered by that -- but P1 and P2 will be incorrect. This
   results in our syndrome bits being 011.

You already know that the (7,4,3) code can't correct 2-bit errors,
but let's see how syndrome decoding will behave here. Say D2' and D3'
are in error. The syndrome bits will work out to 110 (P1's check
fails, P2's check fails, and P3's check sees *two* flipped bits,
which cancel out). This is interpreted as an error in D1'. So we will
wrongly "correct" D1', and leave D2' and D3' alone.
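Before we formalize this with matrices, here's the single-error logic
above as code -- a minimal sketch specific to the (7,4,3) code (the
table layout and names are mine; the equations are exactly the ones
above):

  # (7,4,3) syndrome check. The received word r is a sequence of bits
  # (d1, d2, d3, d4, p1, p2, p3).

  # Map each syndrome (S1, S2, S3) to the index of the bit to flip.
  SYNDROME_TABLE = {
      (1, 1, 0): 0,  # D1' in error
      (1, 0, 1): 1,  # D2' in error
      (0, 1, 1): 2,  # D3' in error
      (1, 1, 1): 3,  # D4' in error
      (1, 0, 0): 4,  # P1' in error
      (0, 1, 0): 5,  # P2' in error
      (0, 0, 1): 6,  # P3' in error
  }

  def decode_743(r):
      d1, d2, d3, d4, p1, p2, p3 = r
      s = ((d1 + d2 + d4 + p1) % 2,   # S1
           (d1 + d3 + d4 + p2) % 2,   # S2
           (d2 + d3 + d4 + p3) % 2)   # S3
      r = list(r)
      if s != (0, 0, 0):              # nonzero syndrome: flip that bit
          r[SYNDROME_TABLE[s]] ^= 1
      return r[:4]                    # the (corrected) data bits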
======================================================================
Syndrome Decoding
======================================================================

The way we will implement this type of logic is with matrices.
Remember, there exists a generator matrix G such that dG = c; this is
our encoding process. We saw on Monday that G = [I | A].

Ex: For the (7,4,3) code:

  G = [1 0 0 0 | 1 1 0]
      [0 1 0 0 | 1 0 1]
      [0 0 1 0 | 0 1 1]
      [0 0 0 1 | 1 1 1]

G is used to encode, i.e., it describes the operations to take
messages to codewords (as you can see from the equation dG = c). To
decode, we need another matrix to help with the reverse process.
We're going to call this matrix H, and you will see shortly that it's
related to G, though they're not equal.

What we are going to find is H, such that Hr^T = S, where S contains
the syndrome bits as above. Thus, in the case of no errors, Hr^T =
all zeros.

Ex: In the (7,4,3) code, that means if there is an error in D1,
Hr^T = [1 1 0]^T.

----------------------------------------------------------------------
Finding H
----------------------------------------------------------------------

Before I start this explanation, know that determining H is actually
very easy, especially if you've already determined G. But to explain
H to you, I'm going to build it up from first principles. So what I'm
about to do is for pedagogy; it's not something you need to do every
single time you determine H.

With the (7,4,3) code, we want H such that Hr^T = [S1 S2 S3]^T, i.e.:

  [_ _ _ _ _ _ _] [d1']   [S1]
  [_ _ _ _ _ _ _] [d2'] = [S2]
  [_ _ _ _ _ _ _] [d3']   [S3]
                  [d4']
                  [p1']
                  [p2']
                  [p3']

S1 = D1' + D2' + D4' + P1'. So:

  [1 1 0 1 1 0 0] [d1']   [S1]
  [_ _ _ _ _ _ _] [d2'] = [S2]
  [_ _ _ _ _ _ _] [d3']   [S3]
                  [d4']
                  [p1']
                  [p2']
                  [p3']

S2 = D1' + D3' + D4' + P2', and S3 = D2' + D3' + D4' + P3', so:

  [1 1 0 1 1 0 0] [d1']   [S1]
  [1 0 1 1 0 1 0] [d2'] = [S2]
  [0 1 1 1 0 0 1] [d3']   [S3]
                  [d4']
                  [p1']
                  [p2']
                  [p3']

If you recall G = [I | A], you'll see that H = [A^T | I], where I is
(n-k)x(n-k) this time. This is not a coincidence. We designed A such
that the ij^{th} entry in A specifies whether data bit D_i was used
in constructing parity bit P_j.

======================================================================
Logic behind syndrome decoding
======================================================================

One question you should have now is: in general, how do I compute the
syndromes? E.g., we figured out the syndromes for the (7,4,3) code,
but how do they work for linear block codes in general? To figure
this out, let's make syndrome decoding a bit more formal.

----------------------------------------------------------------------
Formal statement
----------------------------------------------------------------------

Any received codeword r can be interpreted as:

  r = c + e

where c is an actual codeword, and e is an error vector (indicating
which bits, if any, flipped).

With our matrix H (= [A^T | I]) we compute the syndromes on the
received word:

  Hr^T = S

and use S to figure out which bit is in error. To figure out the
relationship between syndromes and errors in general:

  Hr^T = H(c + e)^T = Hc^T + He^T = S

Since Hc^T = 0 (a valid codeword satisfies every parity check), we
have He^T = S.

(For those of you into linear algebra, what we're doing formally is
checking whether the received codeword is in the nullspace of H,
i.e., whether Hr^T = 0.)

What does this mean for us? Knowing the error patterns we want to
correct for, we can compute the syndrome vectors offline, and then,
once the syndrome is calculated from a received word, do a lookup to
find the error that occurred.

Ex: For SEC, the error vectors are:

  [1 0 0 0 0 0 0], [0 1 0 0 0 0 0], [0 0 1 0 0 0 0], [0 0 0 1 0 0 0],
  [0 0 0 0 1 0 0], [0 0 0 0 0 1 0], [0 0 0 0 0 0 1]

Note: if the parity bits are in error, we don't really care. So in
practice, for SEC, we only need to calculate k+1 of the n+1 syndromes
(one per data-bit error, plus the no-error case).

** How will this change if we want to correct multiple bit errors?

----------------------------------------------------------------------
Algorithm
----------------------------------------------------------------------

This brings us to the formal algorithm for syndrome decoding (a code
sketch follows these steps):

1. For error patterns {E_i}, precompute the syndromes HE_i^T = S_i,
   and store them in a "syndrome table".

   Note: for an error pattern with just a single 1 in the i^{th}
   position, this is trivial: HE_i^T is the i^{th} column of H.

2. For each received word, compute Hr^T = S.

3. Find l such that S_l = S, and correct the error: c = r + E_l.

4. Decode the codeword (for systematic codes: take the first k bits).
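And here is that sketch -- a generic implementation of steps 1-4 for
any systematic SEC code, assuming A is the k x (n-k) matrix from
G = [I | A]; build_H, syndrome, and decode are my names, not the
notes':

  # Generic syndrome decoding for a systematic SEC code.
  # All arithmetic is mod 2.

  def build_H(A, k, m):
      # H = [A^T | I]: m = n-k rows, one per parity check.
      return [[A[j][i] for j in range(k)] +
              [1 if c == i else 0 for c in range(m)]
              for i in range(m)]

  def syndrome(H, r):
      # S = Hr^T (mod 2): one bit per parity check.
      return tuple(sum(h * b for h, b in zip(row, r)) % 2 for row in H)

  def decode(H, r, k):
      # Step 1: syndrome table for single-bit errors. The syndrome of
      # the error with a single 1 in position i is just column i of H.
      n = len(r)
      table = {}
      for i in range(n):
          e = [0] * n
          e[i] = 1
          table[syndrome(H, e)] = i
      # Steps 2-4: compute S, look up the error, correct, take data.
      s = syndrome(H, r)
      r = list(r)
      if any(s):
          r[table[s]] ^= 1
      return r[:k]

With the (7,4,3) code's A, decode(build_H(A, 4, 3),
[0, 1, 1, 0, 1, 0, 1], 4) returns [0, 1, 0, 0], matching the worked
example below. (In practice you'd build the table once, offline,
exactly as step 1 says, rather than on every call.)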
So for our (7,4,3) example:

  H = [1 1 0 1 1 0 0]
      [1 0 1 1 0 1 0]
      [0 1 1 1 0 0 1]

Syndrome table:

  E_i     | S_i
  1000000 | 110
  0100000 | 101
  0010000 | 011
  0001000 | 111

Suppose you receive [0 1 1 0 | 1 0 1]:

  [1 1 0 1 1 0 0] [0]   [0]
  [1 0 1 1 0 1 0] [1] = [1]
  [0 1 1 1 0 0 1] [1]   [1]
                  [0]
                  [1]
                  [0]
                  [1]

[0 1 1] corresponds to E_3 = [0 0 1 0 0 0 0], so:

  [0 1 1 0 1 0 1] + [0 0 1 0 0 0 0] = [0 1 0 0 1 0 1] => [0 1 0 0]

Now, is this actually more efficient than our Hamming-distance-based
approach? There, we were doing 2^k comparisons of n-bit words (2^k/2
on average, if you're being clever and stop when you find a Hamming
distance of 0 or 1). Now we're doing k comparisons of (n-k)-bit words
(for SEC; to correct more errors, we'd do more).

----------------------------------------------------------------------
Additional Facts
----------------------------------------------------------------------

Think about the matrix A. If A had two identical rows, what would
that mean? It would mean that two data bits were covered by the same
set of parity bits (if rows i and j are identical, then d_i and d_j
are covered by the same parity bits). This will prevent us from
correcting any errors. Here's why:

1. Claim: The minimum Hamming distance of a linear code is the
   minimum number of columns of H that are linearly dependent, i.e.,
   that can be combined to give the zero vector. (Remember: syndrome
   decoding works in general, not just for Hamming codes, so d is not
   necessarily 3.)

   Why? We know that d = the smallest weight among nonzero codewords;
   let's call that codeword w. We also know that Hw^T = 0. In this
   multiplication, the locations of the 1's in w tell us which
   columns of H to combine. In this case, we're combining d columns,
   and they sum to 0. So the minimum number of linearly dependent
   columns is no *more* than d.

   Can it be less? Say that l of the columns are linearly dependent,
   with 0 < l < d. Let's call the positions of those columns
   p_1, .., p_l. Then multiply H by the vector with 1's in positions
   p_1, .., p_l, and zeros everywhere else. H times that vector is
   zero, because those columns are linearly dependent. Hence that
   vector is a valid codeword, and has weight < d. Contradiction!

2. That's nice, but we already knew d. What more does this buy us?
   Well, if A has two identical rows, then A^T has two identical
   columns, so d is no greater than 2, so error correction is not
   possible.

   This claim also lets us figure out error-correcting properties
   without calculating all the codewords. For instance, consider an
   (8,4) code, where I take the (7,4,3) code and add an overall
   parity bit. (Note: this parity bit should be calculated over *all*
   of the bits, i.e., P4 = D1 + D2 + D3 + D4 + P1 + P2 + P3.
   Re-writing that in terms of just the data bits, you get
   P4 = D1 + D2 + D3.) Then:

     G = [1 0 0 0 | 1 1 0 1]        H = [1 1 0 1 | 1 0 0 0]
         [0 1 0 0 | 1 0 1 1]   =>       [1 0 1 1 | 0 1 0 0]
         [0 0 1 0 | 0 1 1 1]            [0 1 1 1 | 0 0 1 0]
         [0 0 0 1 | 1 1 1 0]            [1 1 1 0 | 0 0 0 1]

   The smallest number of linearly dependent columns in H is 4 (my
   intuition: you can pick 4 columns that all have zeros in the same
   dimension; you can't pick 3 columns that all have zeros in the
   same two dimensions). So d = 4 for this code, allowing us to
   detect 3-bit errors, but to still only correct 1-bit errors.
   Although, note that we could correct 1-bit errors *while
   simultaneously* detecting up to 2-bit errors (i.e., we won't
   "wrongly correct" any 2-bit errors; instead, we will recognize
   them as such).
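Claim 1 also gives us a mechanical way to find d without enumerating
codewords: search for the smallest set of columns of H that sums to
zero mod 2. A brute-force sketch (my names; fine for small n, since
it tries every column subset):

  from itertools import combinations

  # Find d as the smallest number of columns of H that sum to the
  # zero vector mod 2 (claim 1 above).

  def min_distance(H):
      m, n = len(H), len(H[0])
      cols = [tuple(H[i][j] for i in range(m)) for j in range(n)]
      for size in range(1, n + 1):
          for subset in combinations(cols, size):
              if all(sum(col[i] for col in subset) % 2 == 0
                     for i in range(m)):
                  return size   # found `size` linearly dependent columns

This returns 3 for the (7,4,3) H and 4 for the (8,4) H above,
matching the argument.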
======================================================================
Multiple Errors
======================================================================

Let's briefly consider doing more than single-error correction, and
how this impacts syndrome decoding.

** What needs to increase if we want to correct more than single-bit
errors? d, and so, one imagines, the number of parity bits will need
to increase.

To correct 1-bit errors, we already know n + 1 <= 2^{n-k}. That's
really:

  [n choose 1] + [n choose 0] <= 2^{n-k}

To correct 2-bit errors:

  [n choose 2] + [n choose 1] + [n choose 0] <= 2^{n-k}

To correct t-bit errors:

  [n choose t] + [n choose t-1] + .. + [n choose 0] <= 2^{n-k}

This makes sense: to correct more than 1-bit errors, we'd have to
have more error vectors in our syndrome table, and this is exactly
how we would construct them -- one syndrome per correctable error
pattern (plus the no-error case), and there are only 2^{n-k} distinct
(n-k)-bit syndromes to go around.

======================================================================
Burst errors
======================================================================

Before we finish linear block codes, let me give you one quick trick.
We've spent a lot of time on linear block codes that correct
single-bit errors, and we can extend all of these ideas to correct
independent multiple-bit errors. Both of these types of errors are
captured by the BSC model, which assumes that each bit is flipped
independently with probability p.

But the real world is not a BSC! Many real-world errors occur in
bursts, which results in correlated multi-bit errors (e.g., fading or
a burst of interference on a wireless channel (like a microwave
turning on), damage to storage media, etc.). This means we'll have
something like:

  [dddddddd][pppp][ddxxxxxx][xxxx][dddddddd][pppp]

So the burst of errors caused multiple errors per block -- probably
more than we can correct.

Solution: Interleave codewords! A burst of errors is now spread
across multiple blocks. We have to wait a longer time before we can
decode a particular block, but we can correct many more errors.

======================================================================
Summary
======================================================================

Today we saw syndrome decoding, which is an efficient way to decode
linear block codes (NOT just Hamming codes). Between this week and
last week, we've learned linear block codes:

1. (n,k,d) codes have rate k/n, can detect up to d-1 errors, and
   correct up to floor((d-1)/2) errors.

2. Parity bits are linear combinations of the message bits.
   - Parity check code: (n+1, n, 2). Good rate; can't correct errors.
   - Replication: (c, 1, c). Great error correction; terrible rate.
   - Hamming codes: (2^m - 1, 2^m - 1 - m, 3). Fewer parity bits than
     rectangular codes, same EC properties.

3. Syndrome decoding: a general, efficient approach for linear block
   codes.

4. Interleaving can help us adapt these techniques to burst errors
   (which aren't modeled in our BSC).

Question for you: Think about the rates that we got for linear block
codes. Can you think of ways to improve the code rates while still
correcting errors? (E.g., parity check codes have a great rate, but
we can't make corrections.)