6.02 - Linear Block Codes
Lecture 4
Katrina LaCurts, lacurts@mit.edu

======================================================================
Channel Coding
======================================================================

Recap from last week: George mentioned channel codes, which take k
message bits and convert them into an n-bit codeword, n > k. We will
refer to this as a "block code".

Example: 0 -> 000
         1 -> 111

[ k-bit message block ] -> channel coder -> [ n-bit codeword ]

There are 2^n possible codewords, but only 2^k possible messages.
Hence we will not be using all possible codewords.

The rate of this code is k/n, and we often refer to these as (n, k)
codes.

Example: 0 -> 000    this is a (3, 1) code
         1 -> 111

These codewords then go through a channel:

... [ n-bit codeword ] -> channel -> ...

In our model, the channel is a memoryless binary symmetric channel,
where each bit gets flipped with probability p, p < .5.

Aside: if we had a channel with p = .5, we wouldn't be able to get
information through. If we had a channel with p > .5, we'd just flip
all the bits again before decoding them, and then effectively we'd
have p < .5. So it's safe to assume in this lecture that p < .5
always.

... channel -> [ n-bit codeword ] -> channel decoder -> [ k-bit message block ]

Suppose a codeword c_recv is received. How do we decode it? One of
two things must've happened:

1. c_recv is a valid codeword; we do nothing clever.
2. c_recv is *not* a valid codeword. In that case, we will decode it
   as whatever the nearest valid codeword is, where "nearness" is
   based on Hamming distance.

Example: Receive 010. Decode as if it were 000.

Will this always be correct? No. Imagine that the received codeword
above actually resulted from sending 111 and having two bits flipped.

----------------------------------------------------------------------
Choosing our codewords
----------------------------------------------------------------------

We have 2^n codewords to choose from, and only need to select 2^k.
How should we do this? For example, in my k=1, n=3 example, I could
just as easily have done this:

  0 -> 000
  1 -> 100

Instead of:

  0 -> 000
  1 -> 111

What's bad about the first? Well, if I send "000", and there is just
one bit flip, I'll potentially have another valid codeword ("100").

I'd like to be able to make a statement such as "as long as there are
no more than x bit flips, this code will be decoded correctly". I
can't make that statement for the first set of codewords (well,
technically I can: "as long as there are no more than 0 bit flips,
this code will be decoded correctly"). I can for the second
(intuitively: I can correct errors from one bit flip), and we'll see
why shortly.

To choose codewords, we aim for structural separation: encode the
codewords so that they're far from each other in space; likely error
patterns shouldn't transform one codeword into another. How much
"space" (or volume) we put around each codeword depends on the likely
level of noise on the channel.

George gave this spatial representation last week:

      101-----111
      /|      /|
   100-----110 |
    |  |    |  |
    | 001---|-011
    |/      |/
   000-----010

"000" and "111" is the best choice for our code because they are the
farthest apart in space, where distance is measured by the Hamming
distance (I will justify this statement shortly, though you should
see intuitively why this is the best).

We will be particularly interested in the minimum Hamming distance,
d, for a code, and we will often talk about (n, k, d) codes (not just
(n, k) codes).

For 0 -> 000, 1 -> 111, d = 3.
For 0 -> 000, 1 -> 100, d = 1. (Those codewords are not far apart in
space.)
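
To make the distance calculations concrete, here is a small Python
sketch (the function names are my own, not part of the notes) that
computes the Hamming distance between two codewords and the minimum
distance of a code:

  def hamming_distance(a, b):
      # Number of positions in which two equal-length bit strings differ.
      return sum(x != y for x, y in zip(a, b))

  def minimum_distance(codewords):
      # Smallest pairwise Hamming distance over all distinct codewords.
      return min(hamming_distance(c1, c2)
                 for i, c1 in enumerate(codewords)
                 for c2 in codewords[i+1:])

  print(minimum_distance(["000", "111"]))  # 3
  print(minimum_distance(["000", "100"]))  # 1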
----------------------------------------------------------------------
Minimum Hamming distance and error detection/correction properties
----------------------------------------------------------------------

Knowing the minimum Hamming distance of our code lets us make
statements about how many bit errors can be detected or corrected:

1. We can detect all patterns of up to t bit errors iff d >= t+1.

   For my code above, this means we can detect any 2-bit error.

2. We can correct all patterns of up to t bit errors iff d >= 2t+1.

   For my code above, this means we can correct any 1-bit error.

Now note: we can't simultaneously detect 2-bit errors and correct
1-bit errors, because we would (wrongly) "correct" a 2-bit error from
one codeword as though it were a 1-bit error from the other codeword,
and there would be no detection of the 2-bit error. This leads us to:

3. We can detect all patterns of up to t_D bit errors *while
   simultaneously correcting* all patterns of up to t_C bit errors
   (t_C <= t_D) iff d >= t_C + t_D + 1.

   For my code above, this means we can detect any 1-bit error and at
   the same time correct any 1-bit error. But we cannot detect any
   2-bit error while also correcting 1-bit errors.

We desire good error detection and correction properties. Our goal
for today is to create codes that allow us to have a large value of
d; the larger d is, the more errors we can detect and correct. As we
go on today, we will see what tradeoffs come with using a large value
of d.

======================================================================
Linear Block Codes
======================================================================

All of the codes we discuss in this class will be linear codes. These
codes are valued for their power and efficiency, and they also give
some nice structure to our codes, which makes them easier to reason
about.

The "linear" in "linear block code" means that we're going to obtain
the codewords by doing a linear transformation of the message bits
(basically: any additional bits will be a sum in GF(2) of some set of
the message bits).

Key property: the sum of any two codewords is also a codeword. This
is in fact necessary and sufficient for a code to be linear. This
leads to two additional properties:

1. The codeword made of all zeroes is always a codeword.
2. We can easily calculate the minimum Hamming distance for such a
   code: it's the smallest weight (number of 1's) among nonzero
   codewords.

To explain linear codes in detail, we'll use George's example from
last week:

----------------------------------------------------------------------
Parity Check
----------------------------------------------------------------------

George gave us a simple parity code last week: add a parity bit P to
a message of k data bits to make the number of 1 bits even.

Example: 0000 -> 00000
         0010 -> 00101
         1010 -> 10100

(There are more codewords; I won't write them all.)

----------------------------------------------------------------------
Systematic Codes
----------------------------------------------------------------------

This code is already in what we call "systematic form": the codewords
consist of k data bits, with n-k bits added to the end (to form
codewords of n bits total):

       k bits          n-k bits
   --------------    -------------
  [ message bits ][ parity bits ]
  -------------------------------
              n bits

Every linear code can be represented by an equivalent systematic
form.

Note that this corresponds to the same n and k we had before; these
codes will result in (n, k, d) codes, for some d.

** For the parity code, what is k? What is n? (k=4, n=5)
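
As a quick sanity check, here is a minimal Python sketch (the
function name is my own) of the systematic even-parity encoder,
verifying the key linearity property that the GF(2) sum of two
codewords is again a codeword:

  def parity_encode(data_bits):
      # Systematic even-parity code: append one bit so the total
      # number of 1s in the codeword is even.
      return data_bits + [sum(data_bits) % 2]

  c1 = parity_encode([0, 0, 1, 0])                 # [0, 0, 1, 0, 1]
  c2 = parity_encode([1, 0, 1, 0])                 # [1, 0, 1, 0, 0]
  c_sum = [(a + b) % 2 for a, b in zip(c1, c2)]    # bitwise sum in GF(2)
  print(c_sum == parity_encode([1, 0, 0, 0]))      # True: the sum is a codeword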
----------------------------------------------------------------------
Generator Matrix
----------------------------------------------------------------------

Because these codes deal with linear transformations, we can describe
the operation of generating the codewords with matrices. This will
become very useful to us in Wednesday's lecture.

c = codeword (an n-element row vector)
D = data/message (a k-element row vector)

I claim that there exists a kxn matrix G (the "generator matrix")
such that c = DG. For example, in the parity code, we have D = 0100
and c = 01001. We're looking for G such that:

      D        G                  c

  [0 1 0 0][_ _ _ _ _]   =   [0 1 0 0 1]
           [_ _ _ _ _]
           [_ _ _ _ _]
           [_ _ _ _ _]

c_j is a linear combination of the message bits in D, weighted by the
corresponding entries in the j^th column of G.

Because our codes are in systematic form, the first k bits of c need
to equal the first k bits of D:

      D        G                  c

  [0 1 0 0][1 0 0 0 _]   =   [0 1 0 0 1]
           [0 1 0 0 _]
           [0 0 1 0 _]
           [0 0 0 1 _]

This means that G = [I | A]; our new problem is to determine A.

What is A for the parity check code? We're adding up all of the bits
of D in GF(2) to determine the final bit of c (i.e., we're not adding
up only some of the bits). So:

      D        G                  c

  [0 1 0 0][1 0 0 0 1]   =   [0 1 0 0 1]
           [0 1 0 0 1]
           [0 0 1 0 1]
           [0 0 0 1 1]

----------------------------------------------------------------------
Performance of Parity Check Codes
----------------------------------------------------------------------

Now that we've put our parity check code into this fancy new form,
let's see how well it works. What is d? 2. Two messages that differ
by 1 bit will have their codewords differ by 2 bits, and any messages
that differ by more than 1 bit will necessarily have their codewords
differ by at least 2 bits.

This means we can detect all single-bit errors, but cannot correct
errors. This is disappointing; all that work and we can't even
correct any errors.

======================================================================
Rectangular Codes
======================================================================

Let's get a little bit fancier with our parity bits.

  D1 D2 | P1    <-- parity bit for row 1
  D3 D4 | P2    <-- parity bit for row 2
  ------
  P3 P4         <-- parity bits for col 1 and 2, respectively

This is a "rectangular code". We put the data bits into a rectangle,
and added parity bits for each row and each column.

** What is k? 4 (4 data bits)
   What is n? 8 (4 data bits + 4 parity bits)
   What is d? We'll find out soon.

Let's see if we can correct any errors:

Example 1:  0 1 | 1    All parity bits check: no errors.
            1 1 | 0
            -----
            1 0

Example 2:  0 1 | 1    Parity bit for row 2 is incorrect; something
            1 0 | 0    must have gone wrong. And the parity bit for
            -----      column 2 is incorrect. So we know which bit
            1 0        was in error! (D4)

Example 3:  0 1 | 1    Parity bit for row 2 is incorrect, but all
            1 1 | 1    parity bits for the columns are correct. So P2
            -----      is incorrect.
            1 0

We have been able to correct some single-bit errors.
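
Here is a rough Python sketch of that decoding logic (the function
name and the block layout are assumptions of mine, not from the
notes): a failing row check plus a failing column check locates a
flipped data bit; a single failing check means the parity bit itself
was flipped.

  def decode_rect(block):
      # block = (D, row_p, col_p): 2x2 data bits D, row parities
      # [P1, P2], column parities [P3, P4].
      D, row_p, col_p = block
      bad_rows = [r for r in range(2) if (sum(D[r]) + row_p[r]) % 2 != 0]
      bad_cols = [c for c in range(2) if (D[0][c] + D[1][c] + col_p[c]) % 2 != 0]
      if bad_rows and bad_cols:
          # A data bit at the intersection was flipped; correct it.
          r, c = bad_rows[0], bad_cols[0]
          D[r][c] ^= 1
      # If only a row (or only a column) check fails, a parity bit itself
      # was flipped; the data bits are already correct.
      return D

  # Example 2 from above: D4 was flipped.
  print(decode_rect(([[0, 1], [1, 0]], [1, 0], [1, 0])))  # [[0, 1], [1, 1]]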
What is d for this code? I claim d = 3.

Pf: If two messages have Hamming distance 1, then some data bit d_i
is different. Their codewords will differ in three places: d_i, the
parity bit for d_i's row, and the parity bit for d_i's column. So the
Hamming distance of the codewords is 3.

E.g.:
  0 0 | 0      0 1 | 1
  0 0 | 0      0 0 | 0
  -----        -----
  0 0          0 1

If two messages have Hamming distance 2, then two data bits d_i and
d_j are different. Their codewords will differ in d_i, d_j, and
either two row parity bits or two column parity bits (or possibly
both). So their codewords differ in at least 4 places.

E.g.:
  0 0 | 0      1 1 | 0
  0 0 | 0      0 0 | 0
  -----        -----
  0 0          1 1

If two messages have Hamming distance 3 or more, we'll also find that
their codewords differ in at least 4 places.

Since d = 3, we can correct all single-bit errors. This makes this
code an "SEC" code: single error correction code. Hooray! That's what
we were looking for.

What is the rate for this code? For this example, n=8, k=4. What
about in general, with r rows and c columns of data?

  number of data bits   = r * c
  number of parity bits = r + c
  => total number of bits = rc + r + c

So the rate is rc / (rc + r + c).

----------------------------------------------------------------------
Systematic form
----------------------------------------------------------------------

Before we think more about this code, let's put it in systematic
form. Our code blocks would be

  [d1 d2 d3 d4][p1 p2 p3 p4]

What is G for this code? The first parity bit uses, or "covers", d1
and d2; the second covers d3 and d4; the third covers d1 and d3; and
the fourth covers d2 and d4. So:

  [d1 d2 d3 d4][1 0 0 0 1 0 1 0]
               [0 1 0 0 1 0 0 1]
               [0 0 1 0 0 1 1 0]
               [0 0 0 1 0 1 0 1]

----------------------------------------------------------------------
Overhead
----------------------------------------------------------------------

So we have gotten better: we went from being able to correct no
errors to being able to correct single-bit errors. Let's think about
what price we paid for doing this.

If we had just sent our data bits, we would've sent 4 bits per block
for this code. Now, we are sending 8 bits. That's a lot of extra
bits; our message is twice as long! Do we really need so many?

Let's see if we can get some sort of bound on n, to give us guidance.
For SEC, our parity bits need to distinguish at least n+1 distinct
cases:

- The case where there are no errors
- An error in the i^th bit of the n-bit codeword (n possibilities)

Aside: this calculation is going to be different if we want to
correct more than single-bit errors; we'll need to deal with more
than just n+1 combinations.

We have n-k parity bits, so we can represent 2^{n-k} possibilities.
This means that we need:

  n+1 <= 2^{n-k}   ===>   n <= 2^{n-k} - 1

For our rectangular code example, n=8, k=4: 8 <= 2^4 - 1 = 15. This
is a pretty large gap. Is it possible to use fewer than 4 parity bits
for 4 data bits? For instance, what if we tried to use only 2 parity
bits?

  n <= 2^{n-k} - 1   ===>   6 <= 2^2 - 1   ===>   6 <= 3

That's not going to work. What about 3 parity bits?

  n <= 2^{n-k} - 1   ===>   7 <= 2^3 - 1   ===>   7 <= 7

Well! This is evidence that we should be able to do SEC with fewer
parity bits.

======================================================================
Example 3: Hamming Codes
======================================================================

Hamming codes correct single errors with the minimum number of parity
bits:

  n = 2^{n-k} - 1

This will give us codes with parameters like:

  (7, 4, 3), (15, 11, 3), ..., (2^m - 1, 2^m - 1 - m, 3)

For us, right now, this is great because of efficiency. But that's
not the only criterion for a code: the ability of a code's rate k/n
to approach channel capacity, and various other factors, are also
important.

For the last bit of today, we will focus on setting up Hamming codes.
Wednesday we will learn how to encode and decode them efficiently.
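
Just to see the family concretely, here is a tiny Python sketch (the
loop bound is an arbitrary choice of mine) listing the first few
Hamming code parameters and their rates:

  # For m parity bits, a Hamming code has n = 2^m - 1 total bits and
  # k = n - m data bits, so the SEC bound n <= 2^{n-k} - 1 is met with
  # equality.
  for m in range(3, 7):
      n = 2**m - 1
      k = n - m
      print(f"({n}, {k}, 3) code, rate k/n = {k/n:.3f}")
  # (7, 4, 3) code, rate k/n = 0.571
  # (15, 11, 3) code, rate k/n = 0.733
  # (31, 26, 3) code, rate k/n = 0.839
  # (63, 57, 3) code, rate k/n = 0.905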
----------------------------------------------------------------------
Parity bits for (7,4,3)
----------------------------------------------------------------------

In general, Hamming codes use the minimum number of parity bits, each
covering a subset of the data bits. No two message bits belong to
exactly the same subsets, so a single-bit error will generate a
unique set of parity check errors.

We'll start out with an example: the (7,4,3) Hamming code.

  P1 = D1 + D2 + D4
  P2 = D1 + D3 + D4
  P3 = D2 + D3 + D4

Q. What if we check, and P1 and P3 indicate an error?
   - Something is wrong in P1's bits
   - Something is wrong in P3's bits
   - Nothing is wrong in P2's bits
   => D2 was flipped.

Q. What if we check, and only P2 indicates an error?
   - Nothing is wrong in P1's bits
   - Something is wrong in P2's bits
   - Nothing is wrong in P3's bits
   => P2 itself was flipped. If it had been one of the data bits,
      then one of the other parity bits also would've been incorrect.

Recalling back to our rectangular codes, there we had:

  P1 = D1 + D2
  P2 = D3 + D4
  P3 = D1 + D3
  P4 = D2 + D4

We were doing the same type of operations, but we didn't set the
parity bits up as efficiently.

----------------------------------------------------------------------
Parity bits in general
----------------------------------------------------------------------

The way we set up the parity bits might seem arbitrary; if that were
the case, Hamming codes would be difficult to generalize past the
(7,4,3) case. Luckily, there is structure to follow:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111

All bit positions that are powers of two are going to correspond to
parity bits:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2      p3

With whatever is left, we fill in data bits:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4

p1 covers all bits whose position has the least significant bit set:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4
  ----------------------------------------------------
  covered by p1:          X       X       X       X

p2 covers all bits whose position has the second-least significant
bit set:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4
  ----------------------------------------------------
  covered by p1:          X       X       X       X
  covered by p2:              X   X           X   X

p3 covers all bits whose position has the third-least significant bit
set:

  Bit position:           1   2   3   4   5   6   7
  Bit position in binary: 001 010 011 100 101 110 111
  Encoded bits:           p1  p2  d1  p3  d2  d3  d4
  ----------------------------------------------------
  covered by p1:          X       X       X       X
  covered by p2:              X   X           X   X
  covered by p3:                      X   X   X   X

So you see:

  P1 = D1 + D2 + D4
  P2 = D1 + D3 + D4
  P3 = D2 + D3 + D4

This chart gets a little more fun if we take it out to longer Hamming
codes:

  Bit position:   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
  Encoded bits:   p1  p2  d1  p3  d2  d3  d4  p4  d5  d6  d7  d8  d9  d10 d11
  ---------------------------------------------------------------------------
  covered by p1:  X       X       X       X       X       X       X       X
  covered by p2:      X   X           X   X           X   X           X   X
  covered by p3:              X   X   X   X                   X   X   X   X
  covered by p4:                              X   X   X   X   X   X   X   X

What you should be asking yourself now is:
- How can we do this encoding with matrix operations, i.e., what is
  "G" for this code?
- How do we do the decoding efficiently?
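
As a preview of Wednesday, here is a minimal Python sketch of the
(7,4,3) code using the bit layout above (the function names are mine;
this is one way to write it, not necessarily how we'll do it in
lecture). Reading the failed parity checks as a binary number gives
the position of the flipped bit:

  def hamming_encode(d1, d2, d3, d4):
      # Codeword layout from the chart: positions 1..7 = p1 p2 d1 p3 d2 d3 d4.
      p1 = (d1 + d2 + d4) % 2
      p2 = (d1 + d3 + d4) % 2
      p3 = (d2 + d3 + d4) % 2
      return [p1, p2, d1, p3, d2, d3, d4]

  def hamming_decode(c):
      # Recompute each parity check; the failing checks, read as a binary
      # number, give the position (1..7) of the single flipped bit
      # (0 means no error).
      p1, p2, d1, p3, d2, d3, d4 = c
      s1 = (p1 + d1 + d2 + d4) % 2
      s2 = (p2 + d1 + d3 + d4) % 2
      s3 = (p3 + d2 + d3 + d4) % 2
      syndrome = s1 + 2 * s2 + 4 * s3
      if syndrome:
          c[syndrome - 1] ^= 1          # correct the single-bit error
      return [c[2], c[4], c[5], c[6]]   # return d1 d2 d3 d4

  cw = hamming_encode(1, 0, 1, 1)
  cw[4] ^= 1                            # flip d2 (position 5)
  print(hamming_decode(cw))             # [1, 0, 1, 1]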
======================================================================
Summary
======================================================================

Today:

- We aim to choose codes with a particular minimum Hamming distance
  d, because d tells us how many errors we can detect and correct.
- We use linear codes for their power and efficiency (note: you
  haven't quite seen the efficiency of linear codes yet).
- The simple parity check code has d=2, which lets us detect
  single-bit errors but gives us no SEC. Rectangular codes have d=3,
  which lets us do SEC, but they are wasteful.
- Hamming codes are the most efficient (in terms of overhead) for
  doing SEC.