6.02 - Convolutional Coding
Lecture #6
Katrina LaCurts, lacurts@mit.edu

======================================================================
Motivation: Space
======================================================================

1. 1962-1967 Early Mariner Probes (Mars, Venus): No ECC. Very slow
   communication (~8 bits/second)

2. 1969-1976 Later Mariner and Viking Probes (Mars, Venus): Linear
   block codes (Hadamard codes; see below)

3. 1971 Mariner 9 (Mars orbit): Also used a bi-orthogonal code

   Mariner 9 needed a code to correct picture transmission errors.
   Error correction was necessary because of a low "signal-to-noise
   ratio" (SNR), which means that it was difficult to discern the
   signal from the noise (you will get lots more practice with the
   notion of an SNR in coming weeks).

   The design and use of the craft put some constraints on the type of
   codes that could be used. Here's the type of system we're working
   with:

   - The central computer and sequencer had an onboard memory of 512
     words (words!)
   - The command system was programmed with fewer than 100 commands
   - Data was stored on a reel-to-reel tape recorder, which could
     store 22.5 MB recorded at 132 kbit/s. Playback could be done at
     up to 16 kbit/s.

   And to communicate:

   - Telecommunications were via dual S-band -- the S-band is part of
     the microwave band of the electromagnetic spectrum -- 10 W/20 W
     transmitters and a single receiver, through the high-gain
     parabolic antenna, the medium-gain horn antenna, or the low-gain
     omni-directional antenna.
     (http://nssdc.gsfc.nasa.gov/nmc/spacecraftDisplay.do?id=1971-051A)

   Coding-wise: data blocks were 6 bits long (i.e., k=6), to hold 64
   grayscale values. The available block length was around 30, due to
   the design of the transmitter. With n=30, they could've done a 5x
   repetition code, but a comparable rate with better error correction
   comes from the (32, 6, 16) Hadamard code: this code consists of the
   all-zeros codeword, the all-ones codeword, and then all other
   codewords have 16 zeros and 16 ones; the complement of each
   codeword is a codeword. In general, these types of codes give us
   (2^{k-1}, k, 2^{k-2}) codes.

   Another factor in using this code -- which was actually used
   through the 1980s -- was the efficient decoding algorithm (it's
   essentially syndrome decoding, but with a few tricks added. See:
   http://www.mcs.csueastbay.edu/~malek/TeX/Hadamard.pdf)

4. (1977) Voyager 1

   Voyager 1 has currently *left the solar system*. So let's think
   more about that transmitter. The 20 W transmitter on Voyager 1 has
   to cover tens of billions of kilometers. We can do some fancy
   things to the hardware itself:

   - Direct the antenna beam to produce the same received intensity as
     an omni-directional antenna radiating ~megawatts.
   - "cryogenically-cooled low-noise amplifiers"

   But additionally: "..data coding and error-correction schemes.
   These systems can collect, detect, lock onto, and amplify a
   vanishingly small signal that reaches Earth from the spacecraft,
   and can extract data from the signal virtually without errors."

   The types of codes that Voyager (and subsequent space probes such
   as Cassini, Mars Exploration Rover, ..) uses are called
   "convolutional codes", and the decoding mechanism they use is
   called "Viterbi decoding". This is what we'll be covering today and
   Wednesday.

5. (1997) Cassini (Saturn)

   Cassini uses a convolutional code with r=1/2 or r=1/6. Using these
   codes alone on the channel between Cassini and the Earth typically
   results in a BER of 1 in 200.
..Which actually isn't so great; this high BER is indicative of fairly
high noise on the channel. However, Cassini's convolutional code is
used in conjunction with a particular type of linear block code called
a Reed-Solomon code. This code is (255, 223, 33). (Aside: now you can
see why, in some cases, the naive method of decoding is going to be
really expensive.)

Cassini first does a Reed-Solomon encoding of the data (also using
interleaving, which you saw on Wednesday), and then convolutionally
encodes the Reed-Solomon symbols. We call this a concatenated code.
The BER of this concatenated code is 1 per million or better.

Even though we won't talk about concatenated codes explicitly, know
this: much of coding is focused on designing codes that approach the
channel capacity (i.e., get as much information across the channel as
they can), but are also efficient to encode/decode. Neither linear
block codes nor convolutional codes alone satisfy both of these
constraints. By concatenating these two types of codes in smart ways,
we can start to achieve both properties. (We're not going to prove any
of that in 6.02. If it whets your appetite for coding, there are
further courses you can take!)

Convolutional codes/Viterbi decoding are used in many terrestrial
applications as well. For instance, most of your cellphones. Also,
digital video broadcasting; Qualcomm decodes about 10^15 bits every
second of every day with Viterbi decoders in digital TVs.

======================================================================
Convolutional Codes
======================================================================

The most obvious differences between convolutional codes and linear
block codes are:

1. Convolutional codes act on message bits streaming into the encoder.
2. We transmit *only* parity bits, not data bits and parity bits. This
   means that convolutional codes are non-systematic codes
   (non-systematic = only parity bits).

The way we will be able to get error-correction properties out of
convolutional codes is by having each message bit affect many
different parity bits (in some sense: a particular message bit will be
involved in many different parity equations).

The code rate of these types of codes tells us how many parity bits
are sent for each message bit. We will be talking mostly about rate
1/r codes (i.e., r parity bits per message bit).

----------------------------------------------------------------------
Easy Example
----------------------------------------------------------------------

Like linear block codes, we describe convolutional codes by giving the
equations for the parity bits. Here is a simple convolutional code:

p_0[n] = x[n] + x[n-1] + x[n-2]
p_1[n] = x[n] + x[n-2]

Imagine the following message:

11001101

Here, x[0] = 1, x[1] = 1, x[2] = 0, etc. This is how we are thinking
of the message as a stream. We're going to assume that there is an
initial part of the stream that is all zeros (I will make this more
formal shortly), and then look at what happens when we get our first
nonzero data bit. So we'll actually imagine the message as

0011001101

where the stream is written oldest bit first, i.e., the first two
positions hold x[-2] and x[-1].

Now, at time n=0, which bits will we use? The window covers
x[-2] x[-1] x[0] = 001. So:

0011001101        p_0[0] = 1 + 0 + 0 = 1
^^^               p_1[0] = 1 + 0     = 1

At n=1, the window covers x[-1] x[0] x[1] = 011:

0011001101        p_0[1] = 1 + 1 + 0 = 0
 ^^^              p_1[1] = 1 + 0     = 1

At n=2, the window covers x[0] x[1] x[2] = 110:

0011001101        p_0[2] = 0 + 1 + 1 = 0
  ^^^             p_1[2] = 0 + 1     = 1

Transmitted sequence: 110101.. (the parity bits)

Etc. You can see that we moved through the message bits with a
"sliding window" of data.
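Here is a minimal Python sketch of that sliding-window computation
(the function name and the exact way it pads with zeros are mine, not
part of the notes); it reproduces the 110101.. sequence above:

    # Sliding-window encoder for this example code:
    #   p_0[n] = x[n] + x[n-1] + x[n-2],  p_1[n] = x[n] + x[n-2]
    def encode_example(message_bits):
        # Prepend K-1 = 2 zeros so x[-1] and x[-2] exist, and append
        # K-1 = 2 zeros to flush the encoder at the end of the message.
        x = [0, 0] + list(message_bits) + [0, 0]
        transmitted = []
        for n in range(2, len(x)):
            p0 = (x[n] + x[n-1] + x[n-2]) % 2
            p1 = (x[n] + x[n-2]) % 2
            transmitted += [p0, p1]   # interleave: p_0[n] p_1[n] p_0[n+1] ...
        return transmitted

    print(encode_example([1, 1, 0, 0, 1, 1, 0, 1]))  # begins 1, 1, 0, 1, 0, 1, ...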
For instance, for this example, when n=2, we used x[0] through x[2].
When n=3, we used x[1] through x[3]. The width of this window (in
bits) is called the code's constraint length K. For this code, K=3.

For transmitting, we send the parity sequences, not the message
itself. When we have multiple parity equations, we interleave the bits
of the parity sequences. E.g., for our example, we send:

p_0[0] p_1[0] p_0[1] p_1[1] p_0[2] p_1[2] ...

Each message bit is "spread across" K elements of each parity
sequence, so the parity sequences give better protection against bit
errors than the message sequence itself. Finally, to finish
transmitting the message, we assume that it ends with all zeroes.

----------------------------------------------------------------------
Decoding
----------------------------------------------------------------------

Let's think about how we would decode a message, assuming that there
was no noise introduced.

p_0[n] = x[n] + x[n-1] + x[n-2]
p_1[n] = x[n] + x[n-2]

Adding these equations, we see that p_0[n] + p_1[n] = x[n-1]. So we
can simply sum the parity bits and decode the message (we'd be one bit
behind in each step).

In the presence of errors in the parity stream, however, the
calculated message bits will end up corrupted at about the same rate
as the parity bits were. So this is not going to work very well. We
will come back to this notion at the end of class, and discuss it at
length on Wednesday.

----------------------------------------------------------------------
Shift-register View
----------------------------------------------------------------------

There are a lot of different ways to visualize/describe convolutional
codes. The first is the "shift-register" view, which actually reflects
how this coding process might be implemented in hardware. For the
example I did above, here is the shift-register view:

          ------------------------------> + ------> p_1[n]
          |                               ^
          |                               |
          |       -------------------     |
 x[n] ----+-----> | x[n-1] | x[n-2] | ----+
          |       -------------------     |
          |            |                  |
          |            v                  |
          +----------> + <----------------+
                       |
                       +-----------------> p_0[n]

x[n-1] and x[n-2] are held in two registers. A new bit, x[n], arrives
at the left. The box computes the parity bits from the incoming bit
x[n] and the K-1 previous message bits. At the end of the iteration,
the saved message bits are shifted right by one, and the incoming bit
moves into the left position.

So, with our message 11001101, we would have the following (each line
shows the incoming bit x[n], the register contents x[n-1]x[n-2] just
before the shift, and the resulting parity bits):

1 --> 00    p0 = 1; p1 = 1
1 --> 10    p0 = 0; p1 = 1
0 --> 11    p0 = 0; p1 = 1
0 --> 01    p0 = 1; p1 = 1
1 --> 00    p0 = 1; p1 = 1

Etc.

At the start of encoding, we assume x[n-1] and x[n-2] contain zero.
When we finish, we add K-1 zeroes to the end of the message, to allow
the registers to "settle", again at zero.

----------------------------------------------------------------------
Generators
----------------------------------------------------------------------

Not surprisingly, we have a notion of a generator for convolutional
codes, just as we had a generator matrix for linear block codes. We
typically speak of a set of generators, where g_i is the generator for
parity bit p_i. In our example:

p_0[n] = x[n] + x[n-1] + x[n-2]  ==>  g_0 = 111
p_1[n] = x[n] + x[n-2]           ==>  g_1 = 101

If we had a third:

p_2[n] = x[n] + x[n-1]           ==>  g_2 = 110

So each element in g_i is 0 or 1. There is a 1 in the j^th position if
x[n-j] is used (we denote the first position as index 0). This relates
directly to our shift-register view, as you can see on the diagram.
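To connect the generator view to the shift-register table above, here
is a small Python sketch (the function name, defaults, and printed
format are mine); it drives the same encoder from the generators
g_0 = 111 and g_1 = 101 and prints the register contents at each step:

    G = [[1, 1, 1],    # g_0:  p_0[n] = x[n] + x[n-1] + x[n-2]
         [1, 0, 1]]    # g_1:  p_1[n] = x[n]          + x[n-2]

    def shift_register_encode(message_bits, generators=G):
        K = len(generators[0])            # constraint length
        window = [0] * K                  # window[j] holds x[n-j]; starts all zero
        out = []
        for bit in list(message_bits) + [0] * (K - 1):   # K-1 zeros to settle
            window = [bit] + window[:-1]  # shift right; new bit enters on the left
            p = [sum(g[j] * window[j] for j in range(K)) % 2 for g in generators]
            print(bit, "-->", "".join(str(b) for b in window[1:]),
                  "  p0 =", p[0], "; p1 =", p[1])
            out.extend(p)                 # interleave p_0[n] p_1[n] ...
        return out

    shift_register_encode([1, 1, 0, 0, 1, 1, 0, 1])   # reproduces the table above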
----------------------------------------------------------------------
Convolution
----------------------------------------------------------------------

With these generators, we can express the parity bits in a different
way. A convolutional code generates sequences of parity bits from
sequences of message bits by a convolution operation:

p_i[n] = sum_{j=0}^{K-1} g_i[j] x[n-j]   (mod 2)

K here is, again, the constraint length. g_i is the K-element
generator for parity bit p_i. For our example:

p_0[n] = sum_{j=0}^{K-1} g_0[j] x[n-j]   (mod 2)
       = sum_{j=0}^{2} g_0[j] x[n-j]     (mod 2)
       = g_0[0]*x[n] + g_0[1]*x[n-1] + g_0[2]*x[n-2]
       = 1*x[n] + 1*x[n-1] + 1*x[n-2]

Note that the input bitstream, as a function of j, is reversed in
time.

An aside: The convolution operation is going to become very important
to us later on in this class, so you should start to get familiar with
it now.

Regarding the constraint length K, you can see that the larger K is,
the more times a particular message bit is used when calculating
parity bits. This leads to greater redundancy, and usually to better
error-correcting capability.

======================================================================
Additional Views
======================================================================

In addition to the shift-register view, there are two additional ways
to view convolutional codes.

----------------------------------------------------------------------
State-machine View
----------------------------------------------------------------------

First we'll start with the state-machine view. Recall the
shift-register view, where we said the values in the registers tell us
the "states" the encoder can be in, i.e., there is a state for each
possible combination of x[n-1] x[n-2]. That means for K=3, there are
2^{K-1} states:

00, 01, 10, 11

Depending on the next bit, we will be able to transition from one
state to another. So for example, if we start with 00, and the next
bit is a 0, we will return to 00. If the next bit is a 1, we will
transition to 10. There is no transition from 00 to 01, or to 11;
those would be impossible.

(Note to students: Drawing the state-machine view was beyond my
ASCII-diagram skills, so please consult the lecture slides for an
illustration of this process)

Now, notice that we have said nothing about what parity equations
we're using yet. We haven't needed them! Every state-machine diagram
for K=3 will have these states and these transitions. The only thing
that will change is what I'm going to add next.

On each transition, we're going to add to the label. Instead of just
x[n], we will also indicate the parity bits that are output.

Ex: On the transition from 00 to 10, we have x[n] = 1,
x[n-1] = x[n-2] = 0. With our example, p_0[n] = 1 + 0 + 0 = 1 and
p_1[n] = 1 + 0 = 1. So we label this edge as "1/11"
(x[n]/p_0[n]p_1[n]).
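Since the slides carry the actual diagram, here is a tiny sketch (my
own illustration, not from the slides) that just enumerates the K=3
state machine for our example code: for each state x[n-1]x[n-2] and
each incoming bit, it prints the next state and the edge label
x[n]/p_0[n]p_1[n]:

    # Enumerate the state machine for
    # p_0[n] = x[n] + x[n-1] + x[n-2] and p_1[n] = x[n] + x[n-2].
    for state in ["00", "01", "10", "11"]:
        x1, x2 = int(state[0]), int(state[1])    # x[n-1], x[n-2]
        for xn in (0, 1):
            p0 = (xn + x1 + x2) % 2
            p1 = (xn + x2) % 2
            next_state = str(xn) + state[0]      # new x[n-1] = x[n]; new x[n-2] = old x[n-1]
            print(state, "--", str(xn) + "/" + str(p0) + str(p1), "-->", next_state)

Reading off the rows for state 00 reproduces the 0/00 self-loop and
the 1/11 edge to state 10 described above.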
In the trellis view, we will only label the transitions with the
parity bits, instead of always including x[n]. We do this because, of
the two arrows exiting each state, the upper arrow always corresponds
to x[n]=0, and the lower arrow always corresponds to x[n]=1.
(Students, again, please see the slides for this illustration)

Our encoding procedure is now this: Beginning at the starting state,
for each message bit, make the appropriate state transition, and send
p_0p_1. Then end the message with K-1 zeros, to ensure return to the
starting state.

Any path through the trellis corresponds to a codeword. We could also
consider the subset of paths that start in the zero state and end at
time step N in the zero state; these would generate codewords of
length N (though a convolutional code is not limited to this, or to a
particular choice of N). The message bits can be read off by noting
whether the path at each stage takes the upper branch (bit 0) or lower
branch (bit 1). And the codeword bits can be read off from the labels
on the traversed path.

Aside: Because the states are binary strings, you will sometimes see
them interpreted as decimal numbers, i.e., "0, 1, 2, 3" instead of
"00, 01, 10, 11". The former can be particularly useful in
implementations.

----------------------------------------------------------------------
Free Distance
----------------------------------------------------------------------

One last thing before we start to talk about the decoding process. The
parity stream forms a linear code (the parity bits are linear
functions of the data bits).

Aside: The fact that parity bits are linear functions of the data bits
is essentially saying that c=dG, which establishes the linearity of
the code. If you want to instead invoke our other characterization of
linearity, note that the sum of two *data* words is a data word,
because all data words are allowed. In view of the convolution
expression for the parity bits, it follows that the sum of two parity
streams is a parity stream (the stream corresponding to the sum of the
two data words). Thanks to George Verghese for the above explanation,
which is much more succinct than what I had originally.

The smallest-weight nonzero codeword has a weight that, locally in
time, plays a role analogous to d (the minimum Hamming distance). It's
called the free distance of the convolutional code; we'll denote it
d_free.

With linear block codes, d told us that we could correct all patterns
of up to floor( (d-1)/2 ) errors in a block. In convolutional codes,
d_free tells us something similar: we can correct
floor( (d_free - 1)/2 ) errors as long as they are "far enough apart".

Why this notion of "far enough apart"? If we carried the
minimum-Hamming-distance interpretation over to convolutional codes
directly, we might conclude that we could correct only
floor( (d_free - 1)/2 ) errors in the entire stream. That is false;
the stream is potentially infinite, and we can certainly correct more
than floor( (d_free - 1)/2 ) errors in total. This is where the notion
of "far enough apart" comes in; what we cannot do is correct more than
floor( (d_free - 1)/2 ) errors in a single burst.

======================================================================
Decoding in the presence of errors
======================================================================

Earlier we discussed a naive way of decoding (in our example, we can
just sum the parity bits to recover the stream), and alluded to the
fact that it won't work well in the presence of errors. What is really
going on at the receiver?
The receiver doesn't have direct knowledge of the transmitter's state
transitions; it only knows the (possibly corrupted) parity bits p_i.
The goal is to find the most likely sequence of transmitter states
that could have generated the received parity bits.

Since we're assuming a BSC with p < 0.5, fewer errors are always more
probable than more errors (e.g., if we're back in our repetition code,
and I receive 00010, it's a lot more likely that the transmitter sent
00000 than 11111). Because of this, the maximum-likelihood message
sequence is the one that generated the codeword (here, the sequence of
parity bits) with the smallest Hamming distance from the received
codeword (again, here, the sequence of parity bits).

So the goal: find the valid codeword closest to the received codeword.
This is maximum-likelihood decoding, and we will learn how to do it
for convolutional codes on Wednesday.
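To make that criterion concrete, here is a brute-force sketch (my own
illustration; this is NOT the Viterbi algorithm we'll see on
Wednesday). It simply tries every possible message of a given length,
encodes each with our example code, and keeps the one whose parity
stream is closest in Hamming distance to what was received. The
message used below is a hypothetical 4-bit example:

    from itertools import product

    def encode(msg):
        # Our example code, g_0 = 111 and g_1 = 101; x[-1] = x[-2] = 0.
        # (For brevity, this sketch does not append the K-1 flushing zeros.)
        x = [0, 0] + list(msg)
        out = []
        for n in range(2, len(x)):
            out += [(x[n] + x[n-1] + x[n-2]) % 2, (x[n] + x[n-2]) % 2]
        return out

    def hamming(a, b):
        return sum(u != v for u, v in zip(a, b))

    def brute_force_decode(received, msg_len):
        # Exhaustive maximum-likelihood decoding: tries all 2^msg_len candidates.
        best = min(product([0, 1], repeat=msg_len),
                   key=lambda m: hamming(encode(m), received))
        return list(best)

    sent = encode([1, 1, 0, 1])              # a hypothetical 4-bit message
    received = sent[:]
    received[2] ^= 1                         # flip one parity bit in transit
    print(brute_force_decode(received, 4))   # recovers [1, 1, 0, 1]

The exponential cost of this exhaustive search is exactly what Viterbi
decoding avoids: it finds the same minimum-distance path through the
trellis with work that grows only linearly in the message length.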