### 6.02 Practice Problems: Information, Entropy, & Source Coding

Problem . Huffman coding is used to compactly encode the species of fish tagged by a game warden. If 50% of the fish are bass and the rest are evenly divided among 15 other species, how many bits would be used to encode the species when a bass is tagged?

If a symbol has a probability of ≥ .5, it will be incorporated into the Huffman tree on the final step of the algorithm, and will become a child of the final root of the decoding tree. This means it will have a 1-bit encoding.

Problem . Several people at a party are trying to guess a 3-bit binary number. Alice is told that the number is odd; Bob is told that it is not a multiple of 3 (i.e., not 0, 3, or 6); Charlie is told that the number contains exactly two 1's; and Deb is given all three of these clues. How much information (in bits) did each player get about the number?

N = 8 choices for a 3-bit number
Alice: odd = {001, 011, 101, 111} M=4, info = log2(8/4) = 1 bit
Bob: not a multiple of 3 = {001, 010, 100, 101, 111} M=5, info = log2(8/5) = .6781 bits
Charlie: two 1's = {011, 101, 110} M=3, info = log2(8/3) = 1.4150
Deb: odd, not a multiple of 3, two 1's = {101}, M=1, info = log(8/1) = 3 bits

Problem . X is an unknown 8-bit binary number. You are given another 8-bit binary number, Y, and told that Y differs from X in exactly one bit position. How many bits of information about X have you been given?

Knowing Y, we've narrowed down the value of X to one of 8 choices. Bits of information = log2(256/8) = 5 bits.

Problem . In Blackjack the dealer starts by dealing 2 cards each to himself and his opponent: one face down, one face up. After you look at your face-down card, you know a total of three cards. Assuming this was the first hand played from a new deck, how many bits of information do you now have about the dealer's face down card?

We've narrowed down the choices for the dealer's face-down card from 52 (any card in the deck) to one of 49 cards (since we know it can't be one of three visible cards. bits of information = log2(52/49).

Problem . The following table shows the current undergraduate and MEng enrollments for the School of Engineering (SOE).

Course (Department)# of studentsProb.
I (Civil & Env.)121.07
II (Mech. Eng.)389.23
III (Mat. Sci.)127.07
VI (EECS)645.38
X (Chem. Eng.)237.13
XVI (Aero & Astro)198.12
Total17171.0

1. When you learn a randomly chosen SOE student's department you get some number of bits of information. For which student department do you get the least amount of information?
We'll get the smallest value for log2(1/p) by choosing the choice with the largest p => EECS.

2. Design a variable length Huffman code that minimizes the average number of bits in messages encoding the departments of randomly chosen groups of students. Show your Huffman tree and give the code for each course.
step 1 = {(I,.07) (II,.23) (III,.07) (VI,.38) (X,.13) (XVI,.12)}
step 2 = {([I,III],.14) (II,.23) (VI,.38) (X,.13) (XVI,.12)}
step 3 = {([I,III],.14) (II,.23) (VI,.38) ([X,XVI],.25)}
step 4 = {([II,[I,III]],.37) (VI,.38) ([X,XVI],.25)}
step 5 = {([[X,XVI],[II,[I,III]]]),.62) (VI,.38)}
step 6 = {([[[X,XVI,[II,[I,III]],VI],1.0)}

code for course I: 0 1 1 0
code for course II: 0 1 0
code for course III: 0 1 1 1
code for course VI: 1
code for course X: 0 0 0
code for course XVI: 0 0 1

There are of course many equivalent codes derived by swapping the "0" and "1" labels for the children of any interior node of the decoding tree. So any code that meets the following constraints would be fine:

code for VI = length 1
code for II, X, XVI = length 3
code for I, III = length 4

3. If your code is used to send messages containing only the encodings of the departments for each student in groups of 100 randomly chosen students, what's the average length of such messages? Write an expression if you wish.
100*[(.38)(1) + (.23 + .13 + .12)*(3) + (.07 + .07)*(4)] = 238 bits

Problem . You're playing an on-line card game that uses a deck of 100 cards containing 3 Aces, 7 Kings, 25 Queens, 31 Jacks and 34 Tens. In each round of the game the cards are shuffled, you make a bet about what type of card will be drawn, then a single card is drawn and the winners are paid off. The drawn card is reinserted into the deck before the next round begins.

1. How much information do you receive when told that a Queen has been drawn during the current round?
information received = log2(1/p(Queen)) = log2(1/.25) = 2 bits

2. Give a numeric expression for the average information content received when learning about the outcome of a round (aka the entropy).
(.03)log2(1/.03) + (.07)log2(1/.07) + (.25)log2(1/.25) + (.31)log2(1/.31) + (.34)log2(1/.34) = 1.97 bits

3. Construct a variable-length Huffman encoding that minimizes the length of messages that report the outcome of a sequence of rounds. The outcome of a single round is encoded as A (ace), K (king), Q (queen), J (jack) or X (ten). Specify your encoding for each of A, K, Q, J and X.

Encoding for A: 001
Encoding for K: 000
Encoding for Q: 01
Encoding for J: 11
Encoding for X: 10

4. Using your code from part (C) what is the expected length of a message reporting the outcome of 1000 rounds (i.e., a message that contains 1000 symbols)?
(.07+.03)(3 bits) + (.25+.31+.34)(2 bits) = 2.1 bits average symbol length using Huffman code. So expected length of 1000-symbol message is 2100 bits.

5. The Nevada Gaming Commission regularly receives messages in which the outcome for each round is encoded using the symbols A, K, Q, J and X. They discover that a large number of messages describing the outcome of 1000 rounds (i.e. messages with 1000 symbols) can be compressed by the LZW algorithm into files each containing 43 bytes in total. They immediately issue an indictment for running a crooked game. Briefly explain their reasoning.
43 bytes is 344 bits, way below the entropy of 1970 bits for encoding 1000 symbols. This indicates the actual symbol probabilities must be dramatically different than the stated probabilities that were used to calculate the entropy in part (B). The experimentally determined value of the actual entropy (344/1000 = .344 bits) indicates that one or more of the symbols must have a much higher probability of occuring than stated, which suggests a rigged game.

Problem .

Consider messages made up entirely of vowels (A, E, I, O, U). Here's a table of probabilities for each of the vowels:

lp(l)log2(1/p(l))p(l)*log2(1/p(l))
A0.222.180.48
E0.341.550.53
I0.172.570.43
O0.192.400.46
U0.083.640.29
Totals1.0012.342.19

1. Give an expression for the number of bits of information you receive when learning that a particular vowel is either I or U.
The probability that a letter is either I or U is p(I) + p(U) = 0.25. So the number of bits of information you receive is log2(1/.25) = 2 bits.

2. Using Huffman's algorithm, construct a variable-length code assuming that each vowel is encoded individually. Please draw a diagram of the Huffman tree and give the encoding for each of the vowels.
Running Huffman's algorithm:
```      /\
/  \
/    \
/\    /\
A  O  E  \
/\
I  U
```
So assuming we label left branches with "0" and right branches with "1", the encodings are
A = 00
O = 01
E = 10
I = 110
U = 111

3. Using your code from above, give an expression for the expected length in bits of an encoded message transmitting 100 vowels.
Expected length for one vowel = Σ p(l)*len(encoding(l)):

0.22*2 + 0.19*2 + 0.34*2 + 0.17*3 + 0.08*3 = 2.25 bits

So the expected length for 100 vowels is 225 bits.

4. Ben Bitdiddle spends all night working on a more complicated encoding algorithm and sends you email claiming that using his code the expected length in bits of an encoded message transmitting 100 vowels is 197 bits. Would you pay good money for his implementation?
No! The entropy is 2.19 bits which means that 219 bits is a lower bound on the number of bits needed on the average to encode 100 vowels. Ben's encoding uses less bits, which is impossible, so his code or his derviation of the expected lenth must be bogus in some way.

Problem . Describe the contents of the string table created when encoding a very long string of all a's using the simple version of the LZW encoder shown below. In this example, if the decoder has received E encoded symbols (i.e., string table indices) from the encoder, how many a's has it been able to decode?

```initialize TABLE[0 to 255] = code for individual bytes
STRING = get input symbol
while there are still input symbols:
SYMBOL = get input symbol
if STRING + SYMBOL is in TABLE:
STRING = STRING + SYMBOL
else:
output the code for STRING
add STRING + SYMBOL to TABLE
STRING = SYMBOL
output the code for STRING
```
The string table is filled with sequences of a's, each one longer than the last. So

table[256] = aa
table[257] = aaa
table[258] = aaaa
...

Here's what the decoder receives, and the decoding it does (it's instructive to figure out how the decoder is building its copy of the string table based on what it's receiving):

index of a => decode a
256 => decode aa
257 => decode aaa
258 => decode aaaa
...

So if E symbols have arrived, the decoder has produced 1 + 2 + 3 + ... + E a's, which sums to E(E+1)/2.

Problem . Consider the pseudo-code for the LZW decoder given below:

```initialize TABLE[0 to 255] = code for individual bytes
CODE = read next code from encoder
STRING = TABLE[CODE]
output STRING

while there are still codes to receive:
CODE = read next code from encoder
if TABLE[CODE] is not defined:
ENTRY = STRING + STRING[0]
else:
ENTRY = TABLE[CODE]
output ENTRY
STRING = ENTRY
```
Suppose that this decoder has received the following five codes from the LZW encoder (these are the first five codes from a longer compression run):
```97 -- index of 'a' in the translation table
98 -- index of 'b' in the translation table
257 -- index of second addition to the translation table
256 -- index of first addition to the translation table
258 -- index of third addition to the translation table
```
After it has finished processing the fifth code, what are the entries in TABLE and what is the cumulative output of the decoder?
Here's the step-by-step operation of the decoder:

CODE=97, STRING='a', output 'a'
CODE=98, ENTRY='b', output 'b', TABLE[256]='ab', STRING='b'
CODE=257, ENTRY='bb', output 'bb', TABLE[257]='bb', STRING='bb'
CODE=256, ENTRY='ab', output 'ab', TABLE[258]='bba', STRING='ab'
CODE=258, ENTRY='bba', output 'bba', TABLE[259]='abb', STRING='bba'

So the cummulative output of the decoder is 'abbbabbba'.

Problem . Huffman and other coding schemes tend to devote more bits to the coding of

(A) symbols carrying the most information
(B) symbols carrying the least information
(C) symbols that are likely to be repeated consecutively
(D) symbols containing redundant information
Answer is (A). Symbols carrying the most information, i.e., the symbols that are less likely to occur. This makes sense: to keep messages as short as possible, frequently occurring symbols should be encoded with fewer bits and infrequent symbols with more bits.

Problem . Consider the following two Huffman decoding tress for a variable-length code involving 5 symbols: A, B, C, D and E.

1. Using Tree #1, decode the following encoded message: "01000111101".
Starting at the root of the decoding tree, use the bits of the message to traverse the tree until a leaf node is reached; output that symbol. Repeat until all the bits in the message are consumed.

0 = A
100 = B
0 = A
111 = E
101 = C

2. Suppose we were encoding messages with the following probabilities for each of the 5 symbols: p(A) = 0.5, p(B) = p(C) = p(D) = p(E) = 0.125. Which of the two encodings above (Tree #1 or Tree #2) would yield the shortest encoded messages averaged over many messages?
Using Tree #1, the expected length of the encoding for one symbol is:

1*p(A) + 3*p(B) + 3*p(C) + 3*p(D) + 3*p(E) = 2.0

Using Tree #2, the expected length of the encoding for one symbol is:

2*p(A) + 2*p(B) + 2*p(C) + 3*p(D) + 3*p(E) = 2.25

So using the encoding represented by Tree #1 would yield shorter messages on the average.

3. Using the probabilities of part (B), if you learn that the first symbol in a message is "B", how many bits of information have you received?
bits of info received = log2(1/p) = log2(1/.125) = 3

4. Using the probabilities of part (B), If Tree #2 is used to encode messages what is the average length of 100-symbol messages, averaged over many messages?
In part (B) we calculated that the expected length of the encoding for one symbol using Tree #2 was 2.25 bits, so for 100-symbol messages the expected length is 225 bits.

Problem . Ben Bitdiddle has been hired by the Registrar to help redesign the grades database for WebSIS. Ben is told that the database only needs to store one of five possible grades (A, B, C, D, F). A survey of the current WebSIS repository reveals the following probabilities for each grade:

Ap(A) = 18%
Bp(B) = 27%
Cp(C) = 25%
Dp(D) = 15%
Fp(E) = 15%

1. Given the probabilities above, if you are told that a particular grade is "C", give an expression for the number of bits of information you have received.
# of bits for "C" = log(1/pC) = log(1/.25) = log(4) = 2

2. Ben is interested in storing a sequence of grades using as few bits as possible. Help him out by creating a variable-length encoding that minimizes the average number of bits stored for a sequence of grades. Use the table above to determine how often a particular grade appears in the sequence.
using Huffman algorithm:
- since D,F are least probable, make a subtree of them, p(D/\F) = 30%
- now A,C are least probable, make a subtree of them, P(A/\C) = 43%
- now B,DF are least probable, make a subtree of them P(B/\(D/\F)) = 55%
- just AC,BDF are left, make a subtree of them (A/\C)/\(B/\(D/\F))
- so A = 00, B = 10, C = 01, D = 110, F = 111

Problem . Consider a sigma-delta modulator used to convert a particular analog waveform into a sequence of 2-bit values. Building a histogram from the 2-bit values we get the following information:

Modulator value# of occurrences
0025875
01167836
10167540
1125974

1. Using probabilities computed from the histogram, construct a variable-length Huffman code for encoding these four values.
p(00)=.0668 p(01)=.4334 p(10)=.4327 p(11)=.0671
Huffman tree = 01 /\ (10 /\ (00 /\ 11))
Code: 01="0" 10="10" 00="110" 11="111"

2. If we transmit the 2-bit values as is, it takes 2 bits to send each value (doh!). If we use the Huffman code from part (A) what is the average number of bits used to send a value? What compression ratio do we achieve by using the Huffman code?
sum(pi*len(code for pi)) = 1.7005, which is 85% of the original 2-bit/symbol encoding.

3. Using Shannon's entropy formula, what is the average information content associated with each of the 2-bit values output by the modulator? How does this compare to the answers for part (B)?
sum(pi*log2(1/pi)) = 1.568, so the code of part(B) isn't quite as efficient as we can achieve by using an encoding that codes multiple pairs as in one code symbol.

Problem . In honor of Daisuke Matsuzaka's first game pitching for the Redsox, the Boston-based members of the Search for Extraterrestrial Intelligence (SETI) have decided to broadcast a 1,000,000 character message made up of the letters "R", "E", "D", "S", "O", "X". The characters are chosen at random according the probabilities given in the table below:

Letterp(Letter)
R.21
E.31
D.11
S.16
O.19
X.02

1. If you learn that one of the randomly chosen letters is a vowel (i.e., "E" or "O") how many bits of information have you received?
p(E or O) = .31 + .19 = .5, log2(1/pi) = 1 bit

2. Nervous about the electric bill for the transmitter at Arecibo, the organizers have asked you to design a variable length code that will minimize the number of bits needed to send the message of 1,000,000 randomly chosen characters. Please draw a Huffman decoding tree for your code, clearly labeling each leaf with the appropriate letter and each arc with either a "0" or a "1".
Huffman algorithm:
choose X,D: X/\D, p = .13
choose S,XD: S/\(X/\D), p = .29
choose R,0: R/\O, p = .40
choose E,SXD: E/\(S/\(X/\D)), p = .6
choose RO,ESXD: (R/\O) /\ (E/\(S/\(X/\D)))
code: R=00 O=01 E=10 S=110 X=1110 D=1111

3. Using your code, what bit string would they send for "REDSOX"?
00 10 1111 110 01 1110

1. You're given a standard deck of 52 playing cards that you start to turn face up, card by card. So far as you know, they're in completely random order. How many new bits of information do you get when the first car is flipped over? The fifth card? The last card?
First card: log2(52/1)

Fifth card: log2(48/1)

Last card: log2(1/1) = 0 (no information since, knowing the other 51 cards, one can predict with certainty the last card)

2. Suppose there three alternatives ("A", "B" and "C") with the following probabilities of being chosen:

p("A") = 0.8
p("B") = 0.1
p("C") = 0.1

We might encode the of "A" with the bit string "0", the choice of "B" with the bit string "10" and the choice of "C" with the bit string "11".

If we record the results of making a sequence of choices by concatenating in left-to-right order the bit strings that encode each choice, what sequence of choices is represented by the bit string "00101001100000"?

AABBACAAAAA

3. Using the encoding of the previous part, what is the expected length of the bit string that encodes the results of making 1000 choices? What is the length in the worst case? How do these numbers compare with 1000*log2(3/1), which is the information content of 1000 equally-probable choices?
Expected length = 1000[(0.8)(1) + (0.1)(2) + (0.1)(2)] = 1200 bits

In the worst case, each choice would take 2 bits to encode = 2000 bits.

1000*log2(3/1) = 1585. In general, a variable-length encoding of sequences of N choices with differing probabilities gets better compression than can be achieved if the choices are equally probable.

4. Consider the sum of two six-sided dice. Even when the dice are "fair" the amount information conveyed by a single sum depends on what the sum is since some sums are more likely than others, as shown in the following figure:

What is the average number of bits of information provided by the sum of 2 dice? Suppose we want to transmit the sums resulting from rolling the dice 1000 times. How many bits should we expect that transmission to take?

Average number of bits = sum (p_i)log2(1/p_i) for i = 2 through 12. Using the probabilities given in the figure above the average number of bits of information provided by the sum of two dice is 3.2744.

So if we had the perfect encoding, the expected length of the transmission would be 3274.4 bits. If we encode each sum separately we can't quite achieve this lower bound -- see the next question for details.

5. Suppose we want to transmit the sums resulting from rolling the dice 1000 times. If we use 4 bits to encode each sum, we'll need 4000 bits to transmit the result of 1000 rolls. If we use a variable-length binary code which uses shorter sequences to encode more likely sums then the expected number of bits need to encode 1000 sums should be less than 4000. Construct a variable-length encoding for the sum of two dice whose expected number of bits per sum is less than 3.5. (Hint: It's possible to find an encoding for the sum of two dice with an expected number of bits = 3.306.)
Using Huffman's algorithm, we arrive at the following encoding which has 3.3056 as the expected number of bits for each sum.

6. Can we make an encoding for transmitting 1000 sums that has an expected length smaller than that achieved by the previous part?
Yes, but we have to look at encoding more than one sum at a time, e.g., by applying the construction algorithm to pairs of sums, or ultimately to all 1000 sums at once. Many of the more sophisticated compression algorithms consider sequences of symbols when constructing the appropriate encoding scheme.

Problem .

Consider messages comprised of four symbols -- A, B, C, D -- each with an associated probability of occurrence: p(A), p(B), p(C), p(D). Suppose p(A) ≥ p(B) ≥ p(C) ≥ p(D). Write down a single condition (equation or inequality) that is both necessary and sufficient to guarantee that the Huffman algorithm will generate a two-bit encoding for each symbol, i.e., the corresponding Huffman code will actually be a fixed-length encoding using 2 bits for each symbol.

p(C) + p(D) > p(A). The reason is that there are only two possible Huffman trees with 4 leaves. For the tree where every symbol is encoded with two bits, we need to have p(C) + p(D) + p(B) > p(B) + p(A) to ensure that the algorithm will combine B with A instead of with C/D. We need a strict inequality because otherwise there is a possiblity of a tie and the unbalanced tree may be chosen since it has the same expected number of bits.

Problem . Consider a Huffman code over four symbols: A, B, C, and D. For each of the following encodings indicate if it is a valid Huffman code, i.e., a code that would have been produced by the Huffman algorithm given some set of probabilities p(A), ..., p(D).

1. A:0, B:11, C:101, D:100
Valid. This is one of the two possible Huffman trees with 4 leaves.

2. A: 1, B:01, C:00, D:010
Not valid. This code can't be unambigously decoded by the receiver since the encoding for B is a prefix for the encoding of D. For example, the message 0101 might be decoded as "BB" or "DA" -- the receiver doesn't have sufficient information to tell which.

3. A:00, B:01, C:110, D:111
Not valid. This a legal variable length code, but it's not optimal. A shorter code would have C and D encoded in 2 bits, as 10 and 11 (or vice versa), and that would be a Huffman code for the same symbol probabilities, not the one given.

Problem .

After careful data collection, Alyssa P. Hacker observes that the probability of HIGH or LOW traffic on Storrow Drive is given by the following table:

Conditionp(HIGH traffic)p(LOW traffic)
If the Redsox are playing0.999.001
If the Redsox are not playing0.250.75

1. If it is known that the Red Sox are playing, then how many bits of information are conveyed by the statement that the traffic level is LOW.
log2(1/0.001) = 9.966 bits.

2. Suppose it is known that the Red Sox are not playing. What is the entropy of the corresponding probability distribution of traffic?
0.25*log2(1/0.25) + 0.75*log2(1/0.75) = 0.811 bits.

Problem .

Consider Huffman coding over four symbols (A, B, C and D) with probabilities p(A)=1/3, p(B)=1/2, p(C)=1/12 and p(D)=1/12.

The entropy of the discrete random variable with this probability distribution was calculated to be 1.62581 bits.

We then used the Huffman algorithm to build the following variable length code:

A: 10
B: 0
C: 110
D: 111

which gave an expected encoding length of 1.666 bits/symbol, slightly higher than the entropy bound.

1. Suppose we made up a new symbol alphabet consisting of all possible pairs of the original four symbols. Enumerate the new symbols and give the probabilities associated with each new symbol.
Here are the 16 new symbols and their associated probabilities:
```AA  p(AA)=1/9
AB  p(AB)=1/6
AC  p(AC)=1/36
BA  p(BA)=1/6
BB  p(BB)=1/4
BC  p(BC)=1/24
BD  p(BD)=1/24
CA  p(CA)=1/36
CB  p(CB)=1/24
CC  p(CC)=1/144
CD  p(CD)=1/144
DA  p(DA)=1/36
DB  p(DB)=1/24
DC  p(DC)=1/144
DD  p(DD)=1/144
```

2. What is the entropy associated with the discrete random variable that has the probability distribution you gave in part (A). Is is the same, bigger, or smaller than the entropy of 1.626 calculated above? Explain.
The entropy for the new double-symbol alphabet is 3.25163 bits/double-symbol, exactly twice that of the entropy of the original alphabet. This means that we haven't changed the expected information content by choosing to encode pairs of symbols instead of one symbol at a time.

3. Derive the Huffman encoding for the new symbols and compute the expected encoding length expressed in bits/symbol. How does it compare with the 1.666 bits/symbol for the original alphabet? Note: this is tedious to do by hand -- try using the Huffman program you wrote!
```AB = 000
BA = 001
BB = 01
DB = 10000
DD = 1000100
CC = 1000101
CD = 1000110
DC = 1000111