6.034 Artificial Intelligence - Recitations, fall 2004 online slides on learning


Information Content

Entropy is also called the information content I of an actual answer.

Suppose you are sending messages to someone and there are several possible messages you could send. How many bits are needed to distinguish which message is being sent?

If P(message) = 1.0, no bits are needed (a pure node, with all examples in one class). If there are 10 possible messages, each with probability 0.1, more bits are needed (a node with an equal number of examples in each possible class). The log of a fraction is always negative, so each term is multiplied by -1 to make the total positive.
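As a quick check of both cases: with $n$ equally likely messages, each has probability $1/n$, and distinguishing one message requires $\log_2 n$ bits. A single certain message requires $\log_2 1 = 0$ bits, while 10 equally likely messages require $\log_2 10 \approx 3.32$ bits.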

If the possible answers $v_i$ have probabilities $P(v_i)$, then the information content I of the actual answer is given by

$I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$

$I(1/2, 1/2) = -(1/2)\log_2(1/2) - (1/2)\log_2(1/2) = 1$ bit

$I(0.01, 0.99) \approx 0.08$ bits
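A minimal Python sketch of this computation (the function name information_content is our own, not from the slides); it reproduces the two values above and the 10-message case:

    from math import log2

    def information_content(probabilities):
        """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi).
        Terms with probability 0 contribute 0 by convention, so they are skipped."""
        return sum(-p * log2(p) for p in probabilities if p > 0)

    print(information_content([0.5, 0.5]))    # 1.0 bit
    print(information_content([0.01, 0.99]))  # ~0.08 bits
    print(information_content([0.1] * 10))    # ~3.32 bits (10 equally likely messages)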