6.034 Artificial Intelligence - Recitations, fall 2004 online slides on learning


Information Content

Entropy is also called the information content I of an actual answer.

Suppose you are sending messages to someone and there are several possible messages you could send. How many bits are needed to distinguish which message is being sent?

If P(message) = 1.0, no bits are needed (a pure node, with all examples in one class). If there are 10 possible messages, each with probability 0.1, more bits are needed (a node with an equal number of examples in each possible class). The log of a fraction is always negative, so each term is multiplied by -1 to make the total positive.
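As a quick check of both cases: with $n$ equally likely messages, each has probability $1/n$, and distinguishing one message requires $\log_2 n$ bits. A single certain message requires $\log_2 1 = 0$ bits, while 10 equally likely messages require $\log_2 10 \approx 3.32$ bits.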

If the possible answers $v_i$ have probabilities $P(v_i)$, then the information content I of the actual answer is given by

$I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$

$I(1/2, 1/2) = -(1/2)\log_2(1/2) - (1/2)\log_2(1/2) = 1$ bit

$I(0.01, 0.99) \approx 0.08$ bits
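A minimal Python sketch of this computation (the function name information_content is our own, not from the slides); it reproduces the two values above and the 10-message case:

    from math import log2

    def information_content(probabilities):
        """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi).
        Terms with probability 0 contribute 0 by convention, so they are skipped."""
        return sum(-p * log2(p) for p in probabilities if p > 0)

    print(information_content([0.5, 0.5]))    # 1.0 bit
    print(information_content([0.01, 0.99]))  # ~0.08 bits
    print(information_content([0.1] * 10))    # ~3.32 bits (10 equally likely messages)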