6.034 Artificial Intelligence - Recitations, fall 2004 online slides on learning

Information Theory and Decision Trees

What is the correct classification?

Before splitting, the probabilities of the possible answers are estimated as the proportions of positive and negative examples in the training set.

If the training set has $p$ positive examples and $n$ negative examples, then the information contained in a correct answer is

$$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right)$$
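
As a rough illustration, the quantity above can be computed directly from the counts. This is a minimal sketch that assumes $I$ is the usual information content measured in bits, i.e. $I(q_1, q_2) = -q_1 \log_2 q_1 - q_2 \log_2 q_2$; the function name `information` is made up for this example and does not come from the slides.

```python
import math

def information(p, n):
    """Bits of information in a correct answer, given p positive
    and n negative examples (assumes the base-2 definition of I)."""
    total = p + n
    if total == 0:
        raise ValueError("need at least one example")
    bits = 0.0
    for count in (p, n):
        if count > 0:  # a term with probability 0 contributes nothing
            q = count / total
            bits -= q * math.log2(q)
    return bits

print(information(6, 6))   # evenly mixed set: 1.0 bit
print(information(12, 0))  # all positive: 0.0 bits
```

An evenly mixed training set needs a full bit to classify, while a set containing only one class needs none.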

Splitting on a single attribute does not usually answer the entire question, but it gets us closer

How much closer? (Note that the search is greedy, i.e., hill-climbing)
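
One way to put a number on "how much closer" is to compare the information needed before the split with the weighted average still needed afterwards. The sketch below assumes the base-2 definition of $I$ and assumes each subset is weighted by its share of the examples; the helper names and the example counts are hypothetical.

```python
import math

def information(p, n):
    """Bits needed to classify a set with p positive, n negative examples."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count > 0:  # a term with probability 0 contributes nothing
            q = count / total
            bits -= q * math.log2(q)
    return bits

def remaining_information(subsets):
    """Expected bits still needed after a split.

    subsets: list of (p_i, n_i) counts, one pair per attribute value.
    """
    total = sum(p + n for p, n in subsets)
    if total == 0:
        raise ValueError("need at least one example")
    return sum((p + n) / total * information(p, n)
               for p, n in subsets if p + n > 0)

# Hypothetical split of 6 positive / 6 negative examples into two branches.
before = information(6, 6)
after = remaining_information([(4, 0), (2, 6)])
print(before - after)  # bits gained by splitting on this attribute
```

Because the search is greedy, a learner built this way would pick whichever attribute gives the largest reduction at the current node, without looking ahead.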