6.034 Artificial Intelligence - Recitations, fall 2004 online slides on learning


Entropy

[Figure: entropy of $S$ as a function of the proportion of positive examples $p_{\oplus}$]

  • $S$ is a sample of training examples
  • $p_{\oplus}$ is the proportion of positive examples in $S$
  • $p_{\ominus}$ is the proportion of negative examples in $S$
  • Entropy measures the impurity of $S$
  • $Entropy(S)$ = expected number of bits needed to encode the class ($\oplus$ or $\ominus$) of a randomly drawn element of $S$ (under the optimal, shortest-length code)
  • Information theory: an optimal-length code assigns $- \log_{2}p$ bits to a message of probability $p$ (see the example below).
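
As a quick sanity check of that coding claim (the probabilities here are chosen purely for illustration): a message with probability $p = 1/8$ is assigned

\begin{displaymath}-\log_{2} \frac{1}{8} = 3 \mbox{ bits},\end{displaymath}

while a certain message ($p = 1$) costs $-\log_{2} 1 = 0$ bits, so rare messages get longer codes and certain ones cost nothing.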

So, the expected number of bits to encode the class ($\oplus$ or $\ominus$) of a random element of $S$ is:

\begin{displaymath}p_{\oplus} (-\log_{2} p_{\oplus}) + p_{\ominus} (-\log_{2} p_{\ominus})\end{displaymath}


\begin{displaymath}Entropy(S) \equiv - p_{\oplus} \log_{2} p_{\oplus} - p_{\ominus} \log_{2}
p_{\ominus}\end{displaymath}
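
This is the binary entropy function plotted in the figure above: it is 0 when $S$ is all positive or all negative ($p_{\oplus} = 0$ or $1$, with the convention $0 \log_{2} 0 = 0$) and reaches its maximum of 1 bit at $p_{\oplus} = 1/2$. As a concrete worked example (the sample counts are hypothetical, not from the slides), suppose $S$ contains 9 positive and 5 negative examples, so $p_{\oplus} = 9/14$ and $p_{\ominus} = 5/14$:

\begin{displaymath}Entropy(S) = -\frac{9}{14} \log_{2}\frac{9}{14} - \frac{5}{14} \log_{2}\frac{5}{14} \approx 0.940\end{displaymath}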