6.034 Artificial Intelligence - Recitations, fall 2004 online slides on learning


Entropy

[Figure: entropy of $S$ as a function of the proportion of positive examples $p_{\oplus}$]

  • $S$ is a sample of training examples
  • $p_{\oplus}$ is the proportion of positive examples in $S$
  • $p_{\ominus}$ is the proportion of negative examples in $S$
  • Entropy measures the impurity of $S$
  • $Entropy(S)$ = expected number of bits needed to encode the class ($\oplus$ or $\ominus$) of a randomly drawn element of $S$ (under the optimal, shortest-length code)
  • Information theory: an optimal-length code assigns $- \log_{2}p$ bits to a message of probability $p$ (see the example below).
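
As a quick sanity check of that coding claim (the probabilities here are chosen purely for illustration): a message with probability $p = 1/8$ is assigned

\begin{displaymath}-\log_{2} \frac{1}{8} = 3 \mbox{ bits},\end{displaymath}

while a certain message ($p = 1$) costs $-\log_{2} 1 = 0$ bits, so rare messages get longer codes and certain ones cost nothing.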

So, the expected number of bits to encode the class ($\oplus$ or $\ominus$) of a random element of $S$ is:

\begin{displaymath}p_{\oplus} (-\log_{2} p_{\oplus}) + p_{\ominus} (-\log_{2} p_{\ominus})\end{displaymath}


\begin{displaymath}Entropy(S) \equiv - p_{\oplus} \log_{2} p_{\oplus} - p_{\ominus} \log_{2}
p_{\ominus}\end{displaymath}
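
This is the binary entropy function plotted in the figure above: it is 0 when $S$ is all positive or all negative ($p_{\oplus} = 0$ or $1$, with the convention $0 \log_{2} 0 = 0$) and reaches its maximum of 1 bit at $p_{\oplus} = 1/2$. As a concrete worked example (the sample counts are hypothetical, not from the slides), suppose $S$ contains 9 positive and 5 negative examples, so $p_{\oplus} = 9/14$ and $p_{\ominus} = 5/14$:

\begin{displaymath}Entropy(S) = -\frac{9}{14} \log_{2}\frac{9}{14} - \frac{5}{14} \log_{2}\frac{5}{14} \approx 0.940\end{displaymath}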