The most commonly used measure of variation (dispersion) is the sample standard deviation, $s$. The square of the sample standard deviation is called the sample variance, defined as

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2. \tag{2} $$
However,

$$
\begin{aligned}
\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
&= \sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2.
\end{aligned} \tag{3}
$$

Substituting Eq. 3 into Eq. 2 gives

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right). \tag{4} $$
The advantage of Eq. 4 over Eq. 2 is that it allows the computation of $\sum x_i^2$, required for the evaluation of $s^2$, and of $\sum x_i$, required for the evaluation of $\bar{x}$, in one loop, whereas Eq. 2 requires the precomputed value of $\bar{x}$ before we can compute $s^2$. For this reason, Eq. 4 is often used in the computation of the mean and variance.
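To make the difference concrete, here is a minimal Python sketch of both computations; the function names `mean_var_two_pass` and `mean_var_one_pass` are ours and not part of the text.

```python
def mean_var_two_pass(x):
    """Mean and variance via Eq. 2; xbar must be known before the second loop."""
    n = len(x)
    xbar = sum(x) / n                                    # first pass: the mean
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)     # second pass: sum of squares
    return xbar, s2


def mean_var_one_pass(x):
    """Mean and variance via Eq. 4; both sums accumulate in a single loop."""
    n = len(x)
    sum_x = 0.0
    sum_x2 = 0.0
    for xi in x:                   # one loop accumulates sum(x_i) and sum(x_i^2)
        sum_x += xi
        sum_x2 += xi * xi
    xbar = sum_x / n
    s2 = (sum_x2 - n * xbar * xbar) / (n - 1)
    return xbar, s2
```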
However, if you closely examine Eq. 2 and Eq. 4, one important difference stands out: Eq. 2 guarantees a non-negative variance, because the variance is given there as a sum of squares. This is not necessarily true of Eq. 4, where we subtract $n\bar{x}^2$ from $\sum x_i^2$. From a computational perspective, this subtraction can cause difficulties for large samples, which are prone to roundoff errors. We are therefore interested in developing an algorithm that computes (a) both the mean and the variance in the same loop and (b) the variance as a sum of squares. How can this be accomplished? We can resort to developing recursive relations. Applying Eq. 1 for the first $p-1$ and the first $p$ data and subtracting one from the other, we get

$$ \bar{x}_p = \bar{x}_{p-1} + \frac{x_p - \bar{x}_{p-1}}{p}, \tag{5} $$
where $\bar{x}_p$ denotes the mean value of the first $p$ data of the sample. We can now compute the sample mean recursively by letting $\bar{x}_1 = x_1$ and subsequently applying Eq. 5 for $p = 2, 3, \ldots, n$. We can also construct a simple recursion relation for computing $s_p^2$ by applying Eq. 4 to the first $p-1$ and the first $p$ data in the sample. This gives the two equations

$$ (p-2)\,s_{p-1}^2 = \sum_{i=1}^{p-1} x_i^2 - (p-1)\,\bar{x}_{p-1}^2, \qquad (p-1)\,s_p^2 = \sum_{i=1}^{p} x_i^2 - p\,\bar{x}_p^2, $$

which, upon subtracting the first from the second, yield

$$ (p-1)\,s_p^2 = (p-2)\,s_{p-1}^2 + (p-1)\,\bar{x}_{p-1}^2 + x_p^2 - p\,\bar{x}_p^2. \tag{7} $$
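Taken together, Eqs. 5 and 7 can be evaluated in a single pass over the data. A direct transcription into Python might look like the following sketch (the function name and variable names are ours); it tracks the running mean $\bar{x}_p$ and the quantity $(p-1)s_p^2$.

```python
def mean_var_recursive(x):
    """Running mean and variance via the recursions of Eq. 5 and Eq. 7.

    xbar holds the running mean xbar_p; m holds (p - 1) * s_p^2,
    so both statistics come out of a single pass over the data.
    """
    n = len(x)
    xbar = x[0]                      # xbar_1 = x_1
    m = 0.0                          # (p - 1) * s_p^2 is zero for p = 1
    for p in range(2, n + 1):
        xp = x[p - 1]
        xbar_prev = xbar
        xbar = xbar_prev + (xp - xbar_prev) / p                     # Eq. 5
        m += (p - 1) * xbar_prev ** 2 + xp ** 2 - p * xbar ** 2     # Eq. 7
    s2 = m / (n - 1) if n > 1 else float("nan")
    return xbar, s2
```

For example, `mean_var_recursive([1.0, 3.0])` returns `(2.0, 2.0)`, matching Eq. 2 applied directly.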
The coefficient of variation of the sample data, denoted by CV, is defined as

$$ \mathrm{CV} = \frac{s}{\bar{x}}. $$
Note that CV is independent of the units of measurement.
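For instance, rescaling every observation by a constant factor multiplies both $s$ and $\bar{x}$ by that factor, so their ratio is unchanged. A small Python sketch (with made-up numbers chosen only for illustration) makes this visible:

```python
def coefficient_of_variation(x):
    """CV = s / xbar, computed here as a plain ratio (not a percentage)."""
    n = len(x)
    xbar = sum(x) / n
    s = (sum((xi - xbar) ** 2 for xi in x) / (n - 1)) ** 0.5
    return s / xbar

heights_m = [1.62, 1.75, 1.80, 1.68, 1.71]       # illustrative values in metres
heights_cm = [100.0 * h for h in heights_m]      # the same data in centimetres
print(coefficient_of_variation(heights_m))       # the two results agree
print(coefficient_of_variation(heights_cm))      # (up to floating-point rounding)
```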