
Frequency Distribution Revisited

In section 2, we counted the observations of the melting point falling within each of 7 intervals and plotted the interval frequency vs. the interval midpoint (refer to Figure 3). Information obtained by grouping data into intervals in this way can be used to infer and model how the data is distributed in the entire population. In the context of our example, this means that by studying the statistical properties of the sample melting point data grouped into intervals, we can draw inferences about how the melting point is distributed among the entire population of alloy parts produced.

The first step in this kind of analysis is to define the intervals and find the frequency of observations within each interval. Let's say that we choose $m$ intervals of equal length $\Delta x$. Let's denote the midpoints of these intervals by $P_j$, for $j = 1, 2, \ldots, m$. An observation $x_k$ belongs to interval $j$ if

$\displaystyle P_j - \frac{\Delta x}{2} \leq x_k < P_j + \frac{\Delta x}{2}.$

(10)

Using the above criterion, we can find the interval frequency $f_j$ for $j = 1, 2, \ldots, m$. The frequency distribution plot in Figure 3 shows $f_j$ vs. $P_j$ with $m = 7$. Note that the sample size $n$ has to be equal to the sum of the interval frequencies, i.e., $n = \sum_{j=1}^{m} f_j$.
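The counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `interval_frequencies`, its parameters, and the sample values are made up for this example and are not the melting point data or the interval choices of Figure 3.

```python
import math

def interval_frequencies(observations, first_midpoint, width, m):
    """Count observations per interval using criterion (10).

    Interval j (1-based) has midpoint P_j = first_midpoint + (j - 1) * width
    and covers the half-open range [P_j - width/2, P_j + width/2).
    """
    lower_edge = first_midpoint - width / 2.0
    midpoints = [first_midpoint + j * width for j in range(m)]
    freqs = [0] * m
    for x in observations:
        # 0-based index of the interval whose half-open range contains x
        j = math.floor((x - lower_edge) / width)
        if 0 <= j < m:  # observations outside all m intervals are skipped
            freqs[j] += 1
    return midpoints, freqs

# Hypothetical values for illustration only (NOT the data from the text).
sample = [1.0, 1.4, 1.5, 2.2, 2.5, 3.49, 0.5]
midpoints, freqs = interval_frequencies(sample, first_midpoint=1.0, width=1.0, m=3)

# The frequencies add up to n only if the intervals cover every observation.
assert sum(freqs) == len(sample)
```

Note that an observation equal to an upper boundary, such as 1.5 here, is counted in the next interval, as the strict inequality on the right of (10) requires.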

Once we have collected the interval frequencies, we can compute the mean and the variance from the grouped data. We will use the subscript g to denote the statistical measures obtained from grouped data. In particular,

$\displaystyle \mu_g = \frac{\sum_{j=1}^{m} f_j P_j}{\sum_{j=1}^{m} f_j} = \frac{\sum_{j=1}^{m} f_j P_j}{n}$

$\displaystyle (n - 1)\,\sigma_g^2 = \sum_{j=1}^{m} f_j (P_j - \mu_g)^2.$

(11)

For instance, $\mu_g$ for the melting point data grouped as in Figure 3 is 320.3.
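As a minimal sketch, the grouped mean and variance of equation (11) could be computed as below. The function name `grouped_mean_variance` and the midpoints and frequencies are hypothetical values chosen for illustration; they are not the Figure 3 data and will not reproduce the 320.3 quoted above.

```python
def grouped_mean_variance(midpoints, freqs):
    """Return (mean, variance) computed from grouped data.

    Every observation in interval j is represented by its midpoint P_j,
    weighted by the interval frequency f_j.
    """
    n = sum(freqs)  # sample size equals the sum of the interval frequencies
    mean_g = sum(f * p for f, p in zip(freqs, midpoints)) / n
    # Sum of squared deviations of the midpoints, divided by (n - 1)
    var_g = sum(f * (p - mean_g) ** 2 for f, p in zip(freqs, midpoints)) / (n - 1)
    return mean_g, var_g

# Hypothetical grouped data for illustration only.
midpoints = [1.0, 2.0, 3.0]
freqs = [3, 2, 2]
mean_g, var_g = grouped_mean_variance(midpoints, freqs)
std_g = var_g ** 0.5  # grouped standard deviation
```

Because each observation is replaced by its interval midpoint, these grouped values generally differ slightly from the mean and variance computed on the raw observations.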

The cumulative frequency $F_j$ is defined as the sum of the frequencies of intervals 1 through $j$, i.e., $F_j = \sum_{i=1}^{j} f_i$. You can compute $F_j$ recursively as $F_j = F_{j-1} + f_j$ for $j = 2, 3, \ldots, m$, with $F_1 = f_1$. Note that $F_m = n$.
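The recursion for the cumulative frequency translates directly into a short loop. The sketch below uses a hypothetical function name and made-up frequencies, and cross-checks the result against Python's standard `itertools.accumulate`.

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    """Return [F_1, ..., F_m] using F_1 = f_1 and F_j = F_{j-1} + f_j."""
    cumulative = []
    for j, f in enumerate(freqs):
        cumulative.append(f if j == 0 else cumulative[-1] + f)
    return cumulative

# Hypothetical interval frequencies for illustration only.
freqs = [3, 2, 2]
F = cumulative_frequencies(freqs)

assert F == list(accumulate(freqs))  # same running sum from the standard library
assert F[-1] == sum(freqs)           # F_m equals the sample size n
```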


Michael Zeltkevic
1998-04-15