Next: Modeling the Melting Point Up: 10.001: Data Visualization and Previous: Variance, Standard Deviation and

Frequency Distribution Revisited

In section 2, we counted the observations of the melting point that fall within each of 7 intervals and plotted the interval frequency vs. the interval midpoint (refer to Figure 3). Information obtained by grouping data into intervals in this way can be used to infer and model how the data are distributed in the entire population. In the context of our example, this means that by studying the statistical properties of the sample's melting point data grouped into intervals, we can draw inferences about how the melting point is distributed among the entire population of alloy parts produced.

The first step in this kind of analysis is to define the intervals and find the frequency of observation within each interval. Let's say that we choose m intervals of equal length $\Delta x$. Let's denote the midpoints of these intervals by $P_j$, for j = 1, 2, ..., m. An observation $x_k$ belongs to interval j if

$P_j - \Delta x/2 \leq x_k < P_j + \Delta x/2$. (10)

Using the above criterion, we can find the interval frequency $f_j$ for j = 1, 2, ..., m. The frequency distribution plot in Figure 3 shows $f_j$ vs. $P_j$ with m = 7. Note that the sample size n has to be equal to the sum of the interval frequencies, i.e., $n = \sum_{j=1}^{m} f_j$.
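As a rough illustration of criterion (10), the counting step might be sketched in Python as follows. The function name `interval_frequencies` and the sample values are assumptions made up for this sketch; they are not the melting point data from the course.

```python
def interval_frequencies(data, midpoints, dx):
    """Count observations per interval using criterion (10):
    P_j - dx/2 <= x_k < P_j + dx/2."""
    freqs = [0] * len(midpoints)
    for x in data:
        for j, p in enumerate(midpoints):
            if p - dx / 2 <= x < p + dx / 2:
                freqs[j] += 1
                break  # the intervals do not overlap, so stop at the first match
    return freqs


# Hypothetical values for illustration only (not the melting point sample).
data = [1.0, 1.5, 2.0, 2.4, 3.1, 3.9]
midpoints = [1.5, 2.5, 3.5]  # m = 3 intervals
dx = 1.0                     # each interval has length 1

f = interval_frequencies(data, midpoints, dx)
print(f)                     # [2, 2, 2]
print(sum(f) == len(data))   # True: n equals the sum of the frequencies
```

Note that the observation 2.0 is counted in the second interval, not the first, because the upper bound in (10) is a strict inequality. An observation lying outside all the intervals would not be counted at all, in which case the frequencies would no longer sum to n.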

Once we have collected the interval frequencies, we can compute the mean and the variance from the grouped data. We will use the subscript g to denote the statistical measures obtained from grouped data. In particular,

$\mu_g = \frac{\sum_{j=1}^{m} f_j P_j}{\sum_{j=1}^{m} f_j} = \frac{\sum_{j=1}^{m} f_j P_j}{n}$
$(n - 1)\sigma_g^2 = \sum_{j=1}^{m} f_j (P_j - \mu_g)^2$. (11)

For instance, $\mu_g$ for the melting point data grouped as in Figure 3 is 320.3.
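The grouped mean and variance in (11) could be computed along the following lines. This is only a sketch: the function names and the frequencies and midpoints are hypothetical values chosen so the arithmetic is easy to check by hand, not the data behind Figure 3.

```python
def grouped_mean(freqs, midpoints):
    """Mean from grouped data: sum(f_j * P_j) / n, where n = sum(f_j)."""
    n = sum(freqs)
    return sum(f * p for f, p in zip(freqs, midpoints)) / n


def grouped_variance(freqs, midpoints):
    """Variance from grouped data: sum(f_j * (P_j - mu_g)**2) / (n - 1)."""
    n = sum(freqs)
    mu_g = grouped_mean(freqs, midpoints)
    return sum(f * (p - mu_g) ** 2 for f, p in zip(freqs, midpoints)) / (n - 1)


# Hypothetical grouped data for illustration only.
freqs = [2, 2, 2]
midpoints = [1.5, 2.5, 3.5]

mu_g = grouped_mean(freqs, midpoints)       # (3 + 5 + 7) / 6 = 2.5
var_g = grouped_variance(freqs, midpoints)  # (2*1 + 0 + 2*1) / 5 = 0.8
print(mu_g, var_g)
```

Because every observation in an interval is represented by the interval midpoint, these values generally differ somewhat from the mean and variance computed from the ungrouped observations.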

The cumulative frequency $F_j$ is defined as the sum of the frequencies of all intervals up to and including interval j. You can compute $F_j$ recursively as $F_j = F_{j-1} + f_j$ for j = 2, 3, ..., m, with $F_1 = f_1$. Note that $F_m = n$.
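The recursion for the cumulative frequency can be written as a running sum. The sketch below assumes a hypothetical list of interval frequencies and an illustrative function name.

```python
def cumulative_frequencies(freqs):
    """Return [F_1, ..., F_m] with F_1 = f_1 and F_j = F_(j-1) + f_j."""
    cumulative = []
    running = 0
    for f in freqs:
        running += f              # F_j = F_(j-1) + f_j
        cumulative.append(running)
    return cumulative


# Hypothetical interval frequencies for illustration only.
freqs = [2, 5, 3]
F = cumulative_frequencies(freqs)
print(F)                     # [2, 7, 10]
print(F[-1] == sum(freqs))   # True: F_m = n
```

In Python the same result can also be obtained with `list(itertools.accumulate(freqs))` from the standard library.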



 
Michael Zeltkevic
1998-04-15