Next: Predicting Probability from the Up: Frequency Distribution Revisited Previous: Frequency Distribution Revisited

Modeling the Melting Point Data: Gaussian/Normal Distribution

Even a cursory examination of Figure 3 reveals that the sample frequency distribution is approximately symmetric about a mean value of 320. Moreover, the maximum frequency occurs very close to the mean, and the frequency diminishes rapidly as we move to either side of it. These observations suggest a bell-shaped distribution as a model for the data. In mathematical terms, we seek to model the population frequency distribution using a Gaussian (normal) distribution. The Gaussian distribution is one of the most widely used probability distributions, with applications not only in the statistical analysis of data but also in the theory of probability and stochastic processes. The mathematical expression for the normal distribution is given by

$\displaystyle N(x : \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, (12)

where $N(x : \mu, \sigma)$ denotes a normal or Gaussian distribution of variable $x$ with mean $\mu$ and standard deviation $\sigma$. It can be shown that
$\displaystyle \int_{-\infty}^{\infty} N(x : \mu, \sigma)\,dx = 1$,     (13)
$\displaystyle \int_{-\infty}^{\infty} x\,N(x : \mu, \sigma)\,dx = \mu$, and      (14)
$\displaystyle \int_{-\infty}^{\infty} (x-\mu)^2\,N(x : \mu, \sigma)\,dx = \sigma^2$.     (15)

Eq. 13 says that the distribution function is normalized, so that the area above the $x$ axis and under the $N(x : \mu, \sigma)$ curve is unity. Eq. 14 says that the expectation of the statistical variable is equal to the mean. Eq. 15 says that the second moment of the distribution about the mean value is equal to its variance. The points of inflection of $N(x : \mu, \sigma)$ are at $x = \mu \pm \sigma$.
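The properties in Eqs. 13 to 15 and the location of the inflection points can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `normal_pdf`, `integrate` and `second_derivative` are illustrative, the infinite integration range is truncated at ten standard deviations on either side of the mean (an assumption that is harmless because the tails are negligible there), and the values of the mean and standard deviation are the ones this section quotes for the melting point data.

```python
import math


def normal_pdf(x, mu, sigma):
    """Evaluate N(x : mu, sigma) as written in Eq. 12."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def integrate(g, a, b, steps=20000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h


def second_derivative(g, x, h=1e-2):
    """Central-difference estimate of the second derivative of g at x."""
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)


# Mean and standard deviation quoted in this section for the melting point data.
mu, sigma = 320.1, 6.7

# Assumption: truncate the infinite range at mu +/- 10 sigma.
lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma


def pdf(x):
    return normal_pdf(x, mu, sigma)


area = integrate(pdf, lo, hi)                                    # Eq. 13
mean = integrate(lambda x: x * pdf(x), lo, hi)                   # Eq. 14
variance = integrate(lambda x: (x - mu) ** 2 * pdf(x), lo, hi)   # Eq. 15

# The curvature changes sign across x = mu + sigma (an inflection point).
inside = second_derivative(pdf, mu + 0.9 * sigma)
outside = second_derivative(pdf, mu + 1.1 * sigma)

print("area     =", area)
print("mean     =", mean)
print("variance =", variance, "(sigma^2 =", sigma ** 2, ")")
print("curvature just inside / outside mu + sigma:", inside, outside)
```

The printed area, mean and variance should agree with 1, $\mu$ and $\sigma^2$ to within the accuracy of the trapezoidal rule, and the curvature should be negative just inside $\mu + \sigma$ and positive just outside it.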

Now how can we relate the discrete frequency distribution of the melting point data to the continuous normal distribution? We can rephrase this question as: what is the appropriate frequency function that will approach $N(x : \mu, \sigma)$ in the limit of the interval length $\Delta x \rightarrow 0$ and the number of observations $n \rightarrow \infty$? Such a function can be constructed by scaling $f_j$ so that, after scaling, the area under the $f_j$ vs. $P_j$ curve is unity. We can then compare the scaled $f_j$ against $N(x : \mu, \sigma)$.

Now, if we plot a histogram of $f_j$ vs. $P_j$, the contribution to the area from interval $j$ is $f_j \Delta x$. So the total area from $m$ intervals is $\Delta x \sum_{j=1}^{m} f_j = n \Delta x$, since the frequencies sum to the number of observations $n$. Hence, in order to be consistent with the normalization given by Eq. 13, we should compare $f_j/(n \Delta x)$ with $N(x : \mu, \sigma)$ or, equivalently, $f_j$ with $n \Delta x N(x : \mu, \sigma)$. Such a comparison for the melting point data is shown in Figure 5, which uses the second form: the continuous curve is $n \Delta x N(x : \mu, \sigma) = 250 N(x : 320.1, 6.7)$, plotted for $305 \leq x \leq 335$, and the points are $f_j$ for $j = 1, 2, \ldots, 7$. As we can see from Figure 5, the agreement is quite satisfactory, which suggests that the distribution of the melting points in the entire population of alloy parts can be modeled as a Gaussian distribution.
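The scaling argument can be illustrated with a short sketch. The melting point frequencies themselves are not reproduced in this section, so the example below uses a synthetic sample drawn from a normal distribution with the quoted mean and standard deviation; the sample size, the interval edges and the helper names `bin_counts` and `scale_counts` are all assumptions made for illustration, not values taken from the notes.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Evaluate N(x : mu, sigma) as written in Eq. 12."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def bin_counts(samples, left, dx, m):
    """Count the samples falling in m intervals of width dx starting at left."""
    counts = [0] * m
    for s in samples:
        j = int(math.floor((s - left) / dx))
        if 0 <= j < m:  # samples outside the intervals are ignored
            counts[j] += 1
    return counts


def scale_counts(counts, dx):
    """Return f_j / (n * dx), where n is the total of the binned counts."""
    n = sum(counts)
    return [f / (n * dx) for f in counts]


mu, sigma = 320.1, 6.7  # values quoted in the text

# SYNTHETIC data for illustration only, not the melting point measurements.
rng = random.Random(0)
samples = [rng.gauss(mu, sigma) for _ in range(2000)]

# Hypothetical intervals: 10 intervals of width 5 starting at 295.
left, dx, m = 295.0, 5.0, 10
counts = bin_counts(samples, left, dx, m)
scaled = scale_counts(counts, dx)

# After scaling, the histogram area is unity, as Eq. 13 requires.
print("histogram area =", sum(scaled) * dx)

# Compare the scaled frequencies with the normal curve at interval midpoints.
for j, (f, s) in enumerate(zip(counts, scaled)):
    mid = left + (j + 0.5) * dx
    print(f"{mid:6.1f}  f_j={f:4d}  f_j/(n dx)={s:.4f}  "
          f"N={normal_pdf(mid, mu, sigma):.4f}")
```

Multiplying the normal curve by $n \Delta x$ instead of dividing the counts gives the same comparison in the units of the raw frequencies, which is the form used for the curve $250 N(x : 320.1, 6.7)$ in Figure 5.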


Michael Zeltkevic
1998-04-15