Predicting Probability from the Model Distribution


Once we have modeled the data with a distribution, we have a predictive tool for evaluating the probability that an observation falls between any two melting points $x_a$ and $x_b$ with $x_a < x_b$. This probability, denoted by $\Pi(x_a, x_b)$, is the area under the distribution curve between $x = x_a$ and $x = x_b$; i.e.,

$$\Pi(x_a, x_b) = \int_{x_a}^{x_b} N(x : \mu, \sigma)\,dx. \tag{16}$$

Note that Eq. 13 is equivalent to the statement that the probability of making an observation between $-\infty$ and $\infty$ is unity.

How do we compute the integral in Eq. 16? First of all, we would like to define a new variable

$$z \equiv \frac{x - \mu}{\sigma}, \tag{17}$$

which is independent of the units of measurement. Moreover, $z = 0$ for $x = \mu$, and $z$ measures how far we are from the mean in units of the standard deviation. We now define the standard normal distribution, $N_s(y : 0, 1)$, as

$$N_s(y : 0, 1) = \frac{1}{\sqrt{2\pi}} \exp(-y^2/2). \tag{18}$$

Note that the standard normal distribution is a normal distribution with zero mean and unit variance. We now define the standard cumulative distribution function $\Phi(z)$, based on $N_s(y : 0, 1)$, as

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} \exp(-y^2/2)\,dy. \tag{19}$$

However, from Eq. 16, we have

$$\begin{aligned}
\Pi(x_a, x_b) &= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{x_a}^{x_b} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{z_a}^{z_b} \exp\left(-\frac{z^2}{2}\right) dz \qquad \left(\text{Note: } z = \frac{x-\mu}{\sigma},\ dx = \sigma\,dz\right) \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_b} \exp\left(-\frac{z^2}{2}\right) dz - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_a} \exp\left(-\frac{z^2}{2}\right) dz \\
&= \Phi(z_b) - \Phi(z_a),
\end{aligned} \tag{20}$$

where $z_a = (x_a - \mu)/\sigma$ and $z_b = (x_b - \mu)/\sigma$.
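As a numerical sanity check of Eq. 20, the following Python sketch integrates the normal density directly with Simpson's rule and compares the result with $\Phi(z_b) - \Phi(z_a)$. The function names, the values of $\mu$, $\sigma$, $x_a$, $x_b$, and the use of Python's `math.erf` to evaluate $\Phi$ are all assumptions made for illustration; they are not part of the original notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density N(x : mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def phi(z):
    """Standard cumulative distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative values only (assumed for this sketch).
mu, sigma = 320.0, 7.0
x_a, x_b = 315.0, 330.0

# Left-hand side of Eq. 20: integrate the density over [x_a, x_b].
direct = simpson(lambda x: normal_pdf(x, mu, sigma), x_a, x_b)

# Right-hand side of Eq. 20: difference of Phi at the standardized limits.
via_phi = phi((x_b - mu) / sigma) - phi((x_a - mu) / sigma)

print(direct, via_phi)
```

The two printed numbers should agree to many decimal places, which is what the change of variable in Eq. 20 predicts.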

The values of $\Phi(z)$ can be found in standard mathematical tables. In Maple, load the stats package with with(stats); and enter ?statevalf; to see the syntax of the statevalf function, which evaluates distribution values. In particular, statevalf[cdf, normald](value) gives the numerical value of $\Phi(z)$ at $z$ = value. It is important to note that $\Phi(-\infty) = 0$, $\Phi(0) = 1/2$, and $\Phi(\infty) = 1$. Moreover,

$$\Phi(-z) = 1 - \Phi(z). \tag{21}$$

See Figure 6 in the Maple worksheet figures.ms for a plot of $\Phi(z)$.
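For readers without Maple, a minimal Python sketch of the same evaluation is given here. It relies on the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$ and on `math.erf` from the Python standard library; the function name `phi` is an assumption of this sketch, not something the notes define.

```python
import math

def phi(z):
    """Standard cumulative distribution function Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Special value stated in the text: Phi(0) = 1/2.
print(phi(0.0))

# Symmetry relation of Eq. 21: Phi(-z) = 1 - Phi(z).
for z in (0.5, 1.0, 2.5):
    print(z, phi(-z), 1.0 - phi(z))

# Far in the tails, Phi approaches 0 on the left and 1 on the right.
print(phi(-10.0), phi(10.0))
```

Each pair printed in the loop should match to within floating-point rounding, illustrating why only $z \geq 0$ needs to be tabulated.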

The following series formula can also be used to compute $\Phi(z)$; convergence becomes slower as the value of $z$ becomes large.

$$\Phi(z) = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \sum_{k=0}^{\infty} a_k z^{2k+1}, \tag{22}$$

where

$$a_0 = \frac{1}{\sqrt{2}}, \qquad a_k = \frac{1-2k}{2k(1+2k)}\, a_{k-1}, \quad k \geq 1. \tag{23}$$
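The series of Eq. 22 with the recurrence of Eq. 23 can be sketched in Python as follows. The stopping rule used here (stop once the last term added falls below a tolerance) is an assumption of this sketch; the notes do not state which criterion was used, so the truncation index it reports need not coincide with any tabulated one. The names `phi_series`, `tol`, and `max_terms` are likewise illustrative.

```python
import math

def phi_series(z, tol=1e-9, max_terms=200):
    """Approximate Phi(z) by the power series of Eq. 22.

    Returns (value, k), where k is the index of the last term added.
    """
    sqrt_pi = math.sqrt(math.pi)
    a_k = 1.0 / math.sqrt(2.0)   # a_0 from Eq. 23
    power = z                    # z^(2k+1) for k = 0
    total = a_k * power
    k = 0
    # Keep adding terms until the most recent one is negligible.
    while abs(a_k * power) / sqrt_pi > tol and k < max_terms:
        k += 1
        a_k *= (1 - 2 * k) / (2 * k * (1 + 2 * k))   # recurrence of Eq. 23
        power *= z * z
        total += a_k * power
    return 0.5 + total / sqrt_pi, k

for z in (0.0, 1.0, 2.0, 4.0):
    value, k = phi_series(z)
    print(f"{z:3.1f}  {value:.6f}  {k}")
```

The number of terms needed grows with $z$, because the terms of the alternating series first increase in magnitude before they start to decay.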

The following table lists $z$, $\Phi(z)$ accurate to 6 decimal places, and the value of $k$ at which the series expansion of Eq. 22 was truncated to obtain that accuracy. Only $z \geq 0$ need be computed (see Eq. 21).

z $\Phi(z)$ k
0.0 0.500000 0
0.2 0.579260 4
0.4 0.655422 5
0.6 0.725747 6
0.8 0.788145 7
1.0 0.841345 8
1.2 0.884930 9
1.4 0.919243 10
1.6 0.945201 12
1.8 0.964070 13
2.0 0.977250 14
2.2 0.986097 16
2.4 0.991802 17
2.6 0.995339 19
2.8 0.997445 20
3.0 0.998650 22
3.2 0.999313 24
3.4 0.999663 26
3.6 0.999841 28
3.8 0.999928 30
4.0 0.999968 32
4.2 0.999987 35
4.4 0.999995 37
4.6 0.999998 40
4.8 0.999999 42
5.0 1.000000 45

For very large values of $z$ ($z \gg 1$), we may use the asymptotic relation

$$\Phi(z) \approx 1 - \frac{\exp(-z^2/2)}{z\sqrt{2\pi}}, \tag{24}$$

instead of the series expansion given in Eq. 22. For $z = 4$, the asymptotic result gives $\Phi(z) \approx 0.999967$, which agrees with the tabulated value 0.999968 to 5 decimal places.
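The quality of the asymptotic relation in Eq. 24 can be inspected with a short Python sketch that compares it with a reference value of $\Phi(z)$ computed from `math.erf`. The function names and the particular values of $z$ are assumptions chosen for illustration.

```python
import math

def phi(z):
    """Reference value of Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_asymptotic(z):
    """Asymptotic relation of Eq. 24, intended for z >> 1."""
    return 1.0 - math.exp(-z * z / 2.0) / (z * math.sqrt(2.0 * math.pi))

for z in (2.0, 3.0, 4.0, 5.0):
    approx = phi_asymptotic(z)
    exact = phi(z)
    print(f"{z:3.1f}  {approx:.6f}  {exact:.6f}  {abs(approx - exact):.2e}")
```

The absolute error shrinks quickly as $z$ grows, which is why the relation is only recommended for large $z$.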

How do we make use of the information in the table above? For instance, let's ask: what is the probability $\Pi(x_a, x_b)$ that an observation lies within one standard deviation of the mean? Here, $x_a = \mu - \sigma$ and $x_b = \mu + \sigma$, so that $z_a = -1$ and $z_b = 1$. From Eq. 20 and Eq. 21, the probability is $\Pi(\mu - \sigma, \mu + \sigma) = \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 \approx 0.6827$. This implies that we expect 68.27% of the entire population of the alloy parts to have melting points between 313 and 327 if we use $\mu = 320$ and $\sigma = 7$.

Confidence levels are simply probabilities converted into percentages. For instance, the confidence level of the interval $[\mu - \sigma, \mu + \sigma]$ is 68.27%, from the probability calculated above. It can be shown that the probability that an observation $x$ falls within the range $[\mu - 1.96\sigma, \mu + 1.96\sigma]$ is 0.95 for the normal distribution, so the confidence level for this interval is 95%.
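The confidence-level calculation above can be reproduced with a few lines of Python. This is a sketch under the assumption that $\Phi$ is evaluated with `math.erf` rather than read from the table; the helper names `confidence_level` and `interval` are hypothetical.

```python
import math

def phi(z):
    """Standard cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_level(c):
    """Probability that an observation lies in [mu - c*sigma, mu + c*sigma].

    Uses Eq. 20 with z_a = -c, z_b = c, and Eq. 21: 2*Phi(c) - 1.
    """
    return 2.0 * phi(c) - 1.0

def interval(mu, sigma, c):
    """End points of the interval [mu - c*sigma, mu + c*sigma]."""
    return mu - c * sigma, mu + c * sigma

mu, sigma = 320.0, 7.0   # values used in the text

print(interval(mu, sigma, 1.0), confidence_level(1.0))    # about 0.6827
print(interval(mu, sigma, 1.96), confidence_level(1.96))  # about 0.95
```

Because the result depends only on the multiple of $\sigma$, the same function gives the confidence level of the corresponding interval for any $\mu$ and $\sigma$.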

The discrete analogue of $\Phi(z)$ is the cumulative frequency data scaled by $n$, plotted against $z_j = (P_j - \mu)/\sigma$. We expect $F_j/n$ to approach $\Phi(z)$ in the limit $n \rightarrow \infty$ and $\Delta x \rightarrow 0$.
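The limiting behavior described above can be illustrated with a small Python experiment. Several things here are assumptions of this sketch: the data are synthetic draws from a normal distribution with $\mu = 320$ and $\sigma = 7$ (not the measured melting points), each sorted observation plays the role of $P_j$, and $F_j$ is taken to be the number of observations less than or equal to it.

```python
import math
import random

def phi(z):
    """Standard cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_gap(n, mu=320.0, sigma=7.0, seed=0):
    """Largest |F_j/n - Phi(z_j)| over a synthetic sample of size n."""
    rng = random.Random(seed)            # fixed seed for repeatability
    sample = sorted(rng.gauss(mu, sigma) for _ in range(n))
    gap = 0.0
    for j, x in enumerate(sample, start=1):
        z_j = (x - mu) / sigma           # standardized position, as in Eq. 17
        gap = max(gap, abs(j / n - phi(z_j)))
    return gap

for n in (100, 1000, 10000):
    print(n, max_cdf_gap(n))
```

For a sample that really follows the model distribution, the printed gap should generally shrink as $n$ grows, although it fluctuates from one random sample to another.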


Michael Zeltkevic
1998-04-15