3.1 FUNCTIONS OF RANDOM VARIABLES:

DERIVED DISTRIBUTIONS

Often when examining a system we know by hypothesis or measurement the probability law of one or more random variables, and wish to obtain the probability laws of other random variables that can be expressed in terms of the original random variables. The random variables in the second set are functions of the random variables in the first set. We call this a problem of derived distributions, since we must derive the joint probability distribution(s) for the random variables in the second set. Derived distribution problems can arise with discrete, continuous, or mixed random variables.

There are many special techniques for deriving distributions, but we will focus on a "never-fail" method. Virtually all of the work associated with this method occurs in the joint sample space of the original random variables; the never-fail method is simply a systematic procedure for carrying out Step 4 ("working in the sample space") in a probabilistic modeling analysis.

Suppose that the original set of random variables is given by {X1, X2, ..., XN} with joint cdf FX1,X2,...,XN(·). Suppose that there are M random variables Y1, Y2, ..., YM, each of which can be expressed as a function of X1, X2, ..., XN, namely Yi = gi(X1, X2, ..., XN), i = 1, 2, ..., M. Then the never-fail method, called the cumulative distribution method, allows computation of the joint cumulative distribution function for the Yi's,

FY1,Y2,...,YM(y1, y2, ..., yM) = P{Y1 ≤ y1, Y2 ≤ y2, ..., YM ≤ yM},

as follows:

a. Identify the set of points in the original (X1, X2, ..., XN) sample space that corresponds to the joint event

{Y1 ≤ y1, Y2 ≤ y2, ..., YM ≤ yM} = {g1(X1, ..., XN) ≤ y1, g2(X1, ..., XN) ≤ y2, ..., gM(X1, ..., XN) ≤ yM}.

b. For each set of values of the yi's, [y1, y2, ..., yM], determine by summation or integration the probability of this joint event in the (X1, X2, ..., XN) sample space, thereby obtaining FY1,Y2,...,YM(y1, y2, ..., yM), −∞ < y1, y2, ..., yM < +∞.

If the random variables are continuous, we can find the joint pdf for {Y1, Y2, ..., YM} by taking partial derivatives of FY1,Y2,...,YM(·) with respect to each of its arguments,

fY1,Y2,...,YM(y1, y2, ..., yM) = ∂^M FY1,Y2,...,YM(y1, y2, ..., yM) / (∂y1 ∂y2 ··· ∂yM).
If they are discrete, the pmf is found simply by using the cdf and subtracting appropriate successive values.

While the method described in its full generality may appear intimidating, applying it carefully in a step-by-step manner makes problems much easier to solve. Fortunately, for many problems of interest the number of variables involved is small, often with neither M nor N exceeding 2. Gaining proficiency in this aspect of probabilistic modeling seems to require study of numerous examples, to uncover potential pitfalls that await the unwary analyst. Thus, we will analyze many examples, most of which are of independent interest in the analysis of urban service systems. Continuous random variables appear to give the greatest difficulty to those first learning to use the method, and thus our focus will be on continuous random variables. Examples involving discrete random variables are given in the problems. (See Problem 3.2 for strictly discrete random variables and Problems 3.24 and 3.30 for "mixed" random variables.)
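The two steps of the cumulative distribution method can also be mimicked numerically, which is a convenient way to check a derived cdf. The sketch below (an added illustration, not part of the original text) estimates FY(y) = P{g(X1, ..., XN) ≤ y} by sampling the original sample space and counting the fraction of sample points that fall in the event of step a; the particular g and sampling distribution shown are arbitrary placeholders.

```python
import numpy as np

def estimate_cdf(g, sample_x, y_values, n=200_000, seed=1):
    """Monte Carlo estimate of FY(y) = P{g(X1, ..., XN) <= y}.

    g        -- maps an (n, N) array of sample points to n values of Y
    sample_x -- draws an (n, N) array of points of the original sample space
    """
    rng = np.random.default_rng(seed)
    x = sample_x(rng, n)                 # points of the (X1, ..., XN) sample space
    y = g(x)                             # Y = g(X1, ..., XN) at each point
    # steps a and b combined: fraction of sampled points inside {g(x) <= y}
    return [float(np.mean(y <= yv)) for yv in y_values]

# Placeholder demonstration: Y = X1 + X2 with X1, X2 independent uniform on [0, 1];
# the exact cdf at y = 1 is 1/2.
print(estimate_cdf(g=lambda x: x[:, 0] + x[:, 1],
                   sample_x=lambda rng, n: rng.uniform(0.0, 1.0, (n, 2)),
                   y_values=[1.0]))
```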

Example 1: Response Distance of an Ambulette

This first example will provide a framework for demonstrating several characteristics of "derived distribution" problems. Suppose that a public safety vehicle travels back and forth along a straight highway, the traveling perhaps to find motorists in need of assistance. Also, along this highway accidents can occur that create a need for on-scene assistance by the vehicle. The vehicle is dispatched by radio to these accidents. Because of its limited on-board emergency medical equipment, we call the vehicle an ambulette. We are interested in determining the probability law of the travel distance for the ambulette to reach a random medical emergency.

Solution

Following the general discussion above, a derived distribution problem is like any other probabilistic modeling problem; it requires that we do four things to model the experiment:

STEP 1: Define the random variables of interest.
STEP 2: Identify the joint sample space.
STEP 3: Determine the joint probability distribution over the sample space.
STEP 4: Work within the sample space to determine the answers to any questions about the experiment.
As discussed above, the activity specific to derived distributions (functions of random variables) occurs in Step 4.

1. Random variables. Suppose that the highway is of unit length. Then the two key random variables would be

X1 = location of the medical emergency, 0 ≤ X1 ≤ 1

X2 = location of the ambulette at the moment of dispatch, 0 ≤ X2 ≤ 1

Later, when we are interested in travel distance, assuming U-turns are possible and permissible everywhere, the travel distance D can be expressed as a function of X1 and X2, D = |X1 - X2|

2. Joint sample space. The joint sample space is the unit square in the positive quadrant (0 ≤ X1 ≤ 1, 0 ≤ X2 ≤ 1).

3. Joint probability distribution. We will assume that the locations of the ambulette and the medical emergency are uniformly, independently distributed over the highway. In practice, the three assumptions entailed in such a statement would have to be argued for plausibility and measurements might have to be taken. Naturally, the analysis could also proceed with an alternative set of assumptions. Since we are now dealing with strictly continuous random variables, we will work with the joint probability density function, which is

fX1,X2(x1, x2) = 1,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1,
fX1,X2(x1, x2) = 0,  otherwise.

4. Work in the sample space. This is the point at which the never-fail method for deriving distributions comes into play. We want the probability law of

D = |X1 − X2|, the travel distance of the ambulette.
Here, in our general notation, N = 2 and M = 1 and we are confronted with what is sometimes called a 2-to-1 transformation.

To apply the never-fail method for finding the cdf of D, FD(y), we first locate the region in the (X1, X2) sample space corresponding to the event {D ≤ y}. Formally, the steps are written as follows:

FD(y) = P{D ≤ y} = P{|X1 − X2| ≤ y}.
To remove the absolute value operator, we consider two cases separately: case 1: X1 ≥ X2; case 2: X1 < X2. For the first case, D = X1 − X2, and experimental values x1 and x2 of X1 and X2, respectively, must lie between the lines x2 = x1 and x2 = x1 − y (Figure 3.1). For the second case, D = X2 − X1, and experimental values of X1 and X2 must lie between the lines x2 = x1 and x2 = x1 + y. Consideration of these two cases gives rise to the shaded region in the sample space in Figure 3.1. Once we have determined such a region, we have identified the set of points corresponding to the event of interest, {D ≤ y}, thereby completing step a of the never-fail method. This is often the most difficult part of a derived distribution problem. Note that determination of this region in no way depended on the joint pdf for X1 and X2; thus, the "work" invested to this point could be applied to several alternative models, each with its own joint pdf for X1 and X2.



Step b of the never-fail method requires that we integrate fX1,X2(·) over the set of points in the shaded region to obtain FD(y). Since the joint X1, X2 pdf is uniform over the unit square, we can perform the integration by computing areas in the sample space. (Conceptually, each area is multiplied by "1," the height of the pdf at that point, to yield a probability measured as a volume.) By computing the areas of the two triangles not in the shaded region,

FD(y) = P{D ≤ y} = 1 − (1 − y)²,  0 ≤ y ≤ 1;         (3.1)
we have now completed step b of the never-fail method and we are "done." [What do we know about FD(−2) or FD(+2)?]

Should we desire the pdf of D, we differentiate, obtaining

fD(y) = dFD(y)/dy = 2(1 − y),  0 ≤ y ≤ 1,
fD(y) = 0,  otherwise.
From the pdf (or cdf) we can determine anything that is desired concerning D. For instance, the expected value (or mean value) of D is

E[D] = ∫0^1 y · 2(1 − y) dy = 1/3,

and the variance is σD² = E[D²] − (E[D])² = 1/6 − 1/9 = 1/18.
These results will be of use in our further work.
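As a check on (3.1) and the moments just computed, the following minimal simulation sketch (an added illustration, assuming only the uniform-independent model stated above) compares empirical results with the derived cdf, mean, and variance.

```python
import numpy as np

# Monte Carlo check of Example 1: F_D(y) = 1 - (1 - y)^2, E[D] = 1/3, var(D) = 1/18.
rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.uniform(0.0, 1.0, n)            # location of the medical emergency
x2 = rng.uniform(0.0, 1.0, n)            # location of the ambulette at dispatch
d = np.abs(x1 - x2)                      # travel distance D = |X1 - X2|

print(d.mean(), d.var())                 # ~0.333 and ~0.0556 (= 1/18)
for y in (0.2, 0.5, 0.8):
    print(np.mean(d <= y), 1 - (1 - y) ** 2)   # empirical vs. derived cdf
```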

A system administrator may be interested in knowing the effects on travel distance of prepositioning the ambulette at the center of the interval depicting the highway, thus fixing X2 = 1/2. Then the joint sample space is the straight line indicated in Figure 3.2. If the new travel distance is D' = |X1 − 1/2|, the



region for which {D' ≤ y} is the line segment of length 2y centered at X1 = 1/2. Integrating the (uniform) pdf of X1, we have FD'(y) = P{D' ≤ y} = P{|X1 − 1/2| ≤ y} = 2y (0 ≤ y ≤ 1/2). Thus, the pdf of D' is

fD'(y) = 2,  0 ≤ y ≤ 1/2,
fD'(y) = 0,  otherwise.
How could this result also be obtained by inspection? The mean and variance are

E[D'] = ∫0^{1/2} y · 2 dy = 1/4,   σD'² = E[D'²] − (E[D'])² = 1/12 − 1/16 = 1/48.
Thus, a change in deployment policy resulting in an ambulette prepositioned at the center of its service area rather than randomly patrolling its service area reduces mean travel distance by 25 percent, the variance of the travel distance by 62.5 percent, and, perhaps important in "worst-case" analyses, the maximum possible travel distance by 50 percent.
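A short simulation (again an added sketch under the same modeling assumptions) reproduces the three comparisons just quoted.

```python
import numpy as np

# Compare the randomly patrolling ambulette (D = |X1 - X2|) with the
# prepositioned one (D' = |X1 - 1/2|): mean, variance, and maximum distance.
rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.uniform(0.0, 1.0, n)
d_patrol = np.abs(x1 - rng.uniform(0.0, 1.0, n))
d_center = np.abs(x1 - 0.5)

for name, d in (("patrolling", d_patrol), ("prepositioned", d_center)):
    print(name, d.mean(), d.var(), d.max())
# means ~1/3 vs ~1/4 (25% lower), variances ~1/18 vs ~1/48 (62.5% lower),
# maxima ~1 vs ~1/2 (50% lower).
```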

Question: How would one determine (or estimate) the joint distribution function for X1 and X2 in practice?

Further work: Problems 3.2-3.4.

Extension: Scaling

We often select the scale of a probabilistic modeling problem for analytical convenience. For instance, if the length of highway analyzed in Example 1 had been 13.72 kilometers, the factor of 13.72 would have occurred in numerous places (making the analysis obviously less attractive). Thus, after performing the analysis for a conveniently scaled problem, we often rescale it to suit the real-world situation at hand. Scaling can also occur when switching systems of measurement, say from British units to metric units.

Suppose that we have derived the probability law for W, given one scale, and we wish to find the moments and the probability law of

V = aW + b,

where a and b are known constants. Working directly with expectations,

E[V] = aE[W] + b,   σV² = a²σW².
In words, multiplying a random variable by a constant results in its variance being multiplied by the square of that constant.

We can also derive the probability law of V (assumed to be continuous) using the never-fail method. The analysis proceeds as follows:

FV(v) = P{V ≤ v} = P{aW + b ≤ v}.
To proceed further, we must distinguish two cases: case 1: a > 0; case 2: a < 0.

Case 1 (a > 0):  FV(v) = P{W ≤ (v − b)/a} = FW((v − b)/a),   fV(v) = (1/a) fW((v − b)/a).
Case 2 (a < 0):  FV(v) = P{W ≥ (v − b)/a} = 1 − FW((v − b)/a),   fV(v) = −(1/a) fW((v − b)/a).

In either case, fV(v) = (1/|a|) fW((v − b)/a).
These equations constitute the answer to our problem. For instance, in the ambulette example, if a = 13.72 kilometers and b = 71.09 kilometers, we would be modeling a 13.72-kilometer stretch of highway starting 71.09 kilometers from the origin. Returning to the patrolling ambulette example, the cdf for the rescaled emergency location X1 (now measured in kilometers from the origin) becomes

FX1(x) = 0,  x < 71.09,
FX1(x) = (x − 71.09)/13.72,  71.09 ≤ x ≤ 84.81,
FX1(x) = 1,  x > 84.81.
You might find it helpful to sketch several different applications of this result.
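One such application can be checked numerically. The sketch below (added; it assumes W is the uniform emergency location X1 of Example 1 and reuses the constants a = 13.72 and b = 71.09 quoted above) verifies the moment relations and the rescaled cdf.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 13.72, 71.09                      # scale and shift from the ambulette example
w = rng.uniform(0.0, 1.0, 1_000_000)     # W ~ uniform on [0, 1] (assumed for illustration)
v = a * w + b                            # V = aW + b

print(v.mean(), a * w.mean() + b)        # E[V] = a E[W] + b
print(v.var(), a**2 * w.var())           # var(V) = a^2 var(W)
x = 78.0                                 # one test point on the rescaled highway
print(np.mean(v <= x), (x - b) / a)      # F_V(x) = F_W((x - b)/a) = (x - 71.09)/13.72
```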

Exercise 3.1: Rectangular Response Area Suppose that we have an X0-by-Y0 rectangular response area for the ambulette (Figure 3.3), with sides of the rectangle parallel to the coordinate axes. The locations of the medical emergency (X1, Y1) and of the ambulette (X2, Y2) are independently, uniformly distributed over the response area. Travel distance occurs according to the "right-angle" metric,

D=|X1 -X2| + |Y1 - Y2| (3.11)







Example 1: Revisited (Min and Max)

Suppose we are interested in the coordinates that determine a dispatch incident, X1 and X2, without regard to which location represents the ambulette and which the medical emergency. Instead, we may be concerned with the rightmost coordinate R and the leftmost coordinate L. For instance, all points between L and R may be exposed to siren and lights as the ambulette passes at high speed. Thus, the joint probability law of R and L would be of interest. We will ignore scaling and assume that all locations, as before, occur in the interval [0, 1].

Solution

Since we have already performed Steps 1-3 in describing the experiment, we are ready to go to Step 4 (work in the sample space) and employ the never-fail method. The random variables that are functions of the original random variables are

R = Max(X1, X2) = rightmost coordinate,
L = Min(X1, X2) = leftmost coordinate.
We wish to derive the joint probability law for R and L. This is sometimes called an N = 2-to-M = 2 transformation. To execute step a of the never-fail method, we proceed formally as follows:

FR,L(r, l) = P{R ≤ r, L ≤ l} = P{Max(X1, X2) ≤ r, Min(X1, X2) ≤ l}.
To proceed from here, we consider separately each of the two events in braces and "merge" these later by intersection. That is, we can write

{R ≤ r, L ≤ l} = {Max(X1, X2) ≤ r} ∩ {Min(X1, X2) ≤ l}.
To determine the set of points in the (X1, X2) sample space corresponding to {Max(X1, X2) ≤ r}, we again consider two cases: case 1: X1 ≥ X2; case 2: X1 < X2. For case 1, Max(X1, X2) = X1 and the event {Max(X1, X2) ≤ r} corresponds to the set of points to the left of the line x1 = r (Figure 3.4). Similarly, for case 2, Max(X1, X2) = X2 and the event {Max(X1, X2) ≤ r} corresponds to the set of points below the line x2 = r. Combining these two cases, the event {Max(X1, X2) ≤ r} corresponds to the square of area r² shown in Figure 3.4.

Proceeding in a similar manner for {Min(X1, X2) ≤ l}, we again consider case 1: X1 ≥ X2, and case 2: X1 < X2. For case 1, Min(X1, X2) = X2 and the event {Min(X1, X2) ≤ l} corresponds to the set of points below the line x2 = l (Figure 3.4). For case 2, Min(X1, X2) = X1 and the event {Min(X1, X2) ≤ l} corresponds to the set of points to the left of the line x1 = l. Combining these two cases, the event {Min(X1, X2) ≤ l} corresponds to the L-shaped region shown in Figure 3.4.



The intersection of the two events found above yields the event of interest, {R ≤ r, L ≤ l}, shown as the crosshatched region in Figure 3.4. We have now completed step a of the never-fail method.

To carry out step b, all we need to do is integrate the joint pdf fX1,X2(·) over the region (event) found in step a. Again, because of the special nature of this sample space and its probability assignment, we can do this by working directly with areas in the sample space. By computing the relevant areas, we obtain

FR,L(r, l) = r² − (r − l)²,  0 ≤ l ≤ r ≤ 1,
FR,L(r, l) = r²,  0 ≤ r ≤ l ≤ 1,

and, differentiating with respect to both arguments, the joint pdf

fR,L(r, l) = 2,  0 ≤ l ≤ r ≤ 1,  and 0 otherwise.
The joint R, L sample space is shown in Figure 3.5. The joint pdf of R, L over this triangular region is uniform. Does this make sense intuitively?
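As a numerical cross-check (an added sketch, same uniform-independent assumptions), the joint cdf derived above can be compared with simulation at a few test points.

```python
import numpy as np

# Check F_{R,L}(r, l) = r^2 - (r - l)^2 for 0 <= l <= r <= 1.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, (1_000_000, 2))
r_max = x.max(axis=1)                          # rightmost coordinate R
l_min = x.min(axis=1)                          # leftmost coordinate L

for r, l in ((0.8, 0.3), (0.6, 0.5), (0.9, 0.2)):
    print(np.mean((r_max <= r) & (l_min <= l)), r**2 - (r - l)**2)
```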

Example 2: Travel Time

Suppose that it is not travel distance we are interested in but rather travel time. If we define the random variables

D = travel distance of the ambulette,
S = travel speed of the ambulette,
T = travel time of the ambulette,

then time is related to distance and speed by the familiar equation

T = D/S.
Solution

In general, to obtain the pdf of T we would require the joint pdf of D and S, say fD,S(x, s). The never-fail method would proceed as follows:

FT(t) = P{T ≤ t} = P{D/S ≤ t} = P{D ≤ tS}     (since S > 0).
The event corresponding to {D ≤ tS} in the (D, S) sample space is shown in Figure 3.6. In principle, all we need do is integrate the joint D, S pdf over this region for each value of t to obtain the cdf for T, FT(t).1



As a simple example, suppose that the speed of response could assume only two values, S = 1 or S = 2, with equal probability. Assume that distance is distributed as the ambulette response distance of Example 1, independently of the speed of response. Then

fD,S(x, s) = 2(1 − x)[(1/2)δ(s − 1) + (1/2)δ(s − 2)],  0 ≤ x ≤ 1,
fD,S(x, s) = 0,  otherwise,

where δ(·) denotes the unit impulse.
This formidable-looking expression represents the pdf of two random variables, one continuous and the other discrete. As long as we keep in mind that pdf's have no probabilistic meaning until we integrate them and that the integration properties of impulses are well defined, we will be in fine shape. (Recall Problem 2.2.)

The joint (D, S) sample space is shown in Figure 3.7. We now proceed with the never-fail method,

FT(t) = P{D ≤ tS} = ∫∫_{x ≤ ts} fD,S(x, s) dx ds.
Examining Figure 3.7, we see that the straight line x = ts intersects both "lines" of the sample space for 0 ≤ t ≤ 1/2. So, for those values of t, we have

FT(t) = (1/2)FD(t) + (1/2)FD(2t),  0 ≤ t ≤ 1/2.
The "1/2"'s arise from integrating left to right across the impulses; the FD(·) terms arise from integrating from x = 0 to x = ts at s = 1 and s = 2. Since from Example 1, (3.1), we know that





Once t exceeds 1/2 in value, the sweep of the line x = ts no longer picks up additional probability from the "line impulse" at s = 2. So, for 1/2 < t ≤ 1, FT(t) = (1/2)[1 − (1 − t)²] + 1/2. Thus, combining results and differentiating, the answer to our problem is

fT(t) = (1 − t) + 2(1 − 2t),  0 ≤ t ≤ 1/2,
fT(t) = 1 − t,  1/2 < t ≤ 1,
fT(t) = 0,  otherwise.
This pdf is sketched in Figure 3.8. Note the discontinuity in slope at t=1/2. This is not unusual in practice; in fact, one often comes across problems in which the derived pdf is discontinuous (in value) at one or more points. Points of discontinuity, either in value or slope, usually correspond to "switchover points" in the original sample space in which the summation or integral for accumulating probability for the cdf switches over to some new functional form. Switchovers often occur when the region of accumulated probability changes in geometric form, such as occurred at t=1/2 in the example.
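A simulation sketch (added; it simply re-samples the model stated above, with S equally likely to be 1 or 2 and D drawn as in Example 1) can be used to confirm the piecewise cdf and the location of the switchover point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
d = np.abs(rng.uniform(0, 1, n) - rng.uniform(0, 1, n))   # D = |X1 - X2|
s = rng.choice([1.0, 2.0], size=n)                        # speed: 1 or 2, prob 1/2 each
t = d / s                                                 # travel time T = D/S

def derived_cdf(tt):
    """F_T as derived above, valid for 0 <= tt <= 1."""
    if tt <= 0.5:
        return 0.5 * (1 - (1 - tt) ** 2) + 0.5 * (1 - (1 - 2 * tt) ** 2)
    return 0.5 * (1 - (1 - tt) ** 2) + 0.5

for tv in (0.2, 0.5, 0.8):
    print(np.mean(t <= tv), derived_cdf(tv))
print(t.mean())                                           # ~0.25 = E[T]
```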

While we have completed our derived distribution work on this problem, there is one additional issue that we wish to address and that deals with expected values of random variables. Here the expected value of T is

E[T] = ∫0^{1/2} t[(1 − t) + 2(1 − 2t)] dt + ∫_{1/2}^1 t(1 − t) dt = 1/4.
We may wish to calculate the expected value simply by working in the (D, S) sample space. Because of independence, if T = h1(D)h2(S), then

E[T] = E[h1(D)]E[h2(S)] = E[D]E[1/S] = (1/3)[(1/2)(1) + (1/2)(1/2)] = (1/3)(3/4) = 1/4,
as calculated previously. This is an illustration of the following general principle:

If one only desires expected values and not the complete probability law of a function of random variables, it is usually computationally easier to work directly in the original sample space to compute the expected values.

There is a second general principle we can illustrate with this example. When asked to calculate E[T], one may be tempted to say that

E[T] = E[D]/E[S] = (1/3)/(3/2) = 2/9 ≈ 0.222.
Clearly, this is not correct, the answer being about 11 percent less than the correct answer. The error lies in assuming that E[1/S] = 1/E[S].

In general, the expected value of a function of a random variable is not equal to the function evaluated at the expected value of the random variable.

In this case one can prove mathematically that for any nonnegative random variable S,

E[1/S] ≥ 1/E[S].
Hence, using (E[D]/E[S]) to estimate E[T] in such a case results in an optimistically low estimate of average travel time. In a practical sense these relations imply that an urban service agency cannot infer that, say, a 20-mile/hr average response speed and a 1-mile average travel distance imply a 3-minute average travel time. On the contrary, the average inverse speed could be, say, 0.10 hour/mile; in such a case if travel distance and travel speed are independent, the average travel time is 6 minutes, not 3 minutes.
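The distinction is easy to demonstrate numerically. The sketch below (added) uses the two-speed ambulette example above to contrast the correct computation E[D]E[1/S] with the tempting but incorrect E[D]/E[S].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
d = np.abs(rng.uniform(0, 1, n) - rng.uniform(0, 1, n))   # distance, as in Example 1
s = rng.choice([1.0, 2.0], size=n)                        # speed: 1 or 2, prob 1/2 each

print((d / s).mean())                  # ~0.250, the correct E[T]
print(d.mean() * (1.0 / s).mean())     # ~(1/3)(3/4) = 0.250, valid by independence
print(d.mean() / s.mean())             # ~(1/3)/(3/2) = 0.222, about 11 percent low
```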

Further work: Problem 3.5.

Example 3: Rayleigh Distribution

To this point our derived distribution examples have dealt with sample spaces in which all random variables had finite maximum and minimum values. This is not a necessary requirement, and many derived distribution problems, such as the case considered here, allow one or more random variables to assume infinitely large (positive or negative) values.

Suppose an urban vehicle is located at (X0, Y0). An automatic vehicle location (AVL) system utilizes one of the several available technologies2 to estimate the location of the vehicle. Such an application is relevant in police departments, taxicab services, maintenance services, and numerous other urban services. Suppose that the estimated position of the vehicle is given by

X = X0 + Xe

Y = Y0 + Ye
where (X0, Y0) represent the true position coordinates of the vehicle and (Xe, Ye) are the additive error terms due to imperfect resolution. For certain AVL technologies it makes sense to assume that Xe and Ye are independent, zero-mean Gaussian random variables:

fXe(x) = (1/(σ√(2π))) exp(−x²/(2σ²)),  −∞ < x < +∞,
fYe(y) = (1/(σ√(2π))) exp(−y²/(2σ²)),  −∞ < y < +∞,

where the standard deviation σ specifies the resolution of the system. It now makes sense to examine properties of the "radius of error"

R = √(Xe² + Ye²).
Solution

To derive the probability law of R we work in the (Xe, Ye) sample space, which is the entire plane (Figure 3.9), and utilize the joint (Xe, Ye) pdf, which is (by independence)

fXe,Ye(x, y) = fXe(x)fYe(y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)),  −∞ < x, y < +∞.

Applying the never-fail method,

FR(r) = P{R ≤ r} = P{Xe² + Ye² ≤ r²} = ∫∫_{x² + y² ≤ r²} fXe,Ye(x, y) dx dy,  r ≥ 0.
Because of the circular symmetry of the situation, we find it easier to evaluate this integral by changing to polar coordinates ρ and θ, where

x = ρ cos θ,   y = ρ sin θ.

These relationships are shown in Figure 3.9. Since the infinitesimal area to be integrated changes from dx dy to ρ dρ dθ, we can write

FR(r) = ∫0^{2π} ∫0^r (1/(2πσ²)) exp(−ρ²/(2σ²)) ρ dρ dθ,  r ≥ 0.
Carrying out the final integration, we find that

FR(r) = 1 − exp(−r²/(2σ²)),  r ≥ 0,

and, differentiating, we obtain the Rayleigh pdf

fR(r) = (r/σ²) exp(−r²/(2σ²)),  r ≥ 0.
Notice that this pdf behaves as we might expect intuitively: it starts at zero at r = 0, grows monotonically to a maximum (which occurs at r = σ), and then decreases monotonically, the decay being governed by the exponential factor exp(−r²/(2σ²)).

Among other applications, the Rayleigh probability law arises in physics in various scattering experiments and in communication theory in the modeling of noise over a communication channel. We have now seen how it arises as a derived distribution in an urban vehicle location context.
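The derivation can also be checked by simulating the AVL error model directly. In the sketch below (added), the value σ = 100 meters is an arbitrary assumed resolution; the empirical cdf of the error radius is compared with the Rayleigh cdf, and the sample mean with the known Rayleigh mean σ√(π/2).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 100.0                                    # assumed resolution, in meters
n = 1_000_000
xe = rng.normal(0.0, sigma, n)                   # east-west location error
ye = rng.normal(0.0, sigma, n)                   # north-south location error
r = np.hypot(xe, ye)                             # radius of error R

print(r.mean(), sigma * np.sqrt(np.pi / 2))      # Rayleigh mean = sigma * sqrt(pi/2)
for rv in (50.0, 100.0, 200.0):
    print(np.mean(r <= rv), 1 - np.exp(-rv**2 / (2 * sigma**2)))
```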

There is an alternative way of deriving the Rayleigh pdf directly without first finding the cdf. The method is useful in other applications, as well, in which it is easy to make infinitesimal probability arguments. However, when in doubt, we always prefer to resort to the never-fail cdf method. The direct method proceeds as follows: since a pdf has a probability meaning only if it is integrated, we "integrate" fR(r) over the infinitesimal interval [r, r + dr),

fR(r) dr = P{r ≤ R < r + dr} = ∫∫_{r ≤ √(x² + y²) < r + dr} fXe,Ye(x, y) dx dy.
Again because of circular symmetry, we change to polar coordinates ρ and θ, with ρ = r and dρ = dr, thereby obtaining

fR(r) dr = (1/(2πσ²)) exp(−r²/(2σ²)) · (2πr dr),

or

fR(r) = (r/σ²) exp(−r²/(2σ²)),  r ≥ 0,
as previously derived. We used such an infinitesimal argument when showing in Section 2.12 that the kth-order interarrival time of a Poisson process has a kth-order Erlang pdf. However, again we caution those computing derived distributions that this "infinitesimal" method for finding the pdf directly is fraught with potential pitfalls and difficulties for all but the simplest problems. Thus, the never-fail cdf method remains our primary tool for deriving distributions.

Further work on AVL position estimation errors: Problems 3.6 and 3.7.

Example 4: Ratio of Right Angle to Euclidean Distance Metrics

As another example of deriving distributions of random variables, we consider a problem that arises in transportation systems (e.g., "dial-a-ride" systems, taxicab systems), emergency services (fire, police, and ambulance), and other municipal systems having mobile units. The problem deals with the "penalty" in travel distance incurred by a mobile unit while traveling a grid of streets, compared to a helicopter or other unit that could travel "as the crow flies."

If the mobile unit is located at (x1, y1) and is traveling along a shortest-distance path to (x2, y2), perhaps to pick up a passenger, then the right-angle distance between the points is

d = |x1 - x2| + |y1 - y2|


If street directions are parallel to the coordinate axes, the right-angle distance (also called Manhattan, metropolitan, or rectangular distance) is a good approximation for the actual travel distance covered.3

Of interest in designing computer dispatching algorithms and in developing planning models, the ratio of the right angle to the Euclidean distance provides insight as to the extra distance traveled because of the requirement of driving on streets. For instance, if one knew the average value of this ratio, then in a computer dispatching algorithm it might be acceptable to estimate the travel distance as the product of this average value and the Euclidean distance, the latter being obtained easily from a file of (x, y) coordinates.

Consider two points (X1, Y1) and (X2, Y2), corresponding to the trip origin and destination, respectively, defined relative to any fixed coordinate system. Let Θ (0 ≤ Θ ≤ π/2) be the angle at which the directions of travel are rotated with respect to the straight line connecting the two points (see Figure 3.10). Given Θ, the right-angle travel distance between (X1, Y1) and (X2, Y2) is

DR = DE(cos Θ + sin Θ),

where DE = √((X1 − X2)² + (Y1 − Y2)²) is the Euclidean (straight-line) distance between the two points. The ratio of interest is therefore

R = DR/DE = cos Θ + sin Θ.
We wish to derive the cdf of R using the never-fail method, making reasonable assumptions about the probabilistic behavior of Θ.

Solution

Here we are deriving the distribution of one continuous random variable which is expressed as a function of another continuous random variable (i.e., a "one-to-one" transformation). The cdf of R is

FR(r) = P{R ≤ r} = P{cos Θ + sin Θ ≤ r}.
The event corresponding to {R ≤ r} in the Θ sample space is shown in Figure 3.11. Now in a large, uniform city it makes sense to assume that Θ is uniformly distributed over [0, π/2]. (Why?) We call this an isotropy assumption, meaning sameness regardless of direction. Given the isotropy assumption, we can integrate the pdf of Θ over the event indicated in Figure 3.11 to obtain

FR(r) = (4/π) sin⁻¹(r/√2) − 1 = 1 − (4/π) cos⁻¹(r/√2),  1 ≤ r ≤ √2,

with FR(r) = 0 for r < 1 and FR(r) = 1 for r > √2. Differentiating, the pdf of R is

fR(r) = (4/π)(1/√(2 − r²)),  1 ≤ r ≤ √2,

and the mean is

E[R] = ∫1^{√2} r fR(r) dr = 4/π ≈ 1.273.
Thus, "on the average" the mobile unit travels about 1.273 times the Euclidean distance (given the model assumptions). Since = 0.0155, the ratio R/E[R], the coefficient of variation, is only 0.098, meaning that the estimate of 4/ for E[R] is quite robust. A reasonable "test" of the right-angle distance metric would be to compare the empirical distribution of ratios of recorded travel distances and corresponding Euclidean distances to FR(·) and to compare the empirically found average R to 1.273.

Further work: Problem 3.8 {deriving E[R] and σR² without FR(·)}; Problems 3.9 and 3.10 (alternatives to the isotropy assumption).

Example 5: Quantization Model

As a final detailed example of a derived distribution problem, we consider a situation in which two continuous random variables give rise to one discrete random variable. This 2-to-1 transformation arises due to quantization of odometer readings in urban vehicles. The same analysis applies in other quantization settings, for instance in cases where successive event times are quantized.

Assume that we are running an experiment to estimate the distribution of distance traveled by taxicabs, where distance

D = miles traveled from the moment of dispatch to arrival at the address of the caller.

All we have available experimentally are recorded travel distances, which are quantized as 0 miles, 1 mile, 2 miles, and so on. We wish to examine the quantitative effects of such truncation. Quite clearly, the same model could be used for studying response distances of emergency vehicles, "paid" trips of taxicabs, trips of dial-a-ride vehicles, etc.

For a journey of length D, the recorded travel distance equals the sum of D and the accumulated odometer mileage at the moment of dispatch since the last odometer reading change, the sum truncated to the largest integer not exceeding the sum. For instance, if the vehicle had traveled 0.9 mile since the last reading change and then traveled 1.2 miles to the address of the caller (following dispatch), the recorded mileage would be the largest integer not exceeding (0.9 + 1.2) = 2.1, which is 2 (miles). If, however, the noninteger accumulated odometer mileage at the moment of dispatch had been 0.6 rather than 0.9, the recorded mileage would be the largest integer not exceeding (0.6 + 1.2) = 1.8, which is 1 mile. In the first case, the odometer's mileage reading had changed twice; in the second, once. As examples will clearly demonstrate, the recorded travel distance can either underestimate or overestimate the actual travel distance by as much as 1 mile.

Solution

Random Variables

There are two key random variables that give rise to the quantized distance random variable:

D = actual travel distance

Φ = accumulated noninteger odometer mileage at the moment of dispatch (a random variable distributed over [0, 1))

If we let the quantized distance random variable be

K = recorded mileage for the journey,

then K is a function of D and Φ:

K = ⌊D + Φ⌋ = largest integer not exceeding D + Φ.
Here we have a discrete random variable expressed as a function of two continuous random variables. If we have the joint probability law for D and Φ, we would like the probability law for K.

Joint Sample Space

The (D, Φ) sample space is the infinite strip of width 1 (0 ≤ D < ∞, 0 ≤ Φ < 1), shown in Figure 3.12. Without yet assigning a probability law over this sample space, we have performed in Figure 3.12 the "work" required to find the sets of points in the sample space that give rise to different values



of the random variable K. We illustrate the derivation of one of the "45° lines" partitioning the sample space. Suppose that the experimental value for D lies between 1 and 2 (i.e., 1 ≤ d ≤ 2). Then, for "sufficiently small" φ, K will equal 1; otherwise, K will equal 2. The switch from K = 1 to K = 2 will occur at the point at which d + φ = 2. Thus, the switch occurs along the line

φ = 2 − d,  1 ≤ d ≤ 2.
Joint Probability Distribution

Without knowing the exact distribution for D, we can make some further progress in our analysis of the effects of quantization. From physical considerations, the following assumptions seem reasonable:

  1. The random variables D and Φ are independent.

  2. Φ is uniformly distributed over [0, 1). (Why?)

Thus, we will limit our knowledge of the joint (D, Φ) pdf to say that it takes the following form:

fD,Φ(d, φ) = fD(d),  0 ≤ φ < 1, d ≥ 0,
fD,Φ(d, φ) = 0,  otherwise,

where fD(·) is the (otherwise unspecified) pdf of the travel distance D.
Working in the Joint Sample Space

Since entire subregions of the (D, Φ) sample space give rise to exactly one value of K, we can deal directly with the pmf for K, not the cdf. Given the assumptions regarding fD,Φ(d, φ) above, if the cdf for D is known, say FD(·), the probability mass function for K is readily computed:

pK(k) = P{k ≤ D + Φ < k + 1} = ∫0^1 [FD(k + 1 − φ) − FD(k − φ)] dφ
      = ∫k^{k+1} FD(x) dx − ∫_{k−1}^{k} FD(x) dx,  k = 0, 1, 2, ....

In particular, it can be shown that

E[K] = E[D].        (3.31)
Thus, any statistical procedure using experimental data to estimate E[K] should also yield an (unbiased) estimate of E[D]. For such a procedure to remain unbiased, it is necessary that zero-mileage journeys be recorded and used in the statistical tabulations.
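A simulation sketch (added) illustrates both the pmf formula and the unbiasedness property E[K] = E[D]. The exponential travel-distance law used here is purely an assumption for illustration; any pdf for D could be substituted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
d = rng.exponential(2.0, n)                 # assumed travel-distance law, mean 2 miles
phi = rng.uniform(0.0, 1.0, n)              # odometer state at dispatch, uniform on [0, 1)
k = np.floor(d + phi)                       # recorded (quantized) mileage K

print(d.mean(), k.mean())                   # E[K] = E[D]: both ~2.0

def integral_of_cdf(a, b, m=20_000):
    """Riemann-sum approximation of the integral of F_D over [a, b] for the assumed law."""
    xs = np.linspace(a, b, m)
    F = np.where(xs >= 0, 1 - np.exp(-xs / 2.0), 0.0)
    return F.mean() * (b - a)

for kk in (0, 1, 2, 3):                     # empirical pmf vs. the formula above
    pmf = integral_of_cdf(kk, kk + 1) - integral_of_cdf(kk - 1, kk)
    print(np.mean(k == kk), pmf)
```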

Question: Given the foregoing analysis, can one lump together recorded mileages quantized in tenths of miles with those quantized in miles?

Further work: Problem 3.11 [for a proof of (3.31)]; Problem 3.12 (for an application of these ideas to time measurements).

1 Typical empirical relationships found among speed, distance, and time are described later in this chapter.

2 See, for example, R. C. Larson, K. W. Colton, and G. C. Larson, "Evaluating an Implemented AVM System: The St. Louis Experience (Phase I)," Public Systems Evaluation, Inc., Cambridge, Mass., 1976.

3 See Problems 3.24 and 3.25 for realistic variations to the right-angle distance (due to discreteness of streets and one-way streets).