## 6.5.2 Median Problems

Let us consider an undirected network G(N, A) with n nodes. Let k be some positive integer (k = 1, 2, 3, . . .) and let us choose k distinct points on the graph G to be indicated as the set Xk = {xl, x2, . . ., xk-1, xk}. We shall then indicate by d(Xk,j) the minimum distance between any one of the points x1 Xk and the node j on G. That is,

Now, if the k points in Xk are to be the points where k facilities providing a given service will be located and if hj, the demand weight of node j, is set equal to the fraction of all calls for the service in question that originate from j (i.e., hj = 1), then finding the k-medians, X*k , of G amounts to finding the set of k locations that minimize the average travel distance to (or from) the facilities by service users. This should be clear from the definition of the function J(Xk) in (6.15), which is now nothing but an expression for the average travel distance. Note, too, that the implicit assumption in all of the above is that demands originating at any given node j will be served exclusively by the facility that is closest to j.

We can now state a most important result, known as Hakimi's theorem [HAKI 64].

Theorem: At least one set of k-medians exist solely on the nodes of G.

The practical significance of this theorem is great. It states in effect that the search for the set of the k optimal locations for the k facilities can be limited to the node set of G (i.e., to a total of n points only) instead of the infinite number of points that lie on the links of G.

The validity of the theorem is obvious for the trivial case when the required number of facilities k is greater than or equal to the number of nodes n. Then we only have to locate one facility on each node to reduce average travel distance to zero.

We shall now prove the theorem for the special case of a single facility (k = 1). A line of approach similar to the one outlined below can be used to prove the more general case (k1) (see also Problem 6.9).

Proof (For a single facility, l-median): Suppose that the optimum location for the single facility is at a point x which lies on the link (p, q) between nodes p and q. The distance d(x, j) between x and any vertex j N can then be written as

```  d(x, j) = Min (d(x, p) + d(p, j), d(x, q) + d(q,J))          (6.16)
```

That is, any node j N will be reached from x (p, q) either through p or through q. Let now P be the set of nodes that point x reaches most efficiently through node p and Q the set of nodes reached more efficiently through q (ties can be broken arbitrarily). P and Q are thus mutually exclusive sets whose union is equal to N.

Assume now without loss of generality that more users are reached through p than through q, that is

From the definition of the distance d(p, j), we have

But the term d(x, p) · is, by assumption (6.17), greater than or equal to zero. Therefore, we conclude that J(x) J(p), which contradicts the assumption that the 1-median is located at an interior point of the link (p, q). Stated differently, we have proved that we can do "at least as well,, by moving the facility from x to the node p. This also completes our proof.

Example 15: Location of a Single Medtan and of Two Medians

Consider the facility location problem of Figure 6.36, where a network model of an urbanized area is shown. Nodes A through H represent points where demands for service are generated and/or points where major roads in the area intersect. A single facility is to be located in the area and its prospective users will have to travel to the facility to partake of the service provided there. Daily demand figures for the service (in units of 1OO's) are indicated by the figures in parentheses next to the nodes where they originate. The lengths of the various road segments are also indicated (in miles). Where should the facility be located to minimize the average travel distance to it ?

Solution

Because of Hakimi's theorem there are only eight candidate points on the network for the placement of the facility; these are the eight nodes A through H. By using Algorithm 6.2 (or some other shortest-path algorithm) or, in this case by inspection, we can compute the distance (shortest-path) matrix [d (i, j)] for all pairs of nodes, i and j, of the graph. The distance matrix is given in Table 6-7.

We next compute the terms hj · d(i, j) by multiplying each column of the distance matrix by the weight of node j. The result of that operation is shown in Table 6-8. Note that the entries of the [hj · d(i, j)] matrix have very real physical meaning. For instance, the entry of row B. column D, indicates that if the facility is located at node B. users from node D must travel a total of 900 "passenger-miles" a day (= 300 persons x 3 miles) to use the facility at B. In view of this, it should now be obvious how the optimum location for the

facility can be found: by summing across the entries for each row i of the [hj · d(i, j)] matrix, we can compute the total distance traveled by users if the facility is located at row i. Normalizing the quantities hj by dividing by the total demand (= 15), we can find the average user travel distance associated with each of the eight candidate locations. This procedure is summarized in Table 6-9. The optimum location for the facility is at node C, and the associated average travel distance is 2.67 miles.

What would now happen if two facilities were desired? To solve the 2-median problem we can still take advantage of Hakimi's theorem and consider only sets of points composed of two nodes. With a total of eight nodes, there are (28) = 28 possibilities. Total (or average) distances for each combination of locations can still be obtained directly from the [hjd(i, j)] matrix of Table 6-8: demands from each node will be "assigned" to the facility closest to it (i.e., the one that requires the least amount of travel). Thus, if, for instance, the two facilities are located at nodes A and G. the amount of travel contributed by users from D is given by Min [hDd(A, D), hD · d(G, D)] = Min [6, 18] = 6, directly from the [hjd(i, j)] matrix.

We list below the total distances associated with a few of the 28 combinations of locations:

By exhaustive consideration of alI possibilities, we can reach the conclusion that the solution of the 2-median problem consists of locations at nodes D and G for a total distance of 18 units (or an average distance of 1.2 miles). Under this solution, demand from nodes A, B. C, D, and E (total of 10 units of demand) is assigned to the facility at D while demand from G and H (total of 5 units) is assigned to G. Thus, the facility at D assumes double the load of the facility at G. Note also that despite the considerable overall reduction in the average travel distance, service users from nodes B. C, and E now have to travel farther than they had to with a single facility.

From the discussion above we immediately deduce an algorithm for finding the one-median of an undirected graph G(N, A).

Single Median Algorithm (Algorithm 6.10)
 STEP 1: Obtain the minimum distance matrix for the nodes of G. STEP 2: Multiply the jth column of the minimum distance matrix by the demand weight hj (j = 1, 2, . . ., n) to obtain the matrix hj · d(i j)]. STEP 3: For each row i of the [hj · d(i j)] matrix, compute the sum of all the terms in the row. The node that corresponds to the row with the minimum sum of terms is the location for the l-median.

Algorithm 6.10 can also be used, in principle, to obtain the k-medians for any value of k 1. Only Step 3 must be modified to provide for consideration of sets of k rows (rather than of single rows) in the manner indicated for the 2-median case in our example.

Unfortunately, the combinations--and attendant required comparisons at Step 3--become too many to handle even with a computer, as soon as the number of nodes, n, and number of facility locations required, k, reach moderate size. For instance, for n = 100 and k = 5 there exist some 75,000,000 possible combinations of 5-medians, with the computation of the total distance for each requiring some 500 comparisons at Step 3. Thus, the essentially "brute-force" approach of the modified Algorithm 6.10 soon becomes infeasible for k > 1 and more sophisticated approaches must be sought.

Several exact algorithms have been proposed for the k-median problem [REVE 70, GARF 74]. Basically, these algorithms attempt to solve efficiently integer programming formulations of the problem. Better known is a conceptually simple heuristic algorithm TEIT 68], which, however, does not always terminate with the optimum k-median solution. We describe below an improved version of that algorithm. It begins by finding the l-median of the network and then increases the number of medians in steps of one at a time, until they become equal to the required number, k (k > 1). Because of Hakimi's theorem we shall only be concerned with locations on nodes. We shall use S to indicate the set of nodes where medians have been (tentatively) located at any given stage in the execution of the algorithm and m to indicate the number of nodes in S. During the execution of the algorithm m will increase from 1 to k.

Multimedian Heuristic Algorithm (Algorithm 6.11)
 STEP 1 Let m = 1. Find the l-median of the network G(N, A) using Algorithm 6.10. Let the l-median be at node i. Set S = {i}. STEP 2: (Facility Addition): Add a new facility to the current membership of the set S by choosing that location among the nodes in ADS, the nodes which are not in S.which produces the maximum possible improvement in the objective function as the number of medians increases by 1. Let m = m + 1. STEP 3: (Solution Improvement): Attempt to improve the objective function by substituting in a systematic way, one at a time, one of the nodes in S with a node that is in N-S. Every time an improved solution is obtained, use this as the new "incumbent" solution, S, and repeat Step 3. When all possible single-node substitutions for a set S have been attempted without improving the objective function, go to Step 4. STEP 4: If m = k, stop; otherwise, return to Step 2.

Example 15 (continued)

We now apply Algorithm 6.11 to the 2-median problem on the network of Example 15. In Step 1 we find the 1-median at C and thus set S = {C}. In Step 2, we must compare the value of the objective function for the sets of facility locations {A, C}, {B. C}, {C, D}, {C, E}, {C, F}, {C, G}, and {C, H}. Working with Table 6.8, the respective values of the objective function are found to be 31, 38, 34, 37, 30, 20, and 29. Thus, 5 = {C, G}. In Step 3, we now compare the incumbent solution successively with {A, Gl, {s, G}, {D, G}. We obtain our first improvement with {D, G} (objective function = 18). So S = {D, G} becomes the new incumbent solution and Step 3 is repeated. We compare the new solution to {A, G}, {B. G}, {C, G}, {E, G}, . . . , {H. G}, {D, A}, {D, B}, . . . , {D, H} without obtaining any further improvement in the objective function. (Note the meaning of attempting "all possible singlenode substitutions.") Since, in Step 4, m = 2 = k, the algorithm stops with the solution S = {D, G}. In this case this also happens to be the optimum 2-median solution.

Algorithm 6.11 is typical of a number of heuristic network algorithms that use the substitution method to improve initial solutions. For example, one of the best-known algorithms for the traveling salesman problem [LIN 73] begins with an initial tour and improves that tour by substituting one link of the initial tour at a time with another link which was not in the initial tour. The solutions obtained from such algorithms are sometimes referred to as 1-optimal, because they cannot be improved by replacing any single member (node, link, or whatever) of the final solution set. Algorithm 6.11 could be easily modified to be 2-optimal (or 3-optimal, etc.) by permitting the substitution in Step 3 of up to two (or three, etc.) nodes of S at a time (instead of exactly one) with nodes