Next: Using Maple for Statistical Up: 10.001: Data Visualization and Previous: Predicting Probability from the

On Sorting Data

How do we sort the sample data in ascending order? A straightforward scheme is to find the smallest of the n sample points and assign that as x₁, then to find the smallest of the remaining n - 1 and assign that as x₂ and repeat the procedure until we have x₂. This process requires a total of (n - 1) + (n - 2) + ^... + 2 + 1 = (n² - n)/2 comparisons. So for large values of n, the algorithm scales as n². A common n² algorithm for sorting, which can be found in almost any elementary book on computer science or statistical analysis is the bubble sort algorithm. The algorithm is given as follows.
1. Initialize n, x₁, x₂,...,x_n.
2. Define K = n - 1, L = 0.
3. while (L = 0)
L = 1
for j = 1, 2,..., K - 1
if ( x_j > x_{j + 1}) then swap x_j and x_{j + 1}, L=0
end for loop
K=K-1
end while loop
4. Print the sorted values of x₁, x₂,...,x_n.

Either of the algorithms given above should not be used for large values of n since more efficient methods which scale like nln₂(n) exist in literature. These algorithms are based on clever partioning of the data into suitable subsets. A number of such methods may be found in chapter 8 of the NRC book. One of the most popular methods used in practice is the quicksort method. Note that the CPU time of the quicksort method depends on the initial ordering of data. While the best case scenario is order nln₂(n), the worst case could be as bad as order n². The heapsort method is order nln₂(n), irrespective of the ordering of the input data. The decision to use a quicksort over heapsort is typically made when sorting has to be done for a large number of data sets (i.e., many different samples), in which case, on the average CPU time for quicksort is typically smaller as compared to that for heapsort. In NRC, the quicksort function is called void sort(unsigned long n, float arr[]) (page 333 of the NRC book) and the heapsort function is called void hpsort(unsigned long n, float arr[]) (page 337 of the NRC book).

Next: Using Maple for Statistical Up: 10.001: Data Visualization and Previous: Predicting Probability from the

Michael Zeltkevic
1998-04-15