next up previous
Next: Using Maple for Statistical Up: 10.001: Data Visualization and Previous: Predicting Probability from the

On Sorting Data

How do we sort the sample data in ascending order? A straightforward scheme is to find the smallest of the n sample points and assign that as x1, then to find the smallest of the remaining n - 1 and assign that as x2 and repeat the procedure until we have x2. This process requires a total of (n - 1) + (n - 2) + ... + 2 + 1 = (n2 - n)/2 comparisons. So for large values of n, the algorithm scales as n2. A common n2 algorithm for sorting, which can be found in almost any elementary book on computer science or statistical analysis is the bubble sort algorithm. The algorithm is given as follows.
1. Initialize n, x1, x2,...,xn.
2. Define K = n - 1, L = 0.
3. while (L = 0)
L = 1
for j = 1, 2,..., K - 1
if ( xj > xj + 1) then swap xj and xj + 1, L=0
end for loop
K=K-1
end while loop
4. Print the sorted values of x1, x2,...,xn.

Either of the algorithms given above should not be used for large values of n since more efficient methods which scale like nln2(n) exist in literature. These algorithms are based on clever partioning of the data into suitable subsets. A number of such methods may be found in chapter 8 of the NRC book. One of the most popular methods used in practice is the quicksort method. Note that the CPU time of the quicksort method depends on the initial ordering of data. While the best case scenario is order nln2(n), the worst case could be as bad as order n2. The heapsort method is order nln2(n), irrespective of the ordering of the input data. The decision to use a quicksort over heapsort is typically made when sorting has to be done for a large number of data sets (i.e., many different samples), in which case, on the average CPU time for quicksort is typically smaller as compared to that for heapsort. In NRC, the quicksort function is called void sort(unsigned long n, float arr[]) (page 333 of the NRC book) and the heapsort function is called void hpsort(unsigned long n, float arr[]) (page 337 of the NRC book).


next up previous
Next: Using Maple for Statistical Up: 10.001: Data Visualization and Previous: Predicting Probability from the
Michael Zeltkevic
1998-04-15