1.124 Lecture 10 10/12/2000

Sorting


1. Introduction

Sorting techniques have a wide variety of applications. Computer-Aided Engineering systems often use sorting algorithms to help reason about geometric objects, process numerical data, rearrange lists, etc. In general, therefore, we will be interested in sorting a set of records containing keys, so that the keys are ordered according to some well defined ordering rule, such as numerical or alphabetical order. Often, the keys will form only a small part of the record. In such cases, it will usually be more efficient to sort a list of keys without physically rearranging the records.
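
When the keys form only a small part of each record, a key-index sort rearranges an array of indices rather than the records themselves. The sketch below illustrates the idea; it is not part of the lecture code, and the Record type, its fields and the file name are hypothetical:

IndexSort.cpp

// Sketch only: sort an index array instead of the records themselves.
// The Record type and its fields are hypothetical.

#include <cstdio>

struct Record {
    int  key;            // small key used for ordering
    char payload[256];   // large data that we would rather not move
};

// Rearrange idx[1..N] so that r[idx[1]].key <= ... <= r[idx[N]].key.
// The records in r[] never move; only the (small) indices are exchanged.
void indexSort(const Record r[], int idx[], int N) {
    for (int i = 1; i < N; i++) {
        int min = i;
        for (int j = i + 1; j <= N; j++)
            if (r[idx[j]].key < r[idx[min]].key)
                min = j;
        int t = idx[min]; idx[min] = idx[i]; idx[i] = t;
    }
}

int main() {
    Record r[4];                       // r[1..3] used; r[0] unused (1-based convention)
    int    idx[4];
    int    keys[4] = {0, 42, 7, 19};
    for (int i = 1; i <= 3; i++) { r[i].key = keys[i]; idx[i] = i; }

    indexSort(r, idx, 3);

    for (int i = 1; i <= 3; i++)       // prints 7 19 42
        printf("%d ", r[idx[i]].key);
    printf("\n");
    return 0;
}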
 

2. Performance Criteria

There are several criteria to be used in evaluating a sorting algorithm: the running time (in the average case and in the worst case), the amount of extra memory required, and stability, i.e. whether the relative order of equal keys is preserved.

Examples of elementary sorting algorithms are selection sort, insertion sort, shell sort and bubble sort. Examples of sophisticated sorting algorithms are quicksort, radix sort, heapsort and mergesort.  We will consider a selection of these algorithms which have widespread use.  In the algorithms given below, we assume that the array to be sorted is stored in the memory locations a[1],a[2],...,a[N].  The memory location a[0] is reserved for special keys called sentinels, which are described below.
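
The course headers (Selection.h, Insertion.h, Shell.h) are not reproduced in these notes. The sketch below shows one way a driver might set up this storage convention; it assumes ItemType is typedef'd to int, which is an assumption, and the file name is hypothetical:

Driver.cpp

// Sketch only: the storage convention used throughout these notes.
// Keys live in a[1..N]; a[0] is reserved for a sentinel key.

#include <cstdio>
#include <climits>

typedef int ItemType;            // assumption: the course headers typedef ItemType

int main() {
    const int N = 5;
    ItemType a[N + 1];           // N+1 slots: a[0] plus a[1..N]

    a[0] = INT_MIN;              // sentinel: at least as small as any key
    ItemType keys[N] = {23, 5, 17, 42, 9};
    for (int i = 1; i <= N; i++)
        a[i] = keys[i - 1];      // the data to be sorted occupies a[1..N]

    // a[] is now laid out as the sorting routines below expect;
    // for example, one could call insertion(a, N) from Section 4 here.
    for (int i = 1; i <= N; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}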
 

3. Selection Sort

This ``brute force'' method is one of the simplest sorting algorithms.

Approach:

Find the smallest element in the array and exchange it with the element in the first position. Then find the second smallest element and exchange it with the element in the second position. Continue in this way until the entire array is sorted.

Here is the code for selection sort:

Selection.cpp

#include "Selection.h"     // Typedefs ItemType.

inline void swap(ItemType a[], int i, int j) {
    ItemType t = a[i];
    a[i] = a[j];
    a[j] = t;
}

void selection(ItemType a[], int N) {
    int i, j, min;

    for (i = 1; i < N; i++) {
        min = i;
        for (j = i+1; j <= N; j++)
            if (a[j] < a[min])
                min = j;
        swap(a,min,i);
    }
}

Selection sort is easy to implement; there is little that can go wrong with it. However, the method requires O(N^2) comparisons and so it should only be used on small files. There is an important exception to this rule. When sorting files with large records and small keys, the cost of exchanging records controls the running time. In such cases, selection sort requires O(N) time since the number of exchanges is at most N.
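
The bound on the number of exchanges can be read directly from the code: the outer loop makes exactly one call to swap() per iteration, so an array of N elements is sorted with N-1 exchanges no matter how it is ordered initially. The following sketch (not part of the lecture code; ItemType is assumed to be int, and the file name is hypothetical) simply instruments the routine above with a counter:

SwapCount.cpp

// Sketch only: the selection sort from above, instrumented to count exchanges.

#include <cstdio>

typedef int ItemType;            // assumption
static long swapCount = 0;

inline void swap(ItemType a[], int i, int j) {
    ItemType t = a[i]; a[i] = a[j]; a[j] = t;
    swapCount++;
}

void selection(ItemType a[], int N) {
    int i, j, min;
    for (i = 1; i < N; i++) {
        min = i;
        for (j = i+1; j <= N; j++)
            if (a[j] < a[min]) min = j;
        swap(a,min,i);                        // exactly one exchange per outer iteration
    }
}

int main() {
    ItemType a[8] = {0, 7, 3, 9, 1, 8, 2, 5}; // keys in a[1..7]
    selection(a, 7);
    printf("exchanges: %ld (N-1 = 6)\n", swapCount);   // prints 6
    return 0;
}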
 

4. Insertion Sort

This is another simple sorting algorithm, which is based on the principle used by card players to sort their cards.

Approach:

Consider the elements one at a time, and insert each element into its proper place among those already considered, keeping them sorted. Insertion of an element among those previously considered consists of moving larger elements one position to the right and then inserting the element into the vacated position.

Here is the code for insertion sort:

Insertion.cpp

#include "Insertion.h"         // Typedefs ItemType.

void insertion(ItemType a[], int N) {
    int i, j;
    ItemType v;

    for (i = 2; i <= N; i++) {
        v = a[i];
        j = i;
        while (a[j-1] > v) {
            a[j] = a[j-1];
            j--;
        }
       a[j] = v;
    }
}

It is important to note that there is no test in the while loop to prevent the index j from running out of bounds. This could happen if v is smaller than a[1],a[2],...,a[i-1].  To remedy this situation, we place a sentinel key in a[0] that is at least as small as the smallest element in the array.  The use of a sentinel is more efficient than performing a test of the form while (j > 1 && a[j-1] > v).

Insertion sort is an O(N^2) method both in the average case and in the worst case.  For this reason, it is most effectively used on files with roughly N < 20.  However, in the special case of an almost sorted file, insertion sort requires only linear time.
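
One way to set up the sentinel, sketched below, is to copy the smallest key of a[1..N] into a[0] before calling insertion(); for integer keys, a known minimum value such as INT_MIN would also do. This is not part of the lecture code: ItemType is assumed to be int, and installSentinel() and the file name are hypothetical.

SentinelSetup.cpp

#include <cstdio>

typedef int ItemType;            // assumption

// insertion() as given above.
void insertion(ItemType a[], int N) {
    int i, j;
    ItemType v;
    for (i = 2; i <= N; i++) {
        v = a[i];
        j = i;
        while (a[j-1] > v) { a[j] = a[j-1]; j--; }
        a[j] = v;
    }
}

// Copy the smallest key of a[1..N] into a[0].  This works for any ItemType
// with operator< and guarantees that the while loop stops at j == 1.
void installSentinel(ItemType a[], int N) {
    ItemType smallest = a[1];
    for (int i = 2; i <= N; i++)
        if (a[i] < smallest) smallest = a[i];
    a[0] = smallest;
}

int main() {
    ItemType a[7] = {0, 9, 4, 8, 1, 6, 2};   // keys in a[1..6]; a[0] is overwritten
    installSentinel(a, 6);
    insertion(a, 6);
    for (int i = 1; i <= 6; i++) printf("%d ", a[i]);   // prints 1 2 4 6 8 9
    printf("\n");
    return 0;
}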
 

5. Shell Sort

This is a simple, but powerful, extension of insertion sort, which gains speed by allowing exchanges of non-adjacent elements.

Definition:

An h-sorted file is one with the property that taking every hth element (starting anywhere) yields a sorted file. For example, the file 2 1 4 5 3 6 8 is 3-sorted, since the subfiles 2 5 8, 1 3 and 4 6 are each sorted.

Approach:

Using a decreasing sequence of increments that ends in 1, h-sort the file for each increment h in turn. Each stage in the sorting process brings the elements closer to their final positions. The method derives its efficiency from the fact that insertion sort is able to exploit the order present in a partially sorted input file; input files with more order to them require a smaller number of exchanges.  It is important to choose a good sequence of increments. A commonly used sequence is (3^k - 1)/2, ..., 121, 40, 13, 4, 1, which is obtained from the recurrence h_k = 3*h_{k+1} + 1.  Note that the sequence obtained by taking powers of 2 leads to bad performance because elements in odd positions are not compared with elements in even positions until the end.
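
The recurrence and the closed form can be checked with a few lines of code; the sketch below (not part of the lecture code; the file name is hypothetical) prints the first few increments computed both ways:

Increments.cpp

// Sketch only: the recurrence h = 3*h + 1, run upward from h = 1,
// produces the increments 1, 4, 13, 40, 121, ..., i.e. (3^k - 1)/2.

#include <cstdio>

int main() {
    long h = 1, pow3 = 3;                     // pow3 holds 3^k, starting at k = 1
    for (int k = 1; k <= 6; k++) {
        printf("k = %d:  recurrence %ld,  (3^k - 1)/2 = %ld\n",
               k, h, (pow3 - 1) / 2);         // the two values agree
        h = 3 * h + 1;
        pow3 *= 3;
    }
    return 0;
}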

Here is the complete code for shell sort:

Shell.cpp

#include "Shell.h"         // Typedefs ItemType.

void shell(ItemType a[], int N) {
    int i, j, h;
    ItemType v;

    for (h = 1; h <= N/9; h = 3*h+1);

    for (; h > 0; h /= 3)
        for (i = h+1; i <= N; i++) {
            v = a[i];
            j = i;
            while (j > h && a[j-h] > v) {
                a[j] = a[j-h];
                j -= h;
            }
            a[j] = v;
        }
}

Shell sort requires O(N^(3/2)) operations in the worst case, which means that it can be quite effectively used even for moderately large files (say N < 5000).
 

6. Quicksort

This divide and conquer algorithm is, in the average case, the fastest known sorting algorithm for large values of N.  Quicksort is a good general purpose method in that it can be used in a variety of situations.  However, some care is required in its implementation. Since the algorithm is based on recursion, we assume that the array (or subarray) to be sorted is stored in the memory locations a[left],a[left+1],...,a[right].  In order to sort the full array, we simply initialize the algorithm with left = 1 and right = N.

Approach:

Partition the array so that, for some i, the element a[i] is in its final position, no element in a[left],...,a[i-1] is greater than a[i], and no element in a[i+1],...,a[right] is less than a[i]. Then sort the two subarrays to the left and to the right of a[i] recursively.

How to partition the subarray a[left],a[left+1],...,a[right]: choose the rightmost element v = a[right] as the partitioning element. Scan from the left until an element greater than or equal to v is found, and scan from the right until an element less than or equal to v is found; exchange the two elements and continue, stopping when the two scans cross. Finally, exchange a[i], where the left scan stopped, with a[right], which puts the partitioning element into its final position.

Here is the complete code for quicksort:

Quicksort.cpp

// inline void swap() is the same as for selection sort.

// Sort a[left..right] into increasing order.
void quicksort(ItemType a[], int left, int right) {
    int i, j;
    ItemType v;

    if (right > left) {
        v = a[right];                // partitioning element
        i = left - 1;
        j = right;
        for (;;) {
            while (a[++i] < v);      // scan from the left for an element >= v
            while (a[--j] > v);      // scan from the right for an element <= v
            if (i >= j) break;       // stop when the scans cross
            swap(a,i,j);
        }
        swap(a,i,right);             // put the partitioning element in place
        quicksort(a,left,i-1);       // sort the two subarrays recursively
        quicksort(a,i+1,right);
    }
}

Note that this code requires a sentinel key in a[0] to stop the right-to-left scan in case the partitioning element is the smallest element in the file.  Quicksort requires O(N log N) operations in the average case.  However, its worst case performance is O(N^2), which occurs in the case of an already sorted file!  There are a number of improvements which can be made to the basic quicksort algorithm.
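
Two of the most widely used improvements are median-of-three partitioning, which makes the sorted-file worst case very unlikely and supplies sentinels for the two scans, and a cutoff that hands small subarrays to insertion sort. The sketch below combines both; it is a standard variant, not the course's reference implementation, ItemType is again assumed to be int, and the file name is hypothetical.

Quick3.cpp

// Sketch only: the quicksort above with median-of-three partitioning
// and an insertion-sort cutoff for small subarrays.

#include <cstdio>

typedef int ItemType;                        // assumption
const int CUTOFF = 10;                       // switch to insertion sort below this size

inline void swap(ItemType a[], int i, int j) {
    ItemType t = a[i]; a[i] = a[j]; a[j] = t;
}

void quicksort3(ItemType a[], int left, int right) {
    if (right - left < CUTOFF) {
        // Small subarray: bounds-checked insertion sort (no sentinel needed here).
        for (int p = left + 1; p <= right; p++) {
            ItemType w = a[p];
            int q = p;
            while (q > left && a[q-1] > w) { a[q] = a[q-1]; q--; }
            a[q] = w;
        }
        return;
    }

    // Median-of-three: order a[left], a[mid], a[right], then park the median
    // at a[right-1] and use it as the partitioning element v.  The keys left
    // at the two ends also act as sentinels for the scans below.
    int mid = (left + right) / 2;
    if (a[mid]   < a[left]) swap(a, mid,   left);
    if (a[right] < a[left]) swap(a, right, left);
    if (a[right] < a[mid])  swap(a, right, mid);
    swap(a, mid, right - 1);
    ItemType v = a[right - 1];

    int i = left, j = right - 1;
    for (;;) {
        while (a[++i] < v);
        while (a[--j] > v);
        if (i >= j) break;
        swap(a, i, j);
    }
    swap(a, i, right - 1);                   // v is now in its final position i

    quicksort3(a, left, i - 1);
    quicksort3(a, i + 1, right);
}

int main() {
    const int N = 20;
    ItemType a[N + 1];
    for (int k = 1; k <= N; k++) a[k] = N + 1 - k;    // reverse order
    quicksort3(a, 1, N);
    for (int k = 1; k <= N; k++) printf("%d ", a[k]); // prints 1 2 ... 20
    printf("\n");
    return 0;
}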
 

7. Choosing a Sorting Algorithm

Table 1 summarizes the performance characteristics of some common sorting algorithms.  Shell sort is usually a good starting choice for moderately large files (N < 5000), since it is easily implemented.  Bubble sort, which is included in Table 1 for comparison purposes only, is generally best avoided.  Insertion sort requires linear time for almost sorted files, while selection sort requires linear time for files with large records and small keys.  Insertion sort and selection sort should otherwise be limited to small files.

Quicksort is the method to use for very large sorting problems.  However, its performance may be significantly affected by subtle implementation errors.  Furthermore, quicksort performs badly if the file is already sorted.  Another possible disadvantage is that quicksort is not stable, i.e. it does not preserve the relative order of equal keys.  All of the above sorting algorithms are in-place methods, although quicksort requires a small amount of additional memory for the auxiliary stack.

There are a few other sorting methods which we have not considered.  Heapsort requires O(N log N) steps both in the average case and the worst case, but it is about twice as slow as quicksort on average.  Mergesort is another O(N log N) algorithm in the average and worst cases.  Mergesort is the method of choice for sorting linked lists, where only sequential access is possible.

Table 1: Approximate running times for various sorting algorithms

Method            Number of comparisons                 Number of exchanges
                  Average case      Worst case          Average case    Worst case
Selection sort    N^2/2             N^2/2               N               N
Insertion sort    N^2/4             N^2/2               N^2/8           N^2/4
Bubble sort       N^2/2             N^2/2               N^2/2           N^2/2
Shell sort        ~N^1.25           N^(3/2)             ?               ?
Quicksort         2 N ln N          N^2/2               N               N
                  (1.38 N log_2 N)
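
If a non-stable method such as quicksort must nevertheless keep equal keys in their original order, a standard remedy is to store each record's original position and use it as a tiebreaker in the comparison, so that no two records ever compare as equal. The sketch below illustrates the idea with a simple selection-style sort; it is not part of the lecture code, and the Record type, the less() helper and the file name are hypothetical.

StableKeys.cpp

// Sketch only: forcing stable behaviour by breaking ties between equal keys
// with each record's original position.

#include <cstdio>

struct Record {
    int  key;       // the sort key (may repeat)
    int  pos;       // original position, filled in before sorting
    char tag;       // some payload, used here to show the original order
};

// Compare by key, then by original position.  With this comparison even a
// non-stable method keeps equal keys in their original relative order.
inline bool less(const Record& x, const Record& y) {
    return (x.key < y.key) || (x.key == y.key && x.pos < y.pos);
}

int main() {
    Record r[6] = { {0,0,' '},                      // r[0] unused (1-based convention)
                    {2,0,'a'}, {1,0,'b'}, {2,0,'c'}, {1,0,'d'}, {2,0,'e'} };
    const int N = 5;
    for (int i = 1; i <= N; i++) r[i].pos = i;      // remember original positions

    // Any exchange-based sort can be used; a simple selection-style scan here.
    for (int i = 1; i < N; i++) {
        int min = i;
        for (int j = i + 1; j <= N; j++)
            if (less(r[j], r[min])) min = j;
        Record t = r[min]; r[min] = r[i]; r[i] = t;
    }

    for (int i = 1; i <= N; i++)                    // prints: 1b 1d 2a 2c 2e
        printf("%d%c ", r[i].key, r[i].tag);
    printf("\n");
    return 0;
}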