by William Shoaff with lots of help
If you've studied the course notes on recursion, you've seen the mergesort and quicksort algorithms. Mergesort sorts by merging, while quicksort sorts by exchange. These, along with insertion and selection, are general sorting techniques that have many implementations.
We want to describe some algorithms belonging to each type and determine their time and space complexities. Before we begin, let's list a few ideas and terms that should be known.
=
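public class Record {
  int key;           // the sort key; the sorts below compare keys with < and >
  Object satellite;  // satellite data carried along with the key
}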
We will look at straight insertion sort, binary insertion, and Shell's sort as examples of insertion sorting. Don Knuth's text on sorting and searching [3] provides in-depth coverage of these algorithms.
Given an array of $n - 1$ keys $k_1, \ldots, k_{n-1}$, we want to rearrange the items so that they are in ascending order.
Assume a sentinel key $k_0 = -\infty$ that is smaller than all other elements in the array.
Assume that for some $j$ with $1 \le j < n$ the keys $k_0, \ldots, k_{j-1}$ are already in sorted order; straight insertion places $k_j$ into its proper position among them.
In the example, a colon separates the already sorted keys from the key to be inserted, and a marker indicates where this key should be inserted.
Note the sample array is indexed 0 through 16 to hold the 16 values and the sentinel.
[[Straight Insertion Sort]]=
public void insertionSort (Record[] record) {
  // record[0] holds the sentinel; record[1..i-1] is already sorted
  for (int i = 2; i < record.length; i++) {
    Record r = record[i];
    int key = record[i].key;
    int j = i;
    while (record[j-1].key > key) {
      record[j] = record[j-1];
      --j;
    }
    record[j] = r;
  }
}
It should be clear that insertionSort() only uses a few extra registers and so has constant space complexity S(n) = O(1). The running time T(n) is also easy to determine, but this is our first while loop to analyze.
The comparison in the while loop's Boolean condition may execute only once (when key[j-1] = key[i-1] $\le$ v = key[i]), or as many as $i$ times (when key[i-1] $\ge$ key[i-2] $\ge \cdots \ge$ key[1] $>$ v = key[i] and key[0] $<$ v).
In the best case the data is already sorted in ascending order, and insertion sort will execute $n - 2$ comparisons (one for each i = 2 to record.length - 1 = $n - 1$). It will also execute $2(n - 2)$ record assignments: r = record[i] and record[j] = r. Thus, the best-case running time is $T(n) = O(n)$.
In the worst case the data is sorted in descending order, and the while test executes $i$ times for each $i$. Thus, the number of comparisons is
$$\sum_{i=2}^{n-1} i = \frac{n(n-1)}{2} - 1 = O(n^2),$$
and the number of record assignments is at most
$$\sum_{i=2}^{n-1} (i + 2) = \frac{n(n-1)}{2} + 2n - 5 = O(n^2).$$
Now let's try something new and reason about the average case complexity of straight insertion sort. You may want to take a detour here and read some general remarks on how one goes about determining average case complexities.
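Before turning to the average case, the best-case and worst-case comparison counts can be checked empirically. The sketch below is not part of the original notes: it works on plain int keys rather than Record objects, and the names InsertionCount, sortAndCount, and sample are made up for this illustration. It keeps the sentinel in position 0, as insertionSort() does, and counts every evaluation of the while test.

```java
class InsertionCount {
    /** Sorts a[1..n-1] in place; a[0] must hold a sentinel no larger than any key.
     *  Returns the number of times the key comparison in the while test is evaluated. */
    static long sortAndCount(int[] a) {
        long compares = 0;
        for (int i = 2; i < a.length; i++) {
            int key = a[i];
            int j = i;
            while (true) {
                compares++;                  // one evaluation of a[j-1] > key
                if (a[j - 1] <= key) break;  // the sentinel guarantees this happens by j = 1
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
        return compares;
    }

    /** Builds an array of length n: sentinel at index 0, then n-1 keys in
     *  ascending or descending order. */
    static int[] sample(int n, boolean ascending) {
        int[] a = new int[n];
        a[0] = Integer.MIN_VALUE;
        for (int i = 1; i < n; i++) a[i] = ascending ? i : n - i;
        return a;
    }

    public static void main(String[] args) {
        int n = 17;                                            // 16 keys plus the sentinel
        System.out.println(sortAndCount(sample(n, true)));    // best case, n - 2
        System.out.println(sortAndCount(sample(n, false)));   // worst case, n(n-1)/2 - 1
    }
}
```

For n = 17 the two counts should come out as 15 and 135, matching $n - 2$ and $n(n-1)/2 - 1$.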
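For the average case time complexity, we need the probability $P(j)$ that $j$ compares are made in the while test, where $j$ can take any of the values from $j = 1$ to $j = i$. Recall that $k_0, k_1, \ldots, k_{i-1}$ are in sorted ascending order when we are inserting $k_i$ into the list. If the keys are initially in random order, $k_i$ is equally likely to belong in any of the $i$ gaps that follow $k_0, k_1, \ldots, k_{i-1}$, so
$$P(j) = \frac{1}{i}, \qquad j = 1, \ldots, i.$$
Thus, the average number of comparisons is
$$\sum_{i=2}^{n-1} \sum_{j=1}^{i} j\,P(j) = \sum_{i=2}^{n-1} \frac{i+1}{2} = \frac{n(n+1)}{4} - \frac{3}{2}.$$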
When inserting the $i$th record in straight insertion sort, key $k_i$ is compared with about
$i/2$ of the previously sorted records.
From the study of binary search techniques
we know that only about $\lg i$ compares
need to be made to determine where to insert an item into a sorted list.
Binary insertion was mentioned by John Mauchly in 1946 in the first
published discussion of computer sorting.
Another variant of insertion sorting was proposed by Donald Shell in 1959 that allows insertion of items that are far apart with few comparisons. The idea is to pick an increment, call it h, and rearrange the file of records so that every hth element is correctly sorted. Then decrease the increment h and repeat. This continues until h = 1. The diminishing sequence of increments
When the increment is
The code below comes from Kernighan and Ritchie [2].
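The Kernighan and Ritchie listing itself is not reproduced here. As a stand-in, here is a minimal Java sketch of the same idea. It assumes plain int keys indexed from 0 with no sentinel, and it assumes the halving sequence of increments n/2, n/4, ..., 1; the class name ShellSortSketch is made up for this illustration. Other diminishing sequences can be substituted as long as the last increment is 1.

```java
class ShellSortSketch {
    /** Shell's sort with increments n/2, n/4, ..., 1 (an assumed sequence). */
    static void shellSort(int[] a) {
        for (int h = a.length / 2; h > 0; h /= 2) {
            // straight insertion on each of the h interleaved subfiles
            for (int i = h; i < a.length; i++) {
                int key = a[i];
                int j = i;
                while (j >= h && a[j - h] > key) {
                    a[j] = a[j - h];   // shift the larger key h places to the right
                    j -= h;
                }
                a[j] = key;
            }
        }
    }

    public static void main(String[] args) {
        int[] keys = {503, 87, 512, 61, 908, 170, 897, 275,
                      653, 426, 154, 509, 612, 677, 765, 703};
        shellSort(keys);
        System.out.println(java.util.Arrays.toString(keys));
    }
}
```

When h = 1 the inner loops are exactly straight insertion sort, which is why the final pass always leaves the file sorted.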
We will now study exchange sorts that transpose records when they
are found to be out of order.
We will study bubblesort and mention
cocktail shaker sort.
Neither of these is particularly efficient, but bubble sort does
provide an interesting analysis.
So you do not go away thinking that exchange sorts are inefficient,
recall that quicksort is an exchange sort, radix-exchange sorting
is very efficient,
and there is an interesting parallel exchange sort, known as Batcher's
method.
Bubblesort repeatedly passes through the file exchanging adjacent records
if necessary; when no exchanges are needed the file is sorted.
That is, key $k_0$ is compared with $k_1$ and they are exchanged
if out of order. Then do the same with $k_1$ and $k_2$, then $k_2$ and $k_3$, etc. Eventually, the largest item will ``bubble''
to the end of the file.
This is repeated to ``bubble'' the next largest item up to the
$n - 2$ position, and then the next largest, etc.
The bubbleSort() algorithm below performs these steps,
but first let's consider an example.
The horizontal line indicates where comparisons stop.
Items highlighted in bold font have bubbled up during a pass.
Note the sample array is indexed 0 through 15 to hold the 16 values.
The brute force algorithm executes 15 passes in total, but we could stop after 9 passes
since no exchanges occur after that.
Counting the number of record comparisons is easy using sums.
Letting n=record.length, we know that
We can also count the number of record exchanges in bubble sorting.
In the worst case, a swap is made for every comparison;
this occurs when the file is in reverse order
The average case analysis of record exchanges is interesting.
We need the probability that the if test evaluates to true.
We assume that the records are initially in random order and
that any of the n! possible permutations of these records can
occur with equal probability.
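Under this equal-probability assumption, the average number of exchanges can be checked by brute force for small n. The sketch below is an illustration only (the class and method names are made up, and it uses int keys rather than Record objects): it runs the bubble sort loops on every permutation of n distinct keys and averages the swap counts, which can then be compared with $n(n-1)/4$.

```java
class BubbleSwapAverage {
    /** Runs the bubble sort loops on a and returns the number of exchanges made. */
    static int bubbleSwaps(int[] a) {
        int swaps = 0;
        for (int i = a.length; i > 1; i--) {
            for (int j = 1; j < i; j++) {
                if (a[j - 1] > a[j]) {
                    int t = a[j - 1]; a[j - 1] = a[j]; a[j] = t;
                    swaps++;
                }
            }
        }
        return swaps;
    }

    /** Sums bubbleSwaps over every arrangement of a[k..], with a[0..k-1] held fixed. */
    static long totalSwaps(int[] a, int k) {
        if (k == a.length) return bubbleSwaps(a.clone());
        long total = 0;
        for (int i = k; i < a.length; i++) {
            int t = a[k]; a[k] = a[i]; a[i] = t;   // choose a[i] for position k
            total += totalSwaps(a, k + 1);
            t = a[k]; a[k] = a[i]; a[i] = t;       // undo the choice
        }
        return total;
    }

    /** Average number of swaps over all n! permutations of n distinct keys. */
    static double averageSwaps(int n) {
        int[] a = new int[n];
        long permutations = 1;
        for (int i = 0; i < n; i++) {
            a[i] = i;
            permutations *= (i + 1);
        }
        return (double) totalSwaps(a, 0) / permutations;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 6; n++) {
            System.out.println(n + ": " + averageSwaps(n) + " vs " + n * (n - 1) / 4.0);
        }
    }
}
```

The enumeration grows as n!, so this is only practical for very small n, but it is a useful sanity check on the probability argument.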
On the first pass of the outer for loop when i=record.length,
the if test will be true for
At each stage we will be comparing the maximum of
The first pass of the outer for loop alters the initial random
distribution of the keys and we must account for it.
Now, on the second pass, when i=record.length-1,
the if test will be true for
For $i = n - 1$, a swap will occur for $j$ if the first, second, ..., or
$j$th smallest key is in the $(j + 2)$nd location.
Each of these cases occurs with probability $1/(j + 2)$ and they are mutually
exclusive, so we find
In general, the probability that the if test evaluates to true
on the (n - i + 1)th pass is
And the average number of swaps is
A refinement of bubble sort is to reverse direction on each pass of bubble
sort. This leads to a slight improvement on bubble sort but not to an
extent that it becomes better than straight insertion sort.
As a last general sorting methodology we will
study selection sorts that select the smallest (or largest)
key, output it, and then repeat.
In particular, we will study straight selection sort,
tree sort, and heapsort.
Find the smallest key and transfer it to output (or the first location
in the file). Repeat this step with the next smallest key, and continue.
Notice after i steps, the records from location 0 to i - 1 will be sorted.
In the example, the current smallest value is highlighted in bold font
as the values are scanned from left to right.
In all cases, the number of comparisons made by straight selection sort is
Tree selection compares keys two at a time raising the smaller
up a level in a binary tree. The record with the smallest key
eventually moves to the root where it is output.
This is similar to a tournament where players rise to the top
of the bracket as they continue to win.
Keys are compared in pairs and the larger (smaller) of each pair promoted.
These promoted keys are compared in pairs and again the larger (smaller) is
promoted. This continues until the largest (smallest) key is found.
Here's the tournament tree for our sample data.
Once the largest item is removed at most
It follows we need space of order
S(n) = O(n) for storing the output and pointers
to the original leaf of the keys.
And it follows that tree selection has running time
The importance of tree sort is that it generalizes to an important algorithm:
heapsort.
Heapsort was invented by J. W. J. Williams in 1964,
and Robert Floyd suggested several efficient implementations in the same year.
Heaps can be used for priority queues,
a data structure where the largest, i.e., highest priority
item is always first.
Priority queues need not be completely sorted, but
it should be easy (efficient) to support the following operations on a
priority queue:
The heap data structure for implementing priority queues is
a left-complete binary tree with the heap property.
That is, a heap is
a binary tree which is completely filled at all levels
except possibly the last, which is filled from left to right
and
the key in each node is larger than (or equal to) the keys of its children.
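The heap property is easy to test once the tree is laid out in an array. The sketch below assumes the usual layout, with the tree stored in positions 1 through n, position 0 unused, and the children of node j in positions 2j and 2j + 1. The class name HeapCheck and the use of int keys are assumptions made for this illustration.

```java
class HeapCheck {
    /** Returns true if a[1..n] satisfies the (max-)heap property; a[0] is unused. */
    static boolean isMaxHeap(int[] a) {
        int n = a.length - 1;
        for (int j = 2; j <= n; j++) {
            // with children at 2j and 2j+1, the parent of node j is node j/2
            if (a[j / 2] < a[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap    = {0, 15, 12, 14, 11, 6, 7, 8, 9, 3};
        int[] notHeap = {0, 3, 9, 7, 8, 11, 12, 6, 15, 14};
        System.out.println(isMaxHeap(heap));      // every parent dominates its children
        System.out.println(isMaxHeap(notHeap));   // the root 3 is smaller than its child 9
    }
}
```

Because the array has no gaps, the left-complete shape is automatic; only the ordering between parents and children needs to be checked.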
A complete binary tree of height $h$ has $2^h - 1$ internal nodes and
$2^h$ external (leaf) nodes, or a total of
$n = 2^{h+1} - 1$ nodes.
The number of leaf nodes in a left-complete binary tree of height $h$
lies between $2^{h-1}$ and $2^h$, and the number of internal nodes
lies between $2^{h-1}$ and $2^h - 1$.
The total number of nodes in a left-complete binary trees lies between
When the left and right subtrees of a left-complete binary tree
are both complete they each contain $2^h - 1$ nodes.
When the left subtree is complete to height $h - 1$, but the right
subtree is complete only to height $h - 2$, the left subtree contains
$2^h - 1$ nodes and the right contains $2^{h-1} - 1$ nodes.
That is, the total number of nodes is
A heap can be stored in an array, indexed from 1,
where node $j$ has its left child in position $2j$
and its right child in position $2j + 1$.
The parent of node j is in position
Let's pretend we are given an array to heapify, say
Given an array A and a position k, where the binary trees
rooted at 2k and 2k + 1 are heaps,
we make the tree rooted at node k a heap with the
heapify() algorithm.
The idea is to exchange (if necessary) the element A[k] with the larger
of A[2k] and A[2k + 1], and then, if an exchange occurred,
heapify() the changed left or right subtree.
Clearly, the time to fix the relationship among
A[k], A[2k], A[2k + 1] is
Let's pretend the tree at node k has n nodes.
A child's subtree can have size at most 2n/3,
which occurs when the last row of the tree is half full.
Thus, the running time of heapify() is given by the recurrence
Two codes to heapify a file of records starting from index k follow.
The first one is from Sedgewick [4];
the second is from Cormen et al. [1].
(Note the arrays are indexed from 0 to n but only index
1 to n are used to store data.)
Since the elements from position
A more careful analysis shows we can build a heap in linear time (
The steps of heapSort() are:
Here's how the algorithm works on our example set of keys.
First we build a heap from the original array.
Next, exchange the root and last element and heapify from the root
down, excluding the last element.
And repeat:
One more time:
Building a heap is O(n).
Swapping the root and the last element and decrementing heap size
are
Comparison sorts determine the order of elements based only on
comparisons between the input keys.
Examples of comparison sorts are insertion sort, merge sort, selection sort and
quicksort.
Sequential comparison sorts can be viewed in terms of a decision tree.
Since there are n! permutations of the items,
a decision tree must have n! leafs (this assumes no redundant comparisons,
but since we're interested in a lower bound on comparisons, that's okay).
If the height of a decision tree is h then as many as C(n) = h comparisons
are needed to sort some permutation of the keys.
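To put numbers on this, note that a binary tree with n! leaves must have height at least $\lceil \lg n! \rceil$, so that many comparisons are needed in the worst case. The sketch below computes this bound exactly with java.math.BigInteger; the class and method names are made up for this illustration.

```java
import java.math.BigInteger;

class ComparisonLowerBound {
    /** Returns ceil(lg n!), the least height h of a binary tree with n! leaves. */
    static int minComparisons(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            factorial = factorial.multiply(BigInteger.valueOf(i));
        }
        // for an integer f >= 1, ceil(lg f) equals the bit length of f - 1
        return factorial.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 16; n++) {
            System.out.println(n + " keys: at least " + minComparisons(n) + " comparisons");
        }
    }
}
```

For the 16 sample keys used throughout these notes the bound works out to 45 comparisons.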
A binary tree of height h has no more than 2h leafs, so we have
Some natural questions to ask are:
When operations other than comparisons can be used to sort keys we may be
able to sort using fewer than
We will now explore several linear time sorts.
In particular, we will consider counting sort,
radix sort, and bucket (or bin) sort.
Let's first look at a sort based on counting that is not linear.
It is called comparison counting; since it is a comparison sort, the
section above implies that its running time is at least
$n \lg n$.
It will lead to a more efficient sorting algorithm called
distribution counting.
The basic idea in comparison counting is to use an auxiliary array
C[] that holds the count of the number of keys less than a given key.
For example, C[0] will tell how many keys are less than
record[0].key, which implies that in the sorted file record[0]
is in position C[0] + 1.
For the example sequence, we start with all counts initialized to 0.
On the first pass, all keys bigger than the last have their counts
incremented and the last has its incremented for all keys smaller than it.
On the second pass, all keys (except the last)
bigger than the next to last have their counts
incremented and the next to last has its incremented for all keys smaller than it.
This continues until we compare the key in position 1 with the key in
position 0, incrementing the position 0 count by 1.
Note that comparison counting is a kind of address table sort;
that is, the C array infers the position of each element in the sorted
list, but no records are actually moved.
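If a sorted copy is wanted, the counts can be used as addresses in one more pass. The sketch below is an illustration with int keys (the names ComparisonCountingSketch, counts, and sortByCounts are made up): counts() follows the comparison counting idea described above, and sortByCounts() places each key at the 0-based position given by its count.

```java
class ComparisonCountingSketch {
    /** count[i] = number of keys that must precede key[i] in the sorted order. */
    static int[] counts(int[] key) {
        int[] count = new int[key.length];
        for (int i = key.length - 1; i > 0; i--) {
            for (int j = i - 1; j >= 0; j--) {
                if (key[i] < key[j]) {
                    ++count[j];
                } else {
                    ++count[i];   // ties go to the later key, so equal keys keep their order
                }
            }
        }
        return count;
    }

    /** Uses the counts as an address table to build a sorted copy. */
    static int[] sortByCounts(int[] key) {
        int[] count = counts(key);
        int[] out = new int[key.length];
        for (int i = 0; i < key.length; i++) {
            out[count[i]] = key[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {503, 87, 512, 61, 908, 170, 897, 275,
                      653, 426, 154, 509, 612, 677, 765, 703};
        System.out.println(java.util.Arrays.toString(counts(keys)));
        System.out.println(java.util.Arrays.toString(sortByCounts(keys)));
    }
}
```

Every pair of keys is compared exactly once, so the counts are always a permutation of 0, ..., n - 1 and no two keys compete for the same output position.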
It is clear that the time complexity of the comparison counting algorithm is
Distribution counting sort assumes that each of the n input
elements is an integer in some range, say from u to v for some
u < v. For simplicity, we'll assume u = 0 and v = m - 1.
The steps of the algorithms are:
In the example, the range is from u = 000 to v = 999.
Only count array elements corresponding to keys are shown.
Initialize the count array C to zero.
In one pass over the file count the number of times each key occurs.
In one pass over the range, accumulate the counts so that each entry holds the number of keys less than or equal to that value.
Now move the last record to the 13th position in a new output file, and decrement
the count of key 703 by 1.
Then move the next to last record to output position 14 and decrement its count.
And continue:
Distribution counting sort is stable: numbers with the same value appear in
the output array in the same order as they were in the input array.
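Stability is easiest to see on records that carry something besides the key. The sketch below is an illustration, not the notes' own countingSort(): each record is assumed to be a two-element int array {key, tag} with 0 <= key < m, and the class name DistributionCountingSketch is made up. Scanning the input from the end and pre-decrementing the count puts equal keys into the output in their original order.

```java
class DistributionCountingSketch {
    /** Sorts {key, tag} pairs by key, assuming 0 <= key < m; equal keys keep input order. */
    static int[][] countingSort(int[][] rec, int m) {
        int[] count = new int[m];                 // Java initializes the counts to zero
        for (int[] r : rec) {
            count[r[0]]++;                        // count[i] = number of keys equal to i
        }
        for (int i = 1; i < m; i++) {
            count[i] += count[i - 1];             // count[i] = number of keys <= i
        }
        int[][] out = new int[rec.length][];
        for (int j = rec.length - 1; j >= 0; j--) {
            out[--count[rec[j][0]]] = rec[j];     // last free slot reserved for this key
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] rec = {{2, 0}, {1, 1}, {2, 2}, {0, 3}, {1, 4}};
        for (int[] r : countingSort(rec, 3)) {
            System.out.println("key " + r[0] + ", tag " + r[1]);
        }
    }
}
```

In the small example the tags record the input positions, so within each key the tags should appear in increasing order in the output.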
When
v - u = m - 1 = O(n), the distribution counting sort runs in
Radix sort was used by card-sorting machines, which you may never
have seen unless you are an old timer.
Herman Hollerith was 20 when he built his original tabulating and sorting
machine for the 1890 U. S. census. His machine used the basic idea
for radix sorting.
Radix sorting is exactly opposite to merging.
We assume keys are represented by d-tuples
Suppose we want to sort a 52-card deck of playing cards. We define an order
on face values:
This card sorting technique is a least significant digit radix sort.
It also works for sorting integers and words. Here's how our sample data
is sorted.
First we count the number of 0's, 1's, 2's,..., 9's in the units
(least significant) digit of the data and accumulate the space needed
to restore the records, as in distribution counting. Then we count on the tens
digit and restore the data. Finally, we count on the hundreds digit
and restore the data.
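The three passes just described can be sketched in Java as follows. This is an illustration under stated assumptions: non-negative int keys with at most d decimal digits, a distribution counting pass on each digit, and made-up names (RadixSortSketch, radixSort).

```java
class RadixSortSketch {
    /** Least significant digit radix sort of non-negative keys with at most d decimal digits. */
    static void radixSort(int[] a, int d) {
        int[] out = new int[a.length];
        int divisor = 1;                               // 1 for units, 10 for tens, ...
        for (int pass = 0; pass < d; pass++) {
            int[] count = new int[10];
            for (int x : a) {
                count[(x / divisor) % 10]++;           // count each digit value
            }
            for (int i = 1; i < 10; i++) {
                count[i] += count[i - 1];              // accumulate the storage needed
            }
            for (int j = a.length - 1; j >= 0; j--) {  // restore, last record first (stable)
                out[--count[(a[j] / divisor) % 10]] = a[j];
            }
            System.arraycopy(out, 0, a, 0, a.length);
            divisor *= 10;
        }
    }

    public static void main(String[] args) {
        int[] keys = {503, 87, 512, 61, 908, 170, 897, 275,
                      653, 426, 154, 509, 612, 677, 765, 703};
        radixSort(keys, 3);
        System.out.println(java.util.Arrays.toString(keys));
    }
}
```

Calling radixSort(keys, 1) stops after the units pass, which makes it easy to compare the intermediate order with the worked example.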
If we assume distribution counting sort is used as the stable sorting
algorithm on each digit, then the running time of radix sort is
Bucket sort runs in linear time on average.
To achieve this average case behavior, we assume the keys are uniformly
distributed over some range
The idea is to divide the range into (about) n equal sized subranges
(or buckets or bins).
Then, in one pass through the file, place each record in the bucket
to which its key belongs.
For our sample data, let's create 10 buckets corresponding to
Except for the call to insertionSort(), the complexity of bucketSort()
is O(n) in the worst case.
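Here is one way the whole procedure might look in Java. It is a sketch under assumptions rather than the bucketSort() of these notes: keys are ints in the range 0 <= key < range, there are n buckets for n keys, each bucket is a java.util.ArrayList, and straight insertion is done while the buckets are copied back into the array. The class name BucketSortSketch is made up.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSortSketch {
    /** Sorts keys assumed to satisfy 0 <= key < range, using a.length buckets. */
    static void bucketSort(int[] a, int range) {
        int n = a.length;
        if (n == 0) return;
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int x : a) {
            int b = (int) ((long) x * n / range);   // bucket index in 0..n-1
            buckets.get(b).add(x);
        }
        int k = 0;                                   // next free position in a
        for (List<Integer> bucket : buckets) {
            int start = k;                           // this bucket occupies a[start..k-1]
            for (int x : bucket) {
                int j = k;
                while (j > start && a[j - 1] > x) {  // straight insertion within the bucket
                    a[j] = a[j - 1];
                    j--;
                }
                a[j] = x;
                k++;
            }
        }
    }

    public static void main(String[] args) {
        int[] keys = {503, 87, 512, 61, 908, 170, 897, 275,
                      653, 426, 154, 509, 612, 677, 765, 703};
        bucketSort(keys, 1000);
        System.out.println(java.util.Arrays.toString(keys));
    }
}
```

Overwriting the array while emptying the buckets is safe because every key has already been copied into its bucket. Since bucket indices increase with the key, concatenating the sorted buckets gives a sorted file.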
Under the assumption that the data is uniformly distributed, the
probability that a given record falls in bucket $i$ is $p = 1/n$.
Let $n_i$ be a random variable denoting the number of elements in bucket $i$.
The probability that $n_i = j$ is given by a binomial distribution.
That is, for $n_i$ to equal $j$, $j$ of the $n$ records must have fallen
in bucket $i$ and $n - j$ must have fallen in other buckets.
The probability of this occurring is given by
The time complexity of insertion sort on $n_i$ keys is $O(n_i^2)$, and so
using summation notation to determine the running time
of the for loops that executes the insertion sorts, we find
There's still a tremendous amount of knowledge about sorting to be covered.
We've simply brushed the surface.
There is no best sorting method.
You should be able to use the information gleaned here to begin to
have positive ideas about which algorithm to choose for a given application.
Below is a table summing up basic facts about the internal sorting algorithms
we have studied. The space and running times are given as orders of growth.
The table above does not provide timing differences between algorithms which
have the same order of growth.
Based on estimates given in Knuth's Sorting and Searching [3],
we can give the following advice on the average running time of the algorithms.
But first, let's be clear about terms.
To say algorithm A is m% faster than algorithm B we mean
Sorting by Exchange
Bubblesort
position    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
initial   503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
pass 1    087 503 061 512 170 897 275 653 426 154 509 612 677 765 703 908   (15 compares)
pass 2    087 061 503 170 512 275 653 426 154 509 612 677 765 703 897 908   (14 compares)
pass 3    061 087 170 503 275 512 426 154 509 612 653 677 703 765 897 908   (13 compares)
[[Bubble sort]]=
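public void bubbleSort (Record[] record) {
  // after each pass of the outer loop the largest remaining key sits at index i-1
  for (int i = record.length; i > 1; i--) {
    for (int j = 1; j < i; j++) {
      if (record[j-1].key > record[j].key) {
        Record t = record[j-1];   // exchange the adjacent records
        record[j-1] = record[j];
        record[j] = t;
      }
    }
  }
}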
Analysis of bubble sort
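With n = record.length, the inner loop makes
$$\sum_{j=1}^{i-1} 1 = i - 1$$
comparisons, so the total number of comparisons is
$$C(n) = \sum_{i=2}^{n} (i - 1) = \frac{n(n-1)}{2} = \Theta(n^2).$$
In the worst case a swap is made for every comparison, so the number of swaps is $n(n-1)/2 = O(n^2)$. Apart from the file itself, bubble sort uses only $\Theta(1)$ space.
On the first pass ($i = n$) the if test is true for $j = 1$ with probability $1/2$, for $j = 2$ with probability $2/3$, and so on:
$$P_{swap}(i = n, j) = \frac{j}{j+1}, \quad j = 1, \ldots, n - 1.$$
On the second pass ($i = n - 1$) it is true for $j = 1$ with probability $1/3$, for $j = 2$ with probability $2/4$, and so on:
$$P_{swap}(i = n - 1, j) = \frac{j}{j+2}, \quad j = 1, \ldots, n - 2.$$
In general,
$$P_{swap}(i = n - k, j) = \frac{j}{j+k+1}, \quad j = 1, \ldots, n - k - 1,$$
that is,
$$P_{swap}(i, j) = \frac{j}{j+n-i+1}, \quad j = 1, \ldots, i - 1.$$
The average number of swaps is therefore
$$\sum_{i=2}^{n} \sum_{j=1}^{i-1} \frac{j}{j+n-i+1}
= \left(\frac{1}{2} + \frac{2}{3} + \cdots + \frac{n-1}{n}\right) + \left(\frac{1}{3} + \frac{2}{4} + \cdots + \frac{n-2}{n}\right) + \cdots + \frac{1}{n}$$
$$= \frac{1}{2}(1) + \frac{1}{3}(1 + 2) + \frac{1}{4}(1 + 2 + 3) + \cdots + \frac{1}{n}(1 + 2 + \cdots + (n - 1))
= \sum_{d=2}^{n} \frac{d-1}{2} = \frac{n(n-1)}{4}.$$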
Cocktail shaker sort
Sorting by selection
Straight selection sort
[[Straight selection sort]]=
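public void selectionSort(Record[] record) {
  // after step i, record[0..i] holds the i+1 smallest keys in sorted order
  for (int i = 0; i < record.length - 1; i++) {
    int min = i;
    for (int j = i+1; j < record.length; j++) {
      if (record[j].key < record[min].key) min = j;
    }
    Record t = record[min];   // exchange record[min] and record[i]
    record[min] = record[i];
    record[i] = t;
  }
}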
Analysis of straight selection sort
$$\sum_{i=0}^{n-1} \sum_{j=i+1}^{n-1} 1 = \sum_{i=0}^{n-1} (n - i - 1) = \frac{n(n-1)}{2} = \Theta(n^2).$$
Straight selection sort always makes
$n - 1 = \Theta(n)$ swaps.
Since there are always a linear number of swaps, selection sort may be the
best method when the records are large and expensive to move.
Tree selection
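Once the largest item is removed, at most $\lceil \lg n \rceil$ comparisons
are needed to pick the next largest number and fix up the tree.
That is, we need only follow one path from the leaf where the root came
from back to the root of the tree.
It follows that tree selection has running time $\Theta(n \lg n)$.
Only
$$\frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots + 1 = n\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{n}\right) = n\left(1 - \frac{1}{n}\right) = n - 1$$
comparisons are needed to create the tree (it should be clear that this is the number of
games needed to select the best of $n$ players).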
Heapsort
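The total number of nodes $n$ in a left-complete binary tree of height $h$ satisfies
$$2^h \le n \le 2^{h+1} - 1.$$
When the last row is half full, the total number of nodes is $n = 1 + (2^h - 1) + (2^{h-1} - 1) = 3 \cdot 2^{h-1} - 1$, and the left subtree, with $2^h - 1$ nodes, is just under
$$\frac{2}{3} n = \frac{2}{3}\left(3 \cdot 2^{h-1} - 1\right) = 2^h - \frac{2}{3}$$
nodes in size.
In the array representation, the parent of node $j$ is in position $\lfloor j/2 \rfloor$.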
For example, the following array is a heap:
position    1   2   3   4   5   6   7   8   9
value      15  12  14  11   6   7   8   9   3
Heapifying an array
position    1   2   3   4   5   6   7   8   9
value       3   9   7   8  11  12   6  15  14
We start at position $\lfloor n/2 \rfloor = \lfloor 9/2 \rfloor = 4$
and, if necessary, exchange the larger value from its two children with
its value.
position            1   2   3   4   5   6   7   8   9
heapify at 4        3   9   7  15  11  12   6   8  14
heapify at 3        3   9  12  15  11   7   6   8  14
heapify at 2        3  15  12   9  11   7   6   8  14
                    3  15  12  14  11   7   6   8   9
heapify at 1       15   3  12  14  11   7   6   8   9
                   15  14  12   3  11   7   6   8   9
                   15  14  12   9  11   7   6   8   3
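The time to fix the relationship among A[k], A[2k], and A[2k + 1] is $\Theta(1)$, so the running time of heapify() is given by the recurrence
$$T(n) = T(2n/3) + \Theta(1),$$
whose solution is $T(n) = O(\lg n)$.
(By the master theorem with $k = 0$, $T(n) = \Theta(n^k \log_{3/2} n) = \Theta(\lg n)$.)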
[[Heapify an array]]=
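public void heapify (Record[] record, int k) {
  int n = record.length - 1;   // the heap occupies record[1..n]
  Record r = record[k];
  int key = record[k].key;
  while (k <= n/2) {           // while node k has at least one child
    int j = 2*k;
    if (j < n && record[j].key < record[j+1].key) ++j;   // j is the larger child
    if (key >= record[j].key) break;
    record[k] = record[j];     // move the larger child up
    k = j;
  }
  record[k] = r;
}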
[[Heapify an array]]=
public void heapify (Record[] record, int k) {
  int largest = k;
  int left = 2*k;
  int right = 2*k+1;
  if (left <= record.length-1 && record[left].key > record[k].key) {
    largest = left;
  }
  if (right <= record.length-1 && record[right].key > record[largest].key) {
    largest = right;
  }
  if (largest != k) {
    Record t = record[k];   // exchange record[k] with its larger child
    record[k] = record[largest];
    record[largest] = t;
    heapify(record, largest);
  }
}
Since the elements in positions $\lfloor n/2 \rfloor + 1, \ldots, n$ have no children, they are each, trivially, one-element heaps.
We can build a heap by running heapify() on the remaining nodes.
Each call will cost at most
$O(\lg n)$ operations and there will be
$O(n)$ calls.
Therefore, constructing a heap is at most
$O(n \lg n)$.
A more careful analysis shows we can build a heap in linear time ($\Theta(n)$).
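In particular, suppose the tree is complete, of height $h$, and has
$n = 2^{h+1} - 1$ nodes. Then we have:
$$\begin{aligned}
T(n) &= \lg n + 2\lg(n/2) + 4\lg(n/4) + \cdots + 2^h \lg(n/2^h) \\
&= \lg n + \lg\big((n/2)^2\big) + \lg\big((n/4)^4\big) + \cdots + \lg\big((n/2^h)^{2^h}\big) \\
&= \lg\big(n\,(n/2)^2 (n/4)^4 \cdots (n/2^h)^{2^h}\big) \\
&= \lg\big(n^{1 + 2 + 4 + \cdots + 2^h} / 2^{2 + 8 + \cdots + h 2^h}\big) \\
&= \lg\big(n^{2^{h+1} - 1} / 2^{2 - (h+1)2^{h+1} + h 2^{h+2}}\big) \\
&= (2^{h+1} - 1)\lg n - \big(2 - (h+1)2^{h+1} + h 2^{h+2}\big) \\
&= n\lg n - \big(2 - (h+1)(n+1) + 2h(n+1)\big) \\
&= n\lg n - \big(2 + (h-1)(n+1)\big) \\
&= n\lg n - \big(2 + (h+1-2)(n+1)\big) \\
&= n\lg n - \big(2 + (\lg n - 2 + \delta)(n+1)\big) \\
&= n\lg n - \big(2 + n\lg n + \lg n - (2 - \delta)(n+1)\big) \\
&= (2 - \delta)(n+1) - 2 - \lg n \\
&= \Theta(n),
\end{aligned}$$
where $\delta = \lg(n+1) - \lg n = \lg(1 + 1/n)$, so that $h + 1 = \lg(n+1) = \lg n + \delta$ and $0 < \delta \le 1$.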
[[Build a heap]]=
public void buildHeap(Record[] record) {
  // record[1..n] holds the data; the nodes past position n/2 are leaves
  for (int k = (record.length - 1) / 2; k > 0; k--) {
    heapify(record, k);
  }
}
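Finally, heapsort and its analysis
Swapping the root with the last element and decrementing the heap size are $\Theta(1)$ operations.
Each time we heapify from the root of the tree it will take time $O(\lg k)$, where $k$ is the current heap size, so the running time of heapsort is
$$T(n) = O(n) + \sum_{k=2}^{n} [c + O(\lg k)] = O(n \lg n).$$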
[[Heapsort]]=
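public void heapSort(Record[] record) {
  int n = record.length - 1;        // the data lives in record[1..n]
  buildHeap(record);
  for (int k = n; k > 1; k--) {
    Record t = record[1]; record[1] = record[k]; record[k] = t;  // maximum goes to position k
    siftDown(record, 1, k - 1);     // restore the heap on record[1..k-1]
  }
}
// heapify() limited to record[1..size]; a Java array's length cannot be decremented
private void siftDown(Record[] record, int k, int size) {
  Record r = record[k];
  while (2*k <= size) {
    int j = 2*k;
    if (j < size && record[j].key < record[j+1].key) ++j;
    if (r.key >= record[j].key) break;
    record[k] = record[j];
    k = j;
  }
  record[k] = r;
}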
Lower bound on time complexity of comparison sorts
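A binary tree of height $h$ has no more than $2^h$ leaves, so we have
$$n! \le 2^h = 2^{C(n)}, \qquad \text{that is,} \qquad \lg n! \le h = C(n).$$
By Stirling's approximation,
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O\!\left(\frac{1}{n^4}\right)\right),$$
so
$$\lg n! = n\lg n - n/(\ln 2) + \tfrac{1}{2}\lg n + O(1) = \Omega(n \lg n).$$
Thus, the number of comparisons $C(n)$ in a sequential comparison sort is asymptotically bounded
below by $n \lg n$.
I believe no one yet knows the answers to these questions.
When operations other than comparisons can be used, we may be able to sort using fewer than $\Omega(n \lg n)$ operations.
Some sorting techniques use special properties of the input data to sort
faster than $\Theta(n \lg n)$.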
Sorting by Distribution
Counting sorts
Keys        503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
C (init.)     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
C, i = 15     0   0   0   0   1   0   1   0   0   0   0   0   0   0   1  12
C, i = 14     0   0   0   0   2   0   2   0   0   0   0   0   0   0  13  12
C, i = 13     0   0   0   0   3   0   3   0   0   0   0   0   0  11  13  12
C, i = 12     0   0   0   0   4   0   4   0   1   0   0   0   9  11  13  12
C, i = 11     0   0   1   0   5   0   5   0   2   0   0   7   9  11  13  12
C, i = 10     1   0   2   0   6   1   6   1   3   1   2   7   9  11  13  12
...
C, i = 2      5   1   8   0  15   3  14   4  10   5   2   7   9  11  13  12
C, i = 1      6   1   8   0  15   3  14   4  10   5   2   7   9  11  13  12
[[Comparison counting]]=
public int[] comparisonCount(Record[] record) {
  int[] count = new int[record.length];
  for (int i = 0; i < record.length; i++) count[i] = 0;
  for (int i = record.length - 1; i > 0; i--) {
    for (int j = i-1; j >= 0; j--) {
      if (record[i].key < record[j].key) {
        ++count[j];
      }
      else {
        ++count[i];
      }
    }
  }
  return count;   // count[i] keys precede record[i] in the sorted file
}
$$T(n) = \sum_{i=0}^{n-1} 1 + \sum_{i=1}^{n-1} \sum_{j=0}^{i-1} 1 = n + \sum_{i=1}^{n-1} i = n + n(n-1)/2 = \Theta(n^2).$$
The space complexity of the sort is $\Theta(n)$.
[[Distribution counting]]=
public Record[] countingSort(Record[] record, int m) {
  int[] count = new int[m];
  Record[] newRecord = new Record[record.length];
  for (int i = 0; i < m; i++) {   // clear the counts to zero
    count[i] = 0;
  }
  for (int j = 0; j < record.length; j++) {   // increment count of each key
    ++count[record[j].key];
  }
  // count[i] now holds the number of i's in the file
  for (int i = 1; i < m; i++) {
    count[i] += count[i-1];
  }
  // count[i] now holds the number of keys less than or equal to i
  for (int j = record.length-1; j >= 0; j--) {
    --count[record[j].key];                       // 0-based output position
    newRecord[count[record[j].key]] = record[j];
  }
  return newRecord;
}
Keys             503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
C (cleared)        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
C (occurrences)    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
C, j = 87          1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1
C, j = 154         1   2   1   1   1   1   1   1   1   1   3   1   1   1   1   1
C, j = 170         1   2   1   1   1   4   1   1   1   1   3   1   1   1   1   1
...
C, j = 908         7   2   9   1  16   4  15   5  11   6   3   8  10  12  14  13
And so on.
Keys             503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703

Output           703
C                  7   2   9   1  16   4  15   5  11   6   3   8  10  12  14  12

Output           703 765
C                  7   2   9   1  16   4  15   5  11   6   3   8  10  12  13  12

Output           677 703 765
C                  7   2   9   1  16   4  15   5  11   6   3   8  10  11  13  12

Output           612 677 703 765
C                  7   2   9   1  16   4  15   5  11   6   3   8   9  11  13  12

Output           509 612 677 703 765
C                  7   2   9   1  16   4  15   5  11   6   3   7   9  11  13  12
Distribution counting sort runs in $\Theta(n + m)$ time and uses $\Theta(n + m)$ space.
Radix Sort
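and on suits:
$$\clubsuit < \diamondsuit < \heartsuit < \spadesuit.$$
The cards are then ordered
$$(\clubsuit, A) < (\clubsuit, 2) < \cdots < (\clubsuit, K) < (\diamondsuit, A) < \cdots < (\spadesuit, Q) < (\spadesuit, K).$$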
Keys             503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703

Digit              0   1   2   3   4   5   6   7   8   9

Units count        1   1   2   3   1   2   1   3   1   1
Storage needed     1   2   4   7   8  10  11  14  15  16
Restored keys    170 061 512 612 503 653 703 154 275 765 426 087 897 677 908 509

Tens count         4   2   1   0   0   2   2   3   1   1
Storage needed     4   6   7   7   7   9  11  14  15  16
Restored keys    503 703 908 509 512 612 426 653 154 061 765 170 275 677 087 897

Hundreds count     2   2   1   0   1   3   3   2   1   1
Storage needed     2   4   5   5   6   9  12  14  15  16
Restored keys    061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908
[[Radix sort]]=
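public void radixSort(Record[] record, int d) {
  for (int i = 0; i < d; i++) {
    // use a stable sort (for example, distribution counting) to sort record
    // on digit i, where digit 0 is the least significant digit
  }
}
With distribution counting as the stable sort on each of the $d$ digits, the running time of radix sort is $\Theta(d(n + m))$.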
Bucket (Bin) Sort
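For our sample data, the 10 buckets correspond to the values of $\lfloor \mathrm{key}/100 \rfloor$.
[[Bucket sort]]=
public void bucketSort(Record[] record) {
  Record[] r = new Record[record.length];
  for (int i = 0; i < record.length; i++) {
    // place record[i] in the bucket to which its key belongs
  }
  // run insertionSort() on each bucket, then concatenate the buckets in order
}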
Analysis of bucket sort
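The expected value of a random variable $n_i$ fitting a binomial
distribution is given by
$$E[n_i] = \sum_{j=0}^{n} j \binom{n}{j} p^j (1 - p)^{n-j} = np = 1.$$
It can also be shown that the variance of a random variable fitting a binomial
distribution is
$$V[n_i] = np(1 - p) = 1 - \frac{1}{n}.$$
Summing over the $n$ buckets,
$$\sum_{i=0}^{n-1} O(E[n_i^2]) = O\Big(\sum_{i=0}^{n-1} E[n_i^2]\Big) = O\Big(\sum_{i=0}^{n-1} \big(V[n_i] + E^2[n_i]\big)\Big) = O\Big(\sum_{i=0}^{n-1} \big(np(1 - p) + (np)^2\big)\Big) = O\Big(\sum_{i=0}^{n-1} \big(1 - \tfrac{1}{n} + 1\big)\Big) = O(2n - 1).$$
Thus, the expected (average) time for bucket sort is linear, $O(n)$.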
Summing up sorting
                                          Running times
Sort method             Stable   Space    Average      Worst
Bubblesort              Yes      1        n^2          n^2
Bucket sort             Yes      n        n            n^2
Comparison counting     Yes      n        n^2          n^2
Distribution counting   Yes      n + m    n + m        n + m
Heapsort                No       1        n lg n       n lg n
Mergesort               Yes      n        n lg n       n lg n
Quicksort               No       lg n     n lg n       n^2
Shell's sort            No       1        n^1.25       n^1.5
Straight insertion      Yes      1        n^2          n^2
Straight selection      No       1        n^2          n^2
Radix sort              Yes      n + m    d (n + m)    d (n + m)
The range of objects in distribution counting and radix sort has m items;
there are d digits in the lexicographic order for radix sort.
$$\frac{T_B}{T_A} = 1 + \frac{m}{100},$$
where $T_A$ and $T_B$ are the running times of A and B.
Problems
Problem 1:
Consider the data set 2, 5, 7.
For all possible orders of this set determine the number of compares
straight insertion sort would make. Verify that the minimum number
of compares is $n - 2 = 2$, the maximum number of compares is
$n(n-1)/2 - 1 = 5$, and the average number of compares is
$n(n+1)/4 - 3/2 = 7/2$, where $n = 4$ counts the sentinel.
Problem 2:
Design an algorithm for binary insertion and analyze its complexity.
Problem 3:
For one or more diminishing sequences of increments,
test Shell's sort empirically and find curves that fit its running time well.
Problem 4:
An improved bubble sort keeps track of whether or not a swap is made
on each pass of the file. When no swaps are made the file is sorted
and the algorithm can be terminated.
Design an algorithm that implements this improvement.
What is the running time of this improved bubble sort algorithm?
Problem 5:
Consider the data set 2, 5, 7.
For all possible orders of this set determine the number of swaps
bubble sort would make. Verify that the minimum number
of swaps is 0, the maximum number of swaps is
$n(n-1)/2 = 3$, and the average number of swaps is
$n(n-1)/4 = 3/2$.
Problem 6:
Design an algorithm that implements the cocktail shaker idea.
Problem 7:
Solve the recurrence
T(n) = T(2n/3) + 1, T(1) = 1 exactly.
Problem 8:
Here's a problem from [1].
Professors Howard, Fine, and Howard have proposed the following ``elegant''
sorting algorithm:
StoogeSort(char[] A, int i, int j) {
  if (A[i] > A[j])
    swap (A[i], A[j]);
  if (i + 1 >= j)
    return;
  k = floor((j - i + 1)/3);
  StoogeSort(A, i, j - k); /* First two-thirds */
  StoogeSort(A, i + k, j); /* Last two-thirds */
  StoogeSort(A, i, j - k); /* First two-thirds again */
}
Give a recurrence for the worst-case running time of StoogeSort.
Give a tight asymptotic bound on the
worst-case running time.
Problem 9:
Illustrate the operation of distribution counting sort on the data
Problem 10:
What is the reason for decrementing the count in distribution counting sort
whenever a record is moved to output?
Problem 11:
Illustrate the operation of radix sort on the data
Problem 12:
Provide arguments that the claims about stability of the sorts mentioned in
the summary table are correct.
Problem 13:
Verify the ``percentage faster estimates'' given above by experiments.
I'd like to know how accurate they are.
Bibliography
William Shoaff
2000-10-16