Computerscience analysis of
sorting algorithms
show that, on average, no generic sorting algorithm can scale faster
than O(nlogn) (see the sidebar Orders of Magnitude). However, many applications
don't require a
"general" sort. You often have
additional information that can help you to improve the speed of a
particular sort.
When
discussing the time taken for particular algorithms to execute, it is
important to know not just how long the algorithm takes for a
particular dataset, but also how long it takes for differentsized
datasets, i.e., how it scales. For applications, the problems of
handling 10 objects and handling 10 million objects are often
completely different problems, not just differentsized versions of
the same problem.
One common way to indicate the behavior of algorithms across
different scales of datasets is to describe the
algorithm's scaling characteristics by the dominant
numerical function relating to the scaling behavior. The notation
used is
"O(function),"
where function is replaced by the dominant
numerical scaling function. It is common to use the letter
"n" to indicate the number of data
items being considered in the function. For example, O(n) indicates
that the algorithm under consideration increases in time linearly
with the size of the dataset. O(n^{2})
indicates that the time taken increases according to the square of
the size of the dataset.
These orders of magnitude do not indicate absolute times taken by the
algorithm. Instead, they indicate how much longer the algorithm takes
when the dataset increases in size. If an O(n) algorithm takes 200
seconds for n=10, it will take about 2000 seconds for n=100, i.e., a
tenfold increase in the dataset size implies a tenfold increase in
the amount of time taken by the algorithm. An
O(n^{2}) algorithm might take 5 seconds for
n=10, and about 500 seconds for n=100, i.e., a tenfold increase in
the dataset size implies a hundredfold increase in the time taken by
the algorithm. Note that the scaled times are approximate;
orderofmagnitude statistics include only the dominant scaling
function, and there may be other, smaller terms that adjust the
actual time taken.
The order of magnitude does not indicate the relative speeds of two
different algorithms for any specific dataset size. Instead,
orderofmagnitude statistics indicate how expensive one particular
algorithm may be as your dataset grows. In the examples of the last
paragraph, the time taken for the second
O(n^{2}) algorithm increases much faster than
the first O(n) algorithm, but the O(n^{2})
algorithm is still faster at n=100. However, by n=1000, it would be
the slower of the two algorithms (50,000 seconds, compared to 20,000
seconds for the O(n) algorithm).
To take a
concrete example, hash tables have an O(1) order of magnitude for
accessing elements. This means that the time taken to access elements
in a hash table is mostly independent of the size of the hash table.
Accessing elements in an
array by linearly searching through that array takes O(n). In
absolute times, it might be quicker to execute the linear array
search on a small array than to access from a hash table. But as the
number of elements becomes larger, at some point the hash table will
always become quicker.

For example, if you have 1000 items to sort and each item can be
given a unique ordering value that corresponds to a unique integer
between 1 and 1000, the best sort is simply to slot the items
directly into their correct locations in an array. No comparisons
between the items are necessary.
Of course, typically you can't map your elements so
neatly. But if you can map items to integer keys that are more or
less evenly distributed, you can still take advantage of improved
sorting characteristics. Bear in mind that an array of partially
sorted items can be sorted faster than a typical unsorted array.
When you can guess the approximate final position of the items in the
collection to be sorted, you can use this knowledge to improve
sorting speed. You should specifically look out for sorts where:
Items can be given an ordering value that can be mapped to integer
keys.
The distribution of the keys is regular, or any one of the following
is true:
The distribution of the keys is fairly even, so that when mapped into
array indexes, ordering is approximately kept.
The keys have evenly distributed clusters.
The distribution of the keys has a mapping into one of these other
distributions.
The distribution of the keys is fairly critical. A regular
distribution allows them to be mapped straightforwardly into array
indexes. An uneven distribution is difficult to map. But if you have
an uneven distribution and can specify a mapping that allows you to
flatten out the keys in some way, it may still be possible to apply
this methodology. For example, if you know that your keys will have a
normal (bellcurve) distribution, you can apply an inverse bellcurve
function to the keys to flatten them out to array indexes.
For this technique to work, the mapped keys do not need to be unique.
Several keys or groups of keys can map to the same value or values.
Indeed, it is quite difficult to make the index mapping unique. You
need to be aware of this and handle the resulting collisions.
Normally, you can map clusters of keys into subsections of the sorted
array. These subsections are probably not internally sorted, but they
may be correctly sorted against each other (i.e., all elements in
subsection 1 are ordered below all elements in subsection 2, all
elements in subsection 2 are ordered below all elements in subsection
3, etc.). This way, the problem has been modified to sort multiple
smaller subsections (which is faster than sorting the whole array),
and hence the array is sorted more quickly.
Note that
Object.hashCode( )
provides a mechanism for generating an integer for any object.
However, the resulting hash code is not guaranteed to be evenly
distributed or even unique, nor is it at all guaranteed to be
consistent across different VMs or even over multiple runs of one VM.
Consequently, the hash code is of little use for any kind of mapping.
KarlDietrich Neubert gives a
detailed implementation of this approach, where the algorithm
provides O(n) sorting behavior and also minimizes the extra memory
needed to manage the sort.

I also implemented Neubert's sort for a plain
int array rather than for an array of objects. The
results were the same as for the object array when the
JIT was turned off. But
with any type of JIT turned on, the two simpler referencesort
algorithms were optimized much better by the native code compiler and
were faster for all sizes of arrays I tested (up to several million
elements). Their absolute sort times were sufficiently fast that
their bad scaling behavior didn't matter. This
curious difference in relative speeds applied only to sorting
int[ ] arrays, not arrays of objects. For arrays
of objects, Neubert's sort seems to be faster both
with and without a JIT.


I include here a Java implementation of Neubert's
algorithm with comments in the code. I have applied the
implementation to an array of objects that have an integer field to
specify the sort order, but of course the algorithm can be easily
generalized to other cases where object ordering can be mapped to
numbers. For a more detailed discussion of the algorithm, refer to
Neubert's article. The implementation given here
sorts significantly faster than either the sort in
java.util.Arrays (in Java 2) or a handcrafted
quicksort (the most optimized final version with no casts from the
first section of this chapter); see Table 93. Note also that this
sort is O(n) and thus increases linearly in time, whereas the other
sorts are O(nlogn) and so have a superlinear speedup. Notice how the
interpreted mode Flashsort time approaches the timings of the
compiled JVMs and is actually faster than the pure JIT of 1.2.2
running Arrays.sort( ). That's a
significant speedup.
Table 93. Timings of the various sorting tests normalized to the Flashsort JDK 1.2 test
Neubert's Flashsort

100%

94%

155%

74%

79%

294%

Handcrafted quicksort

250%

178%

130%

149%

132%

921%

Arrays.sort( )

476%

181%

183%

174%

164%

906%

Note that the sort at the end of the Neubert algorithm is an
insertion
sort running over the entire array. Insertion sorts provide better
performance than quicksorts for partially ordered arrays. This final
insertion sort ensures that keys incorrectly classified by the group
distribution end up in the right location:
public interface FlashSortable{
public int sortOrder( );
}
public static void flashsort(FlashSortable[ ] arr)
{
//Number of groups into which the elements are classified
//Neubert suggests 0.2 to 0.5 times the number of elements in the array.
int num_groups = (int) (0.4 * arr.length);
//Count the number of elements in each group
int[ ] groups = new int[num_groups];
flashsort(arr, num_groups, groups);
}
public static void flashsort(FlashSortable[ ] arr, int num_groups, int[ ] groups)
{
//First get the minimum and maximum values
int min = arr[0].sortOrder( );
int max_idx = 0;
int i;
for (i = arr.length1; i > 0; i)
{
if (arr[i].sortOrder( ) < min)
min = arr[i].sortOrder( );
if (arr[i].sortOrder( ) > arr[max_idx].sortOrder( ))
max_idx = i;
}
//If they are the same, all elements are identical
//so the array is already sorted.
if (min = = arr[max_idx].sortOrder( ))
return;
//Count the number of elements in each group.
//Take care to handle possible integer overflow by
//casting to larger datatypes where this might occur.
double scaling_constant = (num_groups  1) /
( ((double) arr[max_idx].sortOrder( ))  min);
int group;
for (i = arr.length1; i >= 0; i)
{
group = (int) (scaling_constant * (((long) arr[i].sortOrder( ))  min));
groups[group]++;
}
//Set the groups to point to the indexes in the array
//that are the last index for each group.
groups[0];
for (i = 1; i < groups.length; i++)
{
groups[i] += groups[i1];
}
//Put the biggest element at index 0 so that the swapping
//algorithm below starts on the largest element & max group.
FlashSortable old_value = arr[max_idx];
arr[max_idx] = arr[0];
arr[0] = old_value;
//start with element at index 0
int idx = 0;
//and the maximum group
group = num_groups  1;
//Start moving elements into their groups.
//We need to make 'arr.length' moves at most,
//but if we have one move left in the outer loop
//then the remaining element is already in the right place,
//so we need test for only 'arr.length1' moves.
int number_of_moves_left = arr.length  1;
FlashSortable new_value;
while(number_of_moves_left > 0)
{
//When the first group fills up, we start scanning
//for elements left in the wrong groups, and move them.
//Note that we scan through the whole object array only once.
while(idx > groups[group])
{
idx++;
group = (int) (scaling_constant * (((long) arr[idx].sortOrder( ))  min));
}
new_value = arr[idx];
//We run this loop until the first group fills up.
//Then we run the previous scan loop to get back into this loop.
while( idx != (groups[group]+1) )
{
group = (int) (scaling_constant * (((long) new_value.sortOrder( ))  min));
old_value = arr[groups[group]];
arr[groups[group]] = new_value;
new_value = old_value;
groups[group]; //decrement the pointer to the next index
number_of_moves_left;
}
}
//Now we have our partially ordered array,
//we do an insertion sort to order the remainder.
for (i = arr.length  3; i >= 0; i)
{
if (arr[i+1].sortOrder( ) < arr[i].sortOrder( ))
{
old_value = arr[i];
idx = i;
while(arr[idx+1].sortOrder( ) < old_value.sortOrder( ))
{
arr[idx] = arr[idx+1];
idx++;
}
arr[idx] = old_value;
}
}
}