A Tale of Two Sorts
This issort of a long question... (I kill me!)
Consider the following, where I have written a small program that generates a list of random numbers a certain number of times, and compares the average run time of two sorts, the quick sort, which has a theoretical average time of O(n*log(n)) and the Radix sort, which has a theoretical average time of O(n*max), max being the number of digits in the highest number in the list.
publicclass Sorts{
staticfinalint NUMBERS = 1000;
staticfinalint MAXIMUM = 9;
staticfinalint MINIMUM = 0;
staticfinalboolean RUN_RADIX_SORT =true;
staticfinalboolean RUN_QUICK_SORT =true;
staticfinalint ITERATIONS = 20;
publicstaticlong start;
publicstaticlong stop;
private LinkedList[] buckets ={
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>(),
new LinkedList<Integer>()
};
public ArrayList<Integer> radixSort(ArrayList<Integer> c){
int max = 0;
ArrayList<Integer> Z =new ArrayList<Integer>();
max = (int)Math.log10(MAXIMUM) + 1;
for (int i = 1; i <= max; i++){
for(int num : c){
int digit = (int)(num / Math.pow(10,i-1)) % 10;
buckets[digit].push(num);
}
for (int j = 0; j < 10; j++){
while (!buckets[j].isEmpty()){
Z.add((Integer)buckets[j].pop());
}
}
//Total: O(2*n) = O(n)
}
return Z;
}
publicvoid quickSort(List<Integer> c){
Collections.sort(c);
}
publicstaticvoid main(String[] args){
RadixSort rs =new RadixSort();
ArrayList<Integer> al =new ArrayList<Integer>();
Random r =new Random();
long radixTime = 0;
String s;
long quickTime = 0;
for(int z = 0; z < ITERATIONS; z++){
for (int i = 0; i < NUMBERS; i++){
float f = r.nextFloat();
int k = (int) Math.abs(Math.ceil((f * (MAXIMUM - 1)) + MINIMUM));
al.add(k);
}
start = System.nanoTime();
rs.radixSort(al);
stop = System.nanoTime();
radixTime += (stop - start);
ArrayList<Integer> quick =new ArrayList(al);
start = System.nanoTime();
rs.quickSort(quick);
stop = System.nanoTime();
quickTime += (stop - start);
}
System.out.println("Average time (Radix Sort): " + radixTime / ITERATIONS +" ns");
System.out.println("Average time (Quick Sort): " + quickTime / ITERATIONS +" ns");
}
}
The constants above will produce predicted behavio(u)r, However...
Consider a case where 100000 numbers are generated with a maximum of 900, a minimum of 0, and 2 iterations. In this case, log(n) = log(100000) = 5, so the quick sort has a predicted average time of O(n*5), whilst the radix sort has a predicted average of O(n*3) (max = 3 from 900).
The actual result, produced consistently is that the radix sort ran with horrible performance; about 5 times as long as the quick sort.
My question, for such a long-winded preface, is: why? What in my version of the radix sort is dominating that much time? I could even understand the radix sort taking the same amount of time or SLIGHTLY longer than the quick sort, but I found that it taking 5 times as long a bit strange.
[6654 byte] By [
jGardnera] at [2007-11-27 10:52:09]

> This is sort of a long question...
; )
Try using ArrayList's instead of those LinkedList's. I'm positive there will be a considerable gain in speed.
> > This is sort of a long question...
>
> ; )
>
> Try using ArrayList's instead of those LinkedList's.
> I'm positive there will be a considerable gain in
> speed.
Well my thinking behind the LinkedList was to use them as a queue; I never actually iterate all the way through the linked list, just look at the first one and pop it off; I had thought that, because you don't have to shift a LinkedList's item's keys all the way down the list when you add something at the front of the list, that it would be more efficient, but I will try!
If I change it to use an ArrayList and then remove from the front of the ArrayList, the performance becomes significantly worse (due to the key shifting, I think). Approximately 31 times as long as the quick sort. The order of the original list has to be maintained within the buckets.
Just for the heck of it, I had it remove from the end of the ArrayList (will not be sorted in the end, but I was curious.) Even in this instance, the radix sort took twice as long as the quick sort; still not quite satisfactory, given that on average, it should run in relatively significantly less time than the quick sort.
As another note, I am not sure how the collection used will affect it from the LinkedList; I thought that Queues had O(1) for push, peek, and poll?
Message was edited by:
jGardner
BTW, Collections.sort isn't a Quick Sort:
<quote>
The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n log(n) performance. This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.
</quote>
It was the best of times, it was the worst of times...
tsitha at 2007-7-29 11:35:49 >

> If I change it to use an ArrayList and then remove
> from the front of the ArrayList, the performance
> becomes significantly worse (due to the key shifting,
> I think). Approximately 31 times as long as the quick
> sort. The order of the original list has to be
> maintained within the buckets.
This is what I generally got when changing from (all!) LinkedList to ArrayList with your code:
Average time (Radix Sort): 3027773 ns
Average time (Quick Sort): 4967810 ns
Note that I did not look at your radix-sort implementation and your benchmark tests: especially the latter I find a bit confusing (could be just me!).
> > If I change it to use an ArrayList and then remove
> > from the front of the ArrayList, the performance
> > becomes significantly worse (due to the key
> shifting,
> > I think). Approximately 31 times as long as the
> quick
> > sort. The order of the original list has to be
> > maintained within the buckets.
>
> This is what I generally got when changing from
> (all!) LinkedList to ArrayList with your code:
> > Average time (Radix Sort): 3027773 ns
> Average time (Quick Sort): 4967810 ns
>
> Note that I did not look at your radix-sort
> implementation and your benchmark tests: especially
> the latter I find a bit confusing (could be just me!).
This is using the add method to add them and then remove(0) to remove from the front?
> ...
> This is using the add method to add them and then
> remove(0) to remove from the front?
No, since you did a pop() on the LinkedLists (removing the last added element), I also removed the last added element from the ArrayList. Removing the element at index 0 will cause the rest of the elements to shift (which is a costly operation).
Note that you are using the LinkedList's as stack, not as a queue.
My apologies; now I feel like an idiot for having wasted some of your time. Turns out I posted the wrong version of the radix sort (I had indeed tried using it as a stack when I realized that wouldn't work and changed it to a queue.) It DOES need to be a queue for it to work properly. Here is the updated radixSort method:
public ArrayList<Integer> radixSort(ArrayList<Integer> c) {
int max = 0;
ArrayList<Integer> Z = new ArrayList<Integer>(c);
max = (int)Math.log10(MAXIMUM) + 1;
for (int i = 1; i <= max; i++) {
Iterator<Integer> itr = Z.iterator();
while(itr.hasNext()) {
int num = itr.next();
int digit = (int)(num / Math.pow(10,i-1)) % 10;
buckets[digit].offerLast(num);
itr.remove();
}
for (int j = 0; j < 10; j++) {
while (!buckets[j].isEmpty()) {
Z.add((Integer)buckets[j].pop());
}
}
//Total: O(2*n) = O(n)
}
return Z;
}
What I said with the queue was correct; the code I originally posted was not. As I said (and you said, as wel), using an ArrayList as a queue would cause elements to shift, whether when you are adding at the beginning and removing from the end or adding at the end and removing from the beginning.
Also: I am using JDK6.0, in which LinkedList implements Deque; it does not have to traverse the entire list to add on to the end of the LinkedList so it is even stranger: both the offer AND the pop (should be poll technically for correct terminology) methods have O(1) time.
Message was edited by:
jGardner
Maybe it is just my computer; I can't for the life of me figure out why my implementation of a radix sort runs with such horrible performance compared to (I guess it is the merge sort) when it is EXPECTED to be FASTER.
> Maybe it is just my computer; I can't for the life of
> me figure out why my implementation of a radix sort
> runs with such horrible performance compared to (I
> guess it is the merge sort) when it is EXPECTED to be
> FASTER.
It is faster (if you use a small range MIN-MAX, which you did). Your implementation just isn't faster. This is mainly because you're using all sorts of operations on collections which take some time to complete. I bet that if you'd write it using only int's (and an int[][] as buckets) it'll gain quite some speed.
Anyway, here another implementation which is faster than Collections' sort(...) method (of course again only if the range of MIN-MAX is small enough!).
import java.util.*;
public class RadixSortDemo {
private List[] buckets;
private final int RUNS;
private final int RADIX;
public RadixSortDemo(int radix, int maxNum) {
RUNS = (int)Math.ceil(Math.log10(maxNum-1));
RADIX = radix;
buckets = new ArrayList[radix];
for(int i = 0; i < buckets.length; i++) {
buckets[i] = new ArrayList();
}
}
private List<Integer> generateNumbers(int n) {
int max = (int)Math.pow(RADIX, RUNS)-1;
List<Integer> numbers = new ArrayList<Integer>(n);
Random generator = new Random();
for(int i = 0; i < n; i++) {
numbers.add(generator.nextInt(max));
}
return numbers;
}
public List<Integer> radixSort(List<Integer> numbers) {
for(int r = 1; r <= RUNS; r++) {
int mod = (int)Math.pow(RADIX, r);
int div = (int)Math.pow(RADIX, r-1);
for(Integer num : numbers) {
int bucketNum = (num.intValue()%mod)/div;
buckets[bucketNum].add(num);
}
numbers = new ArrayList<Integer>();
for(int i = 0; i < buckets.length; i++) {
numbers.addAll(buckets[i]);
buckets[i] = new ArrayList<Integer>();
}
}
return numbers;
}
public static void main(String[] args) {
final int RADIX = 10, TESTS = 100, LIST_SIZE = 100000, MAX_NUM = 10;
long totalMerge = 0L, totalRadix = 0L, start, elapsed;
RadixSortDemo radixDemo = new RadixSortDemo(RADIX, MAX_NUM);
for(int i = 0; i < TESTS; i++) {
System.out.println("Test "+(i+1)+"/"+TESTS+"...");
List<Integer> numbersRadix = radixDemo.generateNumbers(LIST_SIZE);
List<Integer> numbersMerge = new ArrayList<Integer>(numbersRadix);
// Merge test
start = System.currentTimeMillis();
Collections.sort(numbersMerge);
elapsed = System.currentTimeMillis()-start;
totalMerge += elapsed;
// Radix test
start = System.currentTimeMillis();
numbersRadix = radixDemo.radixSort(numbersRadix);
elapsed = System.currentTimeMillis()-start;
totalRadix += elapsed;
}
System.out.println("Average Radix-sort: "+(totalRadix/TESTS)+" ms.");
System.out.println("Average Merge-sort: "+(totalMerge/TESTS)+" ms.");
}
}
