Strange Bublesort implementation flaw on wikisource

IMPORTANT! Please read it carefully in order to understand. Thanks for reading.

Hi all,

Today we explored the Bubble sort algorithm at school(Sixth Form in the UK, Y13) when we doing some Discrete Maths. So I thought I would go and implement it in Java(i've been doing Java for 4 days now). I firstly took a look at wikisource to see at the implementation and couldn't understand theirs so I went on and with the definition on a piece of paper I wrote mine. After some examination and tests I can conclude that my algorithm implementation is faster because I do the sorting with fewer passes - their implementation didn't take account whether any swaps occured in a pass so their number of passes was constant with respect to number of elements. So, here's the definition of Bubble sort according to our textbook:

1. If there's only one or less numbers in the list, then stop.

2. Make one pass down the list, comparing and swapping

3.If no swaps have occured then stop. Otherwise, ignore the last element

4. Repeat...

Now let's do the analysis.

I'll take the numbers 2, 1, 3, 5, 4. They only require one pass in order to be sorted but the disadvantage of Bubble sort as we all know is that we have to complete another pass through the data to ensure sorting is finished, so total number of passes is 2. Now, here's their implementation:

publicstaticvoid bSort(int data[],int n)// wikisource implementation

// pre: 0<=n <= data.length

// post: values in data[0..n-1] in ascending order

{

int numSorted=0;// number of values in order

int index;// general index

int temp;

while (numSorted < n)

{

// bubble a large element to higer array index

for (index=1; index < n-numSorted; index++)

{

if (data[index]<data[index-1])

{

// swap data[index] and data[index-1]

temp = data[index];

data[index]= data[index-1];

data[index-1] = temp;

}

}

// at least one more value in place

numSorted++;

System.out.println("pass done! their algorithm!");

}

}

You can see that their never check if any swaps have been done or not. Their algorithm does 5 passes in total. Here's my implementation:

publicstaticvoid BubbleSort(int[] data)

{

boolean Swaped =true;// set to true so while is executed once, too lazy to use do {} while()

if(data.length == 0 || data.length == 1)

{

System.out.println("List empty or no data to sort.");

}

else

{

int n = data.length;// n is the number of numbers to sort, so I have to do n - 1 comparisons

int temp;// temp var so I can swap vars

while(Swaped ==true)

{

Swaped =false;// could remove this if use do {} while()

for(int z = 0; z >< (n - 1); z++)

{

if(data[z+1] < data[z])

{

temp = data[z+1];

data[z+1] = data[z];

data[z] = temp;

Swaped =true;

}

}

n--;// decrement number of comparisons to do

System.out.println("pass done! my algorithm!");

}

}

My algorithm does only 2 passes(as expected). If you want to test yourself, here's the full source to play with:

/*

* Copyright (c) 2005 Milen Dzhumerov

* All rights reserved.

*/

/**

*

*/

package bubble;

/**

* @author gamehack

*

*/

publicclass Main{

/**

* @param args

*/

publicstaticvoid main(String[] args){

int[] test =newint[]{2, 1, 3, 5, 4};

int[] test1 =newint[]{2, 1, 3, 5, 4};

BubbleSort(test);

System.out.println();

bSort(test1, 5);

System.out.println();

int t = 0;

while(t < test.length)

{

System.out.println(test[t]);

t++;

}

System.out.println();

t = 0;

while(t < test1.length)

{

System.out.println(test1[t]);

t++;

}

System.exit(0);

}

publicstaticvoid BubbleSort(int[] data)

{

boolean Swaped =true;// set to true so while is executed once, too lazy to use do {} while()

if(data.length == 0 || data.length == 1)

{

System.out.println("List empty or no data to sort.");

}

else

{

int n = data.length;// n is the number of numbers to sort, so I have to do n - 1 comparisons

int temp;// temp var so I can swap vars

while(Swaped ==true)

{

Swaped =false;// could remove this if use do {} while()

for(int z = 0; z < (n - 1); z++)

{

if(data[z+1] < data[z])

{

temp = data[z+1];

data[z+1] = data[z];

data[z] = temp;

Swaped =true;

}

}

n--;// decrement number of comparisons to do

System.out.println("pass done! my algorithm!");

}

}

}

publicstaticvoid bSort(int data[],int n)// wikisource implementation

// pre: 0<=n <= data.length

// post: values in data[0..n-1] in ascending order

{

int numSorted=0;// number of values in order

int index;// general index

int temp;

while (numSorted < n)

{

// bubble a large element to higer array index

for (index=1; index < n-numSorted; index++)

{

if (data[index]<data[index-1])

{

// swap data[index] and data[index-1]

temp = data[index];

data[index]= data[index-1];

data[index-1] = temp;

}

}

// at least one more value in place

numSorted++;

System.out.println("pass done! their algorithm!");

}

}

}

I'm probably wrong anyway but I just wanted to share this with you and see what you will say about it.

Regards,

gamehack>

[10662 byte] By [gamehacka] at [2007-10-2 0:21:35]
# 1

Here's the sample output from the program:

pass done! my algorithm!

pass done! my algorithm!

pass done! their algorithm!

pass done! their algorithm!

pass done! their algorithm!

pass done! their algorithm!

pass done! their algorithm!

1

2

3

4

5

1

2

3

4

5

gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 2
The two implementations are doing two different things. You are sorting the entire list. They are only sorting the first n elements. Theirs will probably be faster if you only need say the first 20 items in sorted order out of several hundred items. (Since you will sort the entire set.)
wdimaca at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 3

But then they will be sorting different amount of numbers which is not fair, is it? I'm talking about the same amount of numbers(a random list of numbers). I'm not saying that mine is better, I'm just saying that my implementation is the right one according to the spec, while their implementation doesn't take any account of the number of swaps per pass(what's the point in doing another pass if all the numbers are already sorted?). So I think that if you gave to random lists to be sorted, mine will be a bit faster cause I won't need to do unnecessary passes. I'm not sure that you did get my point.

Regards,

gamehack

gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 4

You are correct that there is no need to continue sorting if you made a pass through the list where you swapped nothing.

Any decent bubble sort should include this optimization, therefore the one that you are quoting as a source is not decent.

That was a very good observation on your part if you have indeed spent only 4 days programming so far.

However, lest you think that by means of this discovery you can add you name to the annals of computing history, let me assure you that bubble sort is never actually used for sorting. It is an academic sorting algorthm has but one single feature to recommend it. That feature is that it is sufficiently easy to explain that even the poorest students of programming, can grasp the concept and can often even succeed in writing the code.

Bubble sort is not actually used for sorting, it is used for teaching.

Furthermore you can leave out several of the optimizations and still get it right. For example, your books description of bubble sort said nothing about reducing the length of the list that you sort by one on each successive pass through the list. This is possible becasue each pass does for sure get the last element into the right place. But the lazy bubble sort coder, just pounds through the whole list every time looking for that glorious moment when there were no swaps.

So you have one version of bubble sort that reduces the list size on each pass and uses that to determine the time to stop, that's what the wikisource code did, you have another version that watches for the "no swap" condition and uses that to determine time to stop, that's what your book version suggested, and then you have your version with does both.

Your method is better. Congratulations.

Unfortunately, your discovery that there is bogus source code out on the web is hardly earth shaking news. But keep hacking away at your studies, the more you know, the more bogosity you will see every day. It just keeps getting better and better.

Enjoy!

marlin314a at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 5

:) My purpose of writing here wasn't to go in the book of annals of computing history, I just wanted to share with you and see that I'm doing the right thing. About the 4 hours of programming, it's 4 hours of programming in Java but I really start to like it and will keep experimenting. And I'm perfectly aware that the bubble sort algorithm is not actually used anywhere in practice. Thank you for understanding my point.

Regards,

gamehack

gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 6

If you want to measure excution time:

long t = System.currentTimeMillis();

// ... do sort

System.out.println(System.currentTimeMillis() - t);

The number of loops and the length of the code are not neccessarily proportional to the speed of the code. To determine the performance of one algorithm over the other, timed tests using a variety of input data must be perperformed.

rkippena at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 7

Sorting wasn't in D1/D2 when I did A-levels, so I don't know whether you also have big-O analysis in the syllabus. If you do, you'll discover that the extremely naive bubble sort at wikisource and your optimised version have the same asymptotic cost. That's all most academics care about. (Knuth is a glaring exception, but he takes concern for exact costs way too far).

YAT_Archivista at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 8
Yes, we will be having big-O analysis and we also saw that bubble sort is O(n^2).Regards,gamehack
gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 9

Using this sequence of numbers

int[] test = new int[] { 2, 1, 4, 7, 2, 1, 6, 3, 5, 9 };

int[] test1 = new int[] { 2, 1, 4, 7, 2, 1, 6, 3, 5, 9 };

my implementation takes 0 miliseconds and 5 passes, while their 10 miliseconds and 10 passes.

Regards,

gamehack

gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 10

> Your method is better. Congratulations.

For initial datasets that are nearly in order.

> Unfortunately, your discovery that there is bogus source code out on the web is hardly earth shaking news.

Define 'bogus' - for random data, the wiki code runs faster and produces the same result.

In a test which sorts a list of 5000 elements, times in ms, averaged over 20 runs:

inputgamehackwikituned

ascending order 01310

descending order336258237

random order284243224

The tuned version of gamehack's optimisation replaces the variable with a nested loop, and hits the array a little less:

public static void tunedSort(int[] data) {

n_loop:

for (int n = data.length; n > 0; n--) {

for (int z = 1, datum = data[0]; z < n; z++) {

int next = data[z];

if (next < datum) {

data[z - 1] = next;

data[z] = datum;

z++;

for (; z < n; z++) {

next = data[z];

if (next < datum) {

data[z - 1] = next;

data[z] = datum;

} else {

datum = next;

}

}

continue n_loop;

} else {

datum = next;

}

}

break;

}

}

But you're better off going to a lower-O algorithm than doing that sort of thing.

Don't assume that fewer iterations of a more complex algorithm are better. It's OK to 'waste' operations if checking whether it's wasteful costs more. Always do before and after profiling of potential optimisations with representative datasets as inputs.

Pete

pm_kirkhama at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 11

Yup,

My algorithm implementation is actually 3-4 times slower on random data and much faster on already ordered data. Here's an optimised version of my original one:

public static void BubbleSort2(int[] data)

{

boolean Swaped;

int n = data.length; // n is the number of numbers to sort, so I have to do n - 1 comparisons

do

{

Swaped = false; // could remove this if use do {} while()

for(int z = 0; z < (n - 1); z++)

{

if(data[z+1] < data[z])

{

//temp = data[z+1];

data[z+1] ^= data[z];

data[z] ^= data[z+1];

data[z+1] ^= data[z];

Swaped = true;

}

}

n--; // decrement number of comparisons to do

} while(Swaped == true);

}

But still it behaves like my first one. Is someone able to explain why I fail to achieve better speeds on random sorting?

Regards,

gamehack

gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 12

On a random sort there are about n(n-1)/4 swaps and up to n(n-1)/2 comparisons, most of which are required.

In this example, it is array accesses and variable writes that are the cost you want to reduce - you won't get rid of most of the comparisons in a random sort.

Your swap flag prevents you from iterating if there are no swaps, so it removes the lower cost operations, but in doing so it adds an extra operation to every swap.

So the cost of testing whether you need to do something that doesn't cost much overwhelms the benifit of not doing it if you don't need to - you save doing a few tests at the cost of doing n(n-1)/4 assigments to your swap flag (which is why the tuned verision uses control statements rather than a flag to signal that a swap occured).

Using the XOR hack will probably cost more too in this case, as you're doing 6 array element accesses rather than the 3 that you need if you store the previous data value in a variable or the 4 that you need if you use a local temp variable. The XOR hack only offers performance benefits on swapping register values on machines with few registers and no swap instructions, but it's again probably not the right way here (though I haven't measured it in this case and quite how HotSpot will deal with these things can be unpredictable) .

Pete

pm_kirkhama at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 13
Sorry for asking, but what from what I understand my slowness comes from the assignment to true of the Swaped variable, doesn't it?Regards,gamehack
gamehacka at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...
# 14
Very probably.Pete
pm_kirkhama at 2007-7-15 16:27:44 > top of Java-index,Other Topics,Algorithms...