Best algorithm to sort semi-sorted arrays?

I am trying my hand at the hutter prize (http://prize.hutter1.net/)

My current compression algorithm sorts an array of Comparable Objects every main cycle.

-This array of objects can potentically increase by one object per cycle.

-Each object's comparision value can change per cycle, however its order is mostly kept.

This means that the over all order of the objects are similar between cycles.

Is there an sorting algorithm which can take advantage of the semi-sorted state of the array?

I was using the default Arrays.sort() however it does not seem to make much use of the semi-sorted input array and is too slow for my use.

In order to speed up the sorting i have relaxed the need to be in strict order. I.e there can be some objects out of order.

To this end I have developed a BucketSort which i thought should speed up the sorting however it seems to have the opposite effect!

Is there any reason why this implementation is quite slow? (three times as slow as Array.sort() )

privatestaticint MAX_BUCKET=65535;

privatestatic Word[] buckets=new Word[MAX_BUCKET];

publicstaticvoid bucketSortReverse (Word[] a,int n)

{

int i;

int count=1;

int index;

Word curr;

// fill up the buckets with the input words

for (i=0;i<n;i++)

{

curr=a[i];

index=(int) (curr.probablility*MAX_BUCKET);

curr.next=buckets[index];

buckets[index]=curr;

}

// iterate over the buckets and construct the sorted words

for (i=0;i<MAX_BUCKET;i++)

{

curr=buckets[i];

if (curr!=null)

{

while (curr!=null)

{

a[n-count++]=curr;

curr=curr.next;

}

buckets[i]=null;

}

}

}

*NOTE* Currently my input Word array contains a maximum of 6000 words.

Lowering the number of bins reduces the accuraccy of the sort and does not provide much increase in speed :(

I have found some Open Source code which by default performs very similarly to Arrays.sort in terms of speed and have modified it to be non-strict in its ordering.

This modified sorting algothim is approximately 15% faster but is still really not fast enough.

By reducing the strict ordering of the sorting algorithm i have also reduced the level of compression. Ultimately i would like to have a fast sorting algothrim which takes into acount semi-sorted objects with out reducing strictness of the sort.

Any ideas?>

[3551 byte] By [klana001a] at [2007-11-27 2:55:37]
# 1

> compression algorithm

?

Wouldn't you want something along the lines of insertion sort? Quick sort (e.g. Arrays.sort()) should also be pretty good assuming it picks a good pivot point.

Overall, it sounds like you should be using a tree structure. When an element changes, remove it from the tree, change the value, then re-add it to the tree.

rkippena at 2007-7-12 3:32:32 > top of Java-index,Other Topics,Algorithms...
# 2
> *NOTE* Currently my input Word array contains a maximum of 6000 words.You might also want to look at using a Trie structure if you're dealing with words that share common prefixes.
rkippena at 2007-7-12 3:32:32 > top of Java-index,Other Topics,Algorithms...
# 3
Thanks for the suggestion.would a tree sort be efficient as all the elements will change however the order of these elements will differ only by a small amount.perhaps some kind of test of the parent may need to be performed to determine whether a change is really needed...
klana001a at 2007-7-12 3:32:32 > top of Java-index,Other Topics,Algorithms...
# 4

Tree removal takes O(logn) and insertion takes O(logn). Compare this to an array where potentially a large number of elements need to be shifted O(n).

Usually it takes hundreds of thousands of elements to notice a difference. So if your data is relatively small, then it may not be worth trying to optimize it.

> test of the parent may need to be performed to determine

The balanced binary tree (TreeSet) that comes with the JDK doesn't allow access to the internal structure of the tree, so testing the parent isn't available. Even if it was, it would still take an O(logn) lookup to find the node and get its parent, so it wouldn't be a significant improvement.

rkippena at 2007-7-12 3:32:32 > top of Java-index,Other Topics,Algorithms...
# 5
hmm...thanks for the opinon... I think i might have to make my own tree to make a speed comparision. Or derive from the java's TreeMap.
klana001a at 2007-7-12 3:32:32 > top of Java-index,Other Topics,Algorithms...
# 6

[quote]

Even if it was, it would still take an O(logn) lookup to find the node and get its parent, so it wouldn't be a significant improvement.

[/quote]

From looking at the decompiled TreeMap it looks like the parent for each node in the tree is recorded. so theoretically i should be able to record the parent "element" for each element to be sorted.

I would be able to then compare the current element to the parent's element with out having to search the tree.

klana001a at 2007-7-12 3:32:32 > top of Java-index,Other Topics,Algorithms...