sorting with TreeMap vs. quicksort
I've been sorting lists by throwing them into a TreeMap then retrieving them in order (especially for fairly evenly distributed lists of strings), thereby leveraging its binary tree characteristics.
My belief is that sorting this way takes N log N time, just as quicksort (as in Arrays.sort) does -- because adding items to a TreeMap takes log N time -- without any danger of stack overflows from recursing on a huge list as with quicksort, and fewer method calls which can get expensive.
I'm figuring that even if this way of sorting uses more memory than an in-place sort as in Arrays.sort, at least it won't overflow the stack, and will mostly be eligible for garbage collection anyway.
Could someone correct or confirm my conclusions, reasoning or assumptions? And how well do they apply to Collections.sort's merge sort?
[854 byte] By [
ahaweba] at [2007-9-29 7:49:50]

Collections.sort is a merge sort and will run in O(n log n) guarenteed.
This is an upper bound on its worst performance. It does not perform
an in-place sort so its memeory requirements are larger then (for example)
a quicksort.
Using a tree guarentees worst case O(n log n) for n inserts. The
iteration to retrive the sorted elements is O(n). The memory requirements
are not that different then copying the list (as mergesort does). As
for stack problems these are unlikely to occur until logn is at least
a few thousand (think about stupidly large values of n).
Another option is to use one of the Arrays.sort methods. These perform
an in place (modified) quicksort. This means no extra memory is required.
Its performance is O(nlogn) for most datasets but its gotcha is that
performance can degrade well below this approaching O(n^2) (might be
higher. I cant remember).
Theres your options. Make your choice.
matfud
While using a binary tree to sort does take O(n log n) time, the constant multiplier involved is larger than that of a built-in sort method. Java's sort methods have been optimized by Sun, or perhaps even written partly in native code.
Mateia at 2007-7-14 21:39:06 >

> Using a tree guarentees worst case O(n log n) for n
> inserts.
Amazing, my untrained intuition was correct.
> As for stack problems these are unlikely to occur until
> logn is at least a few thousand (think about stupidly large values of
> n).
I regularly need to sort lists of tends of thousands of strings.
> .. its [quicksort's] gotcha is that
> performance can degrade well below this approaching
> O(n^2) (might be higher. I cant remember).
Yes, O(n^2) for adversarial data. Does mergesort require a shallower recursion than quicksort does?
Since temporary use of the heap is rarely a concern for me, and datasets are very large, it sounds like mergesort and the TreeMap are viable contenders for preferred sorting methods. But I'm still apprehensive about mergesort's use of recursion.
Thank you, I gave you 5 duke dollars and matei 2.
Some additional comments
Trees:
The O(n log n) for tree inserts is only guarenteed as java's trees (in the util package) use balanced trees (red-black). Binary trees in general
do not have this guarentee. Worst case insert/get from an unbalanced
tree is n (by recursion normally (can easily exceed stack size))
Mergesort:
Merge sort if implemented recursively should never exceed O(log n)
depth.
Quicksort:
The quicksort algorithm implemented in java is actaully normally faster then
all of the other methods (hence the name). Its actually a modified quicksort
that checks for some common worst case senarios before it starts work.
Its therefore significantly less likey then a vanila quicksort to hit
its upper bound on complexity (O(n^2)).
matfud
cheers for the dukes