Fortune's Sweepline Algorithm implementation?
I've got a business application that needs to run a Nearest Neighbor
algorithm on a set of about 6000 points. These points describe
a path in 2-space. The path is wrapped in a bounding box. For
each (int x, int y) point inside the bounding box, I need to locate the
nearest point on the path. I'm regularly looking at over 20,000
points that need to find a nearest neigbor along my path.
My running time with an R-Tree can be as bad as 5 minutes,
which is way too slow.
My Chief Scientist has suggested to me that a Voronoi diagram was
the way to go, and that the best algorithm to employ was Steve
Fortune's sweepline algorithm. The only implementation I was
able to find was written in C by Fortune himself. Is this really the
only implementation of his algorithm out there? Alternately,
can you suggest other implemenations that will generate a
Voronoi diagram written in Java?
Thanks for any ideas you might have,
Eric
[1032 byte] By [
es5f2000a] at [2007-10-1 18:21:48]

I can't recommend too highly Computational Geometry by M. deBerg et al.
http://www.amazon.com/exec/obidos/tg/detail/-/3540656200/qid=1121220941/sr=8-1/ref=pd_bbs_ur_1/002-0188184-6631238?v=glance&s=books&n=507846
It covers several different flavors of the plane sweep algorithm, of which Fortune's algorithm is a variant. Also it has a very nice pedagogical feel to it, mixing practical examples with rigorous mathematics.
You will need a priority queue and a balanced search tree that supports the "nearest" operation. The first is provided by java.util.TreeSet, the second is a little more difficult. You could use a TreeSet, but I think it would require multiple calls to headSet() and tailSet() and first() and last(). It may be faster/easier to write your own red-black or avl tree implementation (or look for a 3rd party impl.)
I'm sure you'll be able to find some pseudocode for the full algorithm on the net, if you need any help, just ask...
Good luck.
are the 6000 points on the path fixed - always the same? Are the 20,000 points in the bounding box reasonably uniformly distributed? Are the 20,000 changing all the time? What is variable and what is fixed? Does the path wander all through the bounding box or is it nearly straight. Do you have to get the exact nearest neighbor every single time or it ok to be off by a little bit, like get the next nearest neighbor, very rarely?
The trouble with voronoi diagrams is that they are complicated and almost never needed. You can almost always solve a nearest neighbor problem with a little judicious gridding. i.e. subdivide the space by a grid that contains only a few target points per cell and then brute search within the appropriate cell and its neighbors.
The problems with gridding arise when your target set of points are in no way uniformly distributed and you are left with cells in your grid that are not only vacant, but are also surrounded by vacant cells. There is some likely hood that one of these "distant" cells lies completly within a single voronoi domain, but of course this need not be the case.
If you can live with an approximate result in these places where you are "far away" from the action, you just "wrap" the voronoi boundries around those far cells and declare a cell to be "owned" by a particular point
In summary, the poor man's voronoi diagram works like this. You subdivide your bounding box into a 30 by 30 grid or something like that. This is 900 cells for approximately 6 path points per cell. You take the center point of each grid and in a pre-process pass you brute search the entire 6000 path points and find the true nearest neighbor for the center of each cell. Any cell that is entirely empty and surrounded by empty cells is delcared to be owned by the nearest neighbor of its center point. For any other cell. brute search the cell to find the nearest neighbor in cell. This is the candidate nearest neighbor and the distance to that point tells you how many neighbor cells, if any, you need to search as well.
Rough order of magnitude amount of work is this: If you needed to search a cell and all eight neighbor cells every time you are talking on the order of 50 distance calculations per tested point thus about one million calculations to do a batch of 20,000 test points. Should only take a few seconds.
Of course, change any of the numbers and tune to your particular situation.