Find all nearby points (defined as latitude and longitude)

I have a database containing a large number of points, theoretically anywhere on the earth's surface (defined as latitude and longitude). I need to be able to find all points that fall within a small circle, arbitrarily positioned anywhere on the earth - that is to say find all points within 5km of position x for example. The circle will always have a radius of 50km or less (hence curving of the earth's surface will not affect results and a 'flat-earth' model can be used). The search must be very efficient (musn't use sin, cosine, square root, etc) as it will be performed many times for different positions x.

My logic so far is that speed is important and accuracy over large distances is not important (other than to verify that two points are not close to each other) so I should be able to plot all my existing data on a flat x,y grid, using some kind of projection (a one-time cost). Then I can do a simple 2d calculation when searching. The following needs to be true for this to work:

1. the vast majority of countries need to be in contiguous co-ordinate spaces (i.e. no good if England has a split down the middle such that co-ordinates from different side of the country appear to be thousands of miles away.

2. the scale should be constant over the whole grid - so that two points x,y and x+a,y+b (where and a and b are relatively small arbitrary values) are the same physical distance apart on the earth's surface regardless of the value of x and y (x and y will always fall within a land-mass).

After visiting lots of sites about map projections I am no closer to understanding exactly what I need - so if someone could answer whether I am approaching this the right way, if so what kind of projection I need and details of how to convert to it I would be most grateful. (Also I realise this isn't a java question, but couldn't find any obvious map forum to post such a question to and am writing the app in java!)

I think I may want the general shape of projection indicated by this link (the major land-masses are contgious, but it's only a guess):

http://www.megaconference.org/megaconference4/images/Megacon4WorldMap.jpg

[2197 byte] By [Andrew-Ca] at [2007-10-1 21:52:20]
# 1

It strikes me that the easiest way to meet your requirements is to actually use the sphere, and represent your points in 3D Euclidean space rather than 2D spherical space. It will require some trig, but nearly all of it will be pre-processing. Each search will only require you to convert one (lat, long) pair into (x, y, z), which is about three trig operations. You'll also be saved worrying about wrapping points near the boundary round. It's worth remembering the principle of KISS.

YAT_Archivista at 2007-7-13 7:55:39 > top of Java-index,Other Topics,Algorithms...
# 2

What is the total number of points?

Is the density of points distribution uniform? How many points may fall in that 5km radius?

How much memory do you have for preprocessing data?

Do you process the search in arbitrary points of surface, or search in predefined sequence of points? If it's sequence, what part of surface fo you scan in whole sequence?

Answers to these questions may radically affect choice of algorithm.

RoboTacta at 2007-7-13 7:55:39 > top of Java-index,Other Topics,Algorithms...
# 3

I take your point regarding 3d calculations and am considering it but I can see in my mind that it should be possible using a 2d space - I still prefer this idea because after the more complex pre-processing, calculations in the system will be simpler (quicker, less error-prone, etc.) and I really don't need 3d co-ords. I am particularly concerned about speed of searching because this search engine will be driven by web-page interaction, and many searches may take place at the same time - responsiveness is very important. Also this will form only one of many criteria of which a search may be composed.

The total number of points should be no more than 100,000.

The distribution of points is not uniform - they will be clustered around centres of population.

For the most populated 5km circle there would not be more than 500 points, but normally I would expect many less points, up to 50.

Memory for pre-processing should not be an issue (a server with several gbs) - however I need a system that can have points added to it over time as a result of a web-page interaction. It would not be desireable for this to take a long time, nor for it to have to re-evaluate all existing points.

The search is currently done in an arbitrary order.

Andrew-Ca at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 4

One 2D space would be the complex plane if your sphere is spherical (and not oblate like the real world), then you can use the Reimann sphere and map the circle on its surface to its image, a circle on the complex plane, where (if the image of the circle does not encircle the origin, ie the circle does not encompass a pole of the sphere), the tangents from the origin to the image of the circle on the complex plane are the lines of longitude that are tangents to the circle on the sphere, giving min and max longitudes of a bounding box that circumscribes the circle (min and max latitudes are easy to determine). Or you can do the same in spherical trig, which makes the arithemtic harder though the maths is easier.

So you can narrow the search space to a range of lat and long, but you won't get a trig-free solution, as your lat and long are polar mag and arg, and you'll still need to convert to 2D cartesian to test the whether the points found in the bounding box are in the desired circle (the image of the point being inside the image of the circle on the complex plane implies the point in 3D being inside the circle on the Reimann sphere).

For an oblate sphere like the one we're living on, you're stuck with either lots of trig or precalculating the cartesian position of each point.

Pete

pm_kirkhama at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 5

If you think of Latitude as Y and Longitude as X, in a small neighborhood around a given point, these may be treated a Euclidean coordinates, except for the fact that the scale of the longitude portion (number of miles per degree) keeps changing as you move up in latitude.

So if you preprocess the longitude numbers, leaving them the same at the equator and shrinking them as you get to the poles, you will get the preprocess that you desire and can do a simple euclidean distance function and be approximately correct. The preprocess, assuming that latitude is a number from 0 to 90 degrees and longitude is a number from 0 to 360 degrees, is simply

longitude = longitude*cos(latitude)

Once your database points are pre-shrunk, when you get a target point you must pre-shrink it the same way before you do the distance calculation i.e.

target.longitude = target.longitude * cos(target.latitude);

Yes, this is one trig function, you do it once to preprocess the single target point, you then do a jillion distance calculation with no trig.

The map, if you drew it would look a bit like an issocles triangle with its base running up and down from pole to pole, with the apex being the equator pointing to the right. (actually it will be shaped more like a lobe of a cosine wave but a triangle is adequate for descriptive purposes)

As for avoiding the wrap around, this only happens if a country happens to land right on the transition from 360 degrees to zero. (or from -180 degrees to plus 180 degrees - I don't actually know anything about how latitudes are recorded) in any case, this can be solved by altering the latitude before you pre-shrink (if a country is mostly up around 360 but a little bit around 0, simply add 360 to the coordinates that are near 0.)

All you are doing is distorting the base of the triangle (the 0 longitude line running from pole to pole) so that if it cuts across a country, you will push the part of the map that was near the base over next to the part of the country that was out on the cosine wave side of the map.

marlin314a at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 6
oops - I mean sine not cosine. :0
marlin314a at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 7

Divide surface in squares (in spheric coordinates) and store points within 2d array, each cell corresponding to square. Cells within populated areas also divide in number of cells (with side not smaller then 5km). When certain are is requested, you can very quick select cells intersecting with search area.

Within each cell support sorted order on points along both axes. Convert search area to rectangle (or number of rectangles) and select points within that rectangle using sorted order. Finally remove points outside search area.

RoboTacta at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 8

I think following solves your problem and it is a fast and

simple algorithm. If the points are quite random, it will

find the set of points within a given radius in O(log(n))

average runtime with low constants. This is because it only

needs a binary search and some adding and multiplying.

Definitions:

R = your desired radius

x = pi*[earth radius]*[longitude in degrees]/180

y = pi*[earth radius]*[latitude in degrees]/180

P = be the set of all points you are considering

Given that the distance D between two points pi=(xi, yi)

and pj=(xj, yj) is a lot less then the radius of earth,

D^2 can be approximated with

D(pi, pj)^2 = (x1-x2)^2 + (y1-y2)^2 = x1^2 + x2^2 + y1^2 + y2^2 - 2(x1*x2 + y1*y2)

Problem to solve:

For a given point p = (x, y), compute the subset S of P

such that for all p' in S, D(p, p')<=R and for all

p'' in P\S, D(p, p'')>R.

Precomputation needed:

For every point pi in P, precompute xi^2 and yi^2 and put

them in a sorted list L (or table), sorting them on xi^2.

Make sure that every pair (xi^2, yi^2) in L is linked to

the actual point in order to identify the point later on.

Computation needed for every request.

1. Compute R^2, x^2, and y^2. Let S be an empty set.

2. Use binary search in L to find the index number k such that xk^2 <= x^2 <= x(k+1)^2. Set i = k and j = k+1.

3. if i<0 or |x^2 - xi^2| > R^2 then resume at 6.

4. if D(p, pi)^2 <= R^2 then add pi to S.

5. i = i - 1 and resume at 3.

6. if j>|P|-1 or |x^2 - xj^2| > R^2 then we are done.

7. if D(p, pj)^2 <= R^2 then add pj to S.

8. j = j + 1 and resume at 6.

Remarks

When computing D(p, pi)^2 above, reuse x^2 and y^2 in

L. The only computations needed to be done (except for

adding) are then x1*x2 and y1*y2.

You might run into problems if longitude is really close

to +-180 degrees or latitude is really close to +-90

degrees. If you have any points close to these you can

solve this problem by adding more then one reference to

a point into the list L, exceeding 180 or 90 degrees.

If you have any questions, do not hesitate to ask and If you

choose this solution, please let me know how it turned out.

parza at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 9

You can't binary search on random data, and your distance metric only approximates the real distance near the equator. If you're as far north as Paris you'd get an ellipse (though you can always divide the long difference by cos lat to get the right value). If you are precomputing the co-ordinates, do it in 3D - the cost is little, as you only need to do it when the point is added, rather than for each search.

Even if you cache the ^2 terms, x1^2 + x2^2 + y1^2 + y2^2 - 2(x1*x2 + y1*y2) is 5 additions, 3 products, one subtraction, whereas (x1-x2)^2 + (y1-y2)^2 is 1 addition, 2 products, two subtractions, so will be faster (unless your machine takes longer to perform a subtraction than it does 4 additions and a product).

Pete

pm_kirkhama at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 10

> You can't binary search on random data.

Under 'Precomputation needed:' it says that L is sorted

on xi^2.

> ... and your distance metric only approximates the

> real distance near the equator.

According to the x and y under 'Definitions:' they are

not the x,y,z found in the ordinary 3D coordinates.

Computing x1-x2 gives the distance between p1 and p2

looking only at the longitude. Thus, they are only used

to compare distances locally and all large errors are

canceling each other. There is one small error left

though, but since R ~ 5 km the distance error will be

about sqrt(R_earth^2+R^2)-R_earth ~ 2 meter, which is

acceptable.

> If you're as far north as Paris you'd get an ellipse

> (though you can always divide the long difference by

> cos lat to get the right value).

I think you have misunderstood what x and y are. They

are defined as

x = pi*[earth radius]*[longitude in degrees]/180

y = pi*[earth radius]*[latitude in degrees]/180

which is the circular distance on the surface of earth.

> Even if you cache the ^2 terms, x1^2 + x2^2 + y1^2 +

> y2^2 - 2(x1*x2 + y1*y2) is 5 additions, 3 products,

> one subtraction, whereas (x1-x2)^2 + (y1-y2)^2 is 1

> addition, 2 products, two subtractions, so will be

> faster (unless your machine takes longer to perform a

> subtraction than it does 4 additions and a product).

You are right. I assumed without counting and as usually

assumption is the mother of all screwups. :) However, we

must create the list L to be able to do a binary search.

Create this list L using pairs (xi, yi) instead of

(xi^2, yi^2) will improve performance a bit.

I want to underline again that the algorithm is really

fast since the number of points to be computed is on

average 100000*2*R/(2*pi*[earth radius]) =

100000*2*5/(2*pi*6340) ~ 25 points given that we have

100000 points in total.

parza at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 11

In order to make the algorithm slightly faster I have

made a change suggested by pm_kirkham.

Definitions:

R = your desired radius

x = pi*[earth radius]*[longitude in degrees]/180

y = pi*[earth radius]*[latitude in degrees]/180

P = be the set of all points you are considering

Given that the distance D between two points pi=(xi, yi)

and pj=(xj, yj) is a lot less then the radius of earth,

D^2 can be approximated with

D(pi, pj)^2 = (x1-x2)^2 + (y1-y2)^2

Problem to solve:

For a given point p = (x, y), compute the subset S of P

such that for all p' in S, D(p, p')<=R and for all

p'' in P\S, D(p, p'')>R.

Precomputation needed:

Let L be a sorted list (or table) containing the points

in P such that for any pi,pj in L, pi comes before pj

iff xi<xj. Make sure that every pair (xi, yi) in L is

linked to the actual point in order to identify the

point later on.

Computation needed for every request:

1. Compute R^2. Let S be an empty set.

2. Use binary search in L to find the index number k such that xk ><= x <= x(k+1). Set i = k and j = k+1.

3. if i<0 or |x - xi| > R then resume at 6.

4. if D(p, pi)^2 <= R^2 then add pi to S.

5. i = i - 1 and resume at 3.

6. if j>|P|-1 or |x - xj| > R then we are done.

7. if D(p, pj)^2 <= R^2 then add pj to S.

8. j = j + 1 and resume at 6.

Remarks

You might run into problems if longitude is really close

to +-180 degrees or latitude is really close to +-90

degrees. You can solve this by search twice and add the

point found. Here is an example.

Let longitude=179.9 degrees and latitude=10 degrees. First

compute S1 for this values and then compute S2 for the

values longitude=179.9-360=-180.1 and latitude=10. Then

add the points in S1 and S2.

parza at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 12

> Under 'Precomputation needed:' it says that L is sorted on xi^2.

But not on y, so you will not be able to binary search using your distance metric, and there are points inside the circle with longitude greater than the longitude of the intersection of the circle and the circle of latitude through its centre, so you will need some trig to determine the bounds of x required. OTOH it may be that the distance is good enough, as the great-circle distance over the radius may be equal to distance along the circle of latitude to within the desired error.

> > ... and your distance metric only approximates the real distance near the equator.

> I think you have misunderstood what x and y are. They are defined as

> x = pi*[earth radius]*[longitude in degrees]/180

> y = pi*[earth radius]*[latitude in degrees]/180

> which is the circular distance on the surface of earth.

No, a first approximation of the distance on the surface of the earth is (angles in radians):

y = [earth radius]*[latitude]

x = [radius of circle of latitude]*[longitude]

where

[radius of circle of latitude] = cosine(latitude) * [earth radius]

Think how much a change of longitude effects the distance on the surface of the earth near a pole.

Hence my statement that your distance, which as stated is the distance on a cylindrical projection touching the equator, is inaccurate away from the equator, ie where 1 is not a good approximation for the cosine of the latitude.

It doesn't effect whether the algorithm is fast, only whether it gives the right results, which it would if you adjusted it to use a better distance metric (you can also precompute two values of x and y to cope with wrapping around the antimeridian) and suitable limits for your initial longitudinal bounds.

Pete

pm_kirkhama at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 13

I stand corrected. Thank you pm_kirkham for pointing

this out.

Ok, this version is based on the same idea and it has

the same speed. Moreover, it is not limited to small

R's. Any R can be used. If there is any mistakes in

this one, I am counting on pm_kirkham to point them

out. :)

Obs, the binary search is meant to be done only on the

latitude, NOT on the distance metric.

Definitions:

R = your desired radius

a = [latitude in radians]

b = [longitude in radians]

P = be the set of all points you are considering

R_e = be radius of earth

The distance D between two points pi=(ai, bi) and pj=(aj, bj)

is then

D(pi, pj) = arccos(cos(ai)*cos(bi)*cos(aj)*cos(bj) + cos(ai)*sin(bi)*cos(aj)*sin(bj) + sin(ai)*sin(aj))*R_e

Problem to solve:

For a given point p = (a, b), compute the subset S of P such

that for all p' in S, D(p, p')<=R and for all p'' in P\S,

D(p, p'')>R.

Precomputation needed:

For every point p in P compute the set of values

V(p)= {cos(ai)*cos(bi), cos(ai)*sin(bi), sin(ai)} and order

all these sets in a list L such that V(pi) comes before V(pj)

iff ai<aj. Now, for any two points pi,pj in P we can compute

cos(D(pi, pj)/R_e) by three multiplications and two additions.

Computation needed for every request:

1. Compute a_R = R/R_e and car = cos(a_R). Let S be an empty set.

2. Use binary search in L to find the index number k such that ak ><= a <= a(k+1). Set i = k and j = k+1.

3. if i<0 or |a - ai| > a_R then resume at 6.

4. if cos(D(p, pi)/R_e) >= car then add pi to S.

5. i = i - 1 and resume at 3.

6. if j>|P|-1 or |a - aj| > a_R then we are done.

7. if cos(D(p, pj)/R_e) >= car then add pj to S.

8. j = j + 1 and resume at 6.

parza at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 14
It is possible to solve this problem faster by dividing P up into boxes which has been stated above. Given 100 000 points and an ordinary CPU, the method I described above can easy compute 100 000 requests per second, so why make it more complicated.
parza at 2007-7-13 7:55:40 > top of Java-index,Other Topics,Algorithms...
# 15

Hi,

I think I have finally understood the various solutions offered! I must admit that the whole area of indexing was something I had not previously considered, and due to the nature of the search (the results of one part of a search feed into another) it's not trivial to insert indexing. For the time being I have implemented the solution suggested by marlin134 (actually the formula does seem to be longitude = longitude*cos(latitude) and not sin()...) - this gives me the ability to do a simple 2d comparison as I originally requested and could also I suspect be modifed to use an index. I have made a note for the next release regarding adding indexing to many types of searches.

I would like to say thanks to everyone for your help and also how impressed (overwhelmed actually!) I have been with the responses.

Andrew-Ca at 2007-7-20 12:01:06 > top of Java-index,Other Topics,Algorithms...