Matching ordered points to minimze distances

Hi folks.

I have to solve the following problem:

Given two sets P and Q of points in R^2 with |P| < |Q| find for all p in P the pairs (p, q) in P x Q, such that the sum of the distances d(p,q) is minimized.

This could probably be solved with a matching or flow algorithm, but I have a rather nasty additional restriction:

Let P and Q be ordered, i.e. P = {p_1,...,p_n} and Q = {q_1,...,q_m}, m > n.

Given two pairs (p_i,q_j) and (p_r,q_s) then for i > r, I need j > s.

My boss strongly suspects the problem to be NP-hard, so I am not interested in a optimal solution, but in a fast heuristic for a good or at least acceptable result, i.e. an algorithm that will not produce arbitrarily bad results.

Does anybody know this particular problem and a good strategy to solve it?

[834 byte] By [thomas.behra] at [2007-10-1 22:51:11]
# 1
What is your distance measure and how do you compare points in order to sort them?
parza at 2007-7-13 14:56:42 > top of Java-index,Other Topics,Algorithms...
# 2

> What is your distance measure

Currently, I use the standard euclidean distance d(p,q) = sqrt( (p_1-q_1)^2 + (p_2-q_2)^2 ) for points p=(p_1,p_2)^T, q=(q_1,q_2)^T in R^2.

I think the problem is independant of the applied metric, 'though I'm not required to implement a metric-independant solution.

> and how do you compare points in order to sort them?

I don't. I get two lists of points as input. The ordering of the points is implied by their position in the lists.

thomas.behra at 2007-7-13 14:56:42 > top of Java-index,Other Topics,Algorithms...
# 3

Since you do not have any additional information on the sorting,

my guess is also that it is NP-complete.

If you manage to find an polynomial algorithm that maps every

instance of the Minimum Point-To-Point Connection (see link below)

to at least one instance of your problem, then you have proven it

NP-complete as well as found an algorithm to approximate it with.

http://www.nada.kth.se/~viggo/wwwcompendium/node65.html

Please post what you find out on this problem to this thread.

I am curious.

parza at 2007-7-13 14:56:42 > top of Java-index,Other Topics,Algorithms...
# 4
Thanks, I'll look into it.
thomas.behra at 2007-7-13 14:56:42 > top of Java-index,Other Topics,Algorithms...
# 5
I think I know how to solve it in O(n^2) given thatn~|P|~|Q| and computing the distance metric is constant.Before I spend more time on this writing the idea downI would like to know if you are still reading this topic.
parza at 2007-7-13 14:56:42 > top of Java-index,Other Topics,Algorithms...
# 6

This is a fairly standard dynamic programming optimization problem. That means that the optimal solution will require N*M space and about that in runtime. So if your lists are each not much longer than a thousand points or so a perfect solution is tractable.

Without any information about the nature of the data sets it will be difficult to make any claims for a heruistic solution. Meaning only that it is trivial to construct data sets that match perfectly for one single specific alignment of the two lists and match arbitrarily poorly for any other alignment.

I am happy to outline the DP code, but if you are not familiar with dynamic programming it will make little sense and of course it also makes no sense to outline this method if the number of points you must deal with is too large for this technique.

marlin314a at 2007-7-13 14:56:42 > top of Java-index,Other Topics,Algorithms...
# 7

I do not have any experience with DP, but I know the

basic idea. If I am not mistaken there are at least

three ways to solve this using DP which gives different

conditions and results.

1. Requires k*|P|*|Q| memory and N(n^2) runtime.

Computes the pairs and the minimum sum.

2. Requires k1*|P|+k2*|Q| memory and N(n^2) runtime.

Computes the sum.

3. Requires k1*|P|+k2*|Q| memory and N(n^3) runtime.

Computes the pairs and the minimum sum.

parza at 2007-7-13 14:56:43 > top of Java-index,Other Topics,Algorithms...
# 8

> but I have a rather nasty additional restriction:

> Let P and Q be ordered, i.e. P = {p_1,...,p_n} and Q = {q_1,...,q_m}, m > n.

> Given two pairs (p_i,q_j) and (p_r,q_s) then for i > r, I need j > s.

I don't understand this restriction: you said the ordering was implied

by the position of the points in the list. Obviously it can't be just *any*

pair (p_i,q_j). Does that pair have a minimal distance?

kind regards,

Jos

ps. if you relax the restriction (which I don't understand), the problem

isn't in NPC, i.e. it's a simple O(m*n) problem at worst.

JosAHa at 2007-7-13 14:56:43 > top of Java-index,Other Topics,Algorithms...
# 9

> I don't understand this restriction

Construct an matrix with |P| rows and |Q| columns where the

values in the matrix are the distances between all possible

pairs. The rows and columns should be sorted as the lists

given. Assume that the pair (p_i, q_j) has been chosen, plot

this pair in the matrix and see how this restricts other

possible pairs. This should make it easier to understand the

restriction.

parza at 2007-7-13 14:56:43 > top of Java-index,Other Topics,Algorithms...
# 10

> I would like to know if you are still reading this topic.

Yes, I am. ;-)

> I don't understand this restriction: you said the ordering was implied

> by the position of the points in the list. Obviously it can't be just *any*

> pair (p_i,q_j). Does that pair have a minimal distance?

The restriction says, that if you draw your solution as bipartite graph G = ( P+Q, E = {pairs in the solution} ), then there are no edge crossings:

p_2

\

p_1 \

\ q_2

\

q_1

is acceptable where as

p_2

|

p_1 -+- q_2

|

q_1 is not acceptable.

thomas.behra at 2007-7-13 14:56:43 > top of Java-index,Other Topics,Algorithms...
# 11

> > I don't understand this restriction

>

> The restriction says, that if you draw your solution

> as bipartite graph G = ( P+Q, E = {pairs in the

> solution} ), then there are no edge crossings: [ nice ASCII art snipped ]

Ah, ok, I'm with you again. It you want to minimise the sum of all

chosen distances: sum(p_i <--> q_j) where |P| = n < m = |Q|, with

no edge crossing and if the order of the p_i and q_j is fixed, this is

quite easy to solve, because all you have to do is find all combinations

C(m, n) and pick the one with the minimum distance value from them.

kind regards,

Jos

JosAHa at 2007-7-13 14:56:43 > top of Java-index,Other Topics,Algorithms...
# 12

> The restriction says, that if you draw your solution as bipartite

> graph G = ( P+Q, E = {pairs in the solution} ), then there are no

> edge crossings:

I understood your first definition, but now I am lost. What is the

definition of 'edge crossing' in graph theory?

If I assume that the first definition is right, then there are at

least three different solutions with different results and consumption.

Is any of following solutions acceptable?

1. Requires k*|P|*|Q| memory and N(n^2) runtime.

Computes the pairs and the minimum sum.

2. Requires k1*|P|+k2*|Q| memory and N(n^2) runtime.

Computes the sum.

3. Requires k1*|P|+k2*|Q| memory and N(n^3) runtime.

Computes the pairs and the minimum sum.

parza at 2007-7-13 14:56:44 > top of Java-index,Other Topics,Algorithms...
# 13

As I said, it is a dynamic programming optimization problem.

The problem of aligning the points works like this:

assume you have p[] and q[]. You write the recursive routine that returns the minimum distance. It uses standard divide and conquer.

od(int ip,int iq){

// returns total distance of optimum match in the specific case

// where point ip is matched with iq or something less than iq

// first we assume ip matches iq and thus ip-1 must match something

// equal to iq-1 or less

best = p[ip].dist(q[iq]) + od(ip-1, iq-1);

// then we look at all other possibilities for the q point

for(int i = ip; i < iq; i++{ // note - q index always at least as large as p index

better = p[ip].dist(q) + od(ip-1, i-1);

if(better < best) best = better;

}

return best;

}

The recursive routine is the solution. If you run it od(n,m) it will tell you the total distance of the best match. It has only two problems 1) it is slow because you are continually reworking sub-problems and 2) while it returns the value of the optimum match it does not ever store and return the matching decisions that were made in the process of discovering that optimum.

Dynamic programming is the fix for problem 1. Read a chapter of an algorithms book or find something on the web about it. The basic idea is to rewrite od so that it will cache the result of every call to od(ip,iq) in an array and to also short cut the recursion in od by first looking in the array to see if the result has already been calculated.

The fix to the second problem is to also cache int that array the single last iq value that led to the optimal alignment for od(ip,iq) , so that when you are all done you can backtrack throught the array and see what result you got at each step. This should also be explained in any write up of dynamic programming.

I would just send you the entire solution, but it would not make sense and you would not be able to either tweek it or debug it if you don't understand DP. So after you have educated yourself on this very useful optimization technique, and have tried transforming the code which I have included if you are still having problems transforming my partial solution into the complete solution just ask for help.

marlin314a at 2007-7-13 14:56:44 > top of Java-index,Other Topics,Algorithms...
# 14

Sorry - forgot the code tags and some data typing.

double od(int ip,int iq){

// returns total distance of optimum match in the specific case

// where point ip is matched with iq or something less than iq

// first we assume ip matches iq and thus ip-1 must match something

// equal to iq-1 or less

double best = p[ip].dist(q[iq]) + od(ip-1, iq-1);

// then we look at all other possibilities for the q point

for(int i = ip; i < iq; i++{ // note - q index always at least as large as p index

double better = p[ip].dist(q[i]) + od(ip-1, i-1);

if(better < best) best = better;

}

return best;

}

marlin314a at 2007-7-13 14:56:44 > top of Java-index,Other Topics,Algorithms...
# 15

I agree with marlin314 that posting code will confuse more

then help. However, I would like to add that there are at

least three DP versions to consider. I think that option 4

is the most interesting solution. marlin314, you seem to have

DP experience, do you think it will perform as I have

predicted?

Option 1:

Following link is a simple tutorial of something similar to

what you are trying to do.

http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html

The algorithm above transfered to your problem will consume

O(|P|*|Q|) memory and have a runtime of O(|P|*|Q|).

However, there are at least three other options.

Option 2:

If you only need the minimum sum, you can reduce memory

consumption to O(|P|+k*|Q|) by skipping the traceback.

Option 3:

The third option gives the sum as well as the pairs and it

consumes only O(|P|+k*|Q|) memory but it has a runtime of

O(|P|^2*|Q|). The idea here is to replace the traceback with

computing the last pair without saving the path and then

reduce the problem by removing this pair and then compute

the last pair again, and so on.

Option 4:

Let k be a constant. While recursively moving through P, save

the k smallest sums at every p in P. Then when doing the

traceback, allow only the pairs which were saved, hoping that

the sum will add up to the minimum sum. If the sums are the

same, its done, else increase k and start over. The best

possible performance is O(|P|+k*|Q|) memory and a runtime of

O(|P|*|Q|). The worst performance is O(|P|*|Q|) memory and

a runtime of O(|P|^2*|Q|). The in data will decide performance,

but I believe that for most data it will be much closer to the

lower limit.

parza at 2007-7-20 12:23:51 > top of Java-index,Other Topics,Algorithms...
# 16
Thank you all for your help on this.I will take your advice and read up on DP.
thomas.behra at 2007-7-20 12:23:51 > top of Java-index,Other Topics,Algorithms...