Wierd number removal algorithm

Hey all, this is my first post here, so please bare with me. For the last week I have been tinkering with optimizing an algorithm to remove numbers from a randomly ordered input set and a base set such that after I remove them the two sets are identical. Currently I have a recursive solution that gives an optimal solution all the time, at the obvious cost of CPU time. Since I can not eat up those cycles all the time, I have mostly been experimenting with tuning an initial solution to both aid in pruning the recursive algorithm and, in the event that the recursion begins to take too long, the initial solution will be taken in favor of continuing the recursion.

Anyway, I've done a fair share of data analysis of thousands of random sets, written an output analysis utility to check how well my various algorithms are preforming, and just generally been going nuts this week. So, now that I have a solution that I was bound by time to come up with, I want to turn to a bigger crowd and see if there was something that I just missed completely which would allow me to solve this problem better. So, without further adieu, let me present the problem.

I have a randomly ordered problem set, we'll call it P. It contains exacly n non-repeating numbers in the inclusive range (1..n). (Every number in the range MUST exist in the set).

I have the comparision set, we'll call it C. It always contains the numbers in the range (1..n) in ascending order from 1 to n.

My solution set, we'll call it S, is the set of numbers that can be removed from both P and C such that P == C. I am trying to minimize the amount of members of this set.

Currently, my recursive algorithm works like this:

Base case: P == C, at which point I check S and if it is a better solution then I store it in my best solution, otherwise I discard it.

Pruning: When the current working solution gets bigger than the current best solution I prune. I also abort when too much time has expired.

Recursive case: Find the first discrepency between P and C and make a decision on whether to remove the discrepent number from P or C. This decision is made as follows: Given n is the member in both P and C (since they are always equal in length) that is discrepent: If P[n] > P[n+1] then remove P[n] from both P and C and recur (the other branch will be tried if that one fails). Otherwise remove C[n] and recur (again trying P[n] if that branch does not pan out).

This works pretty well, and its performance seems, in general, to be governed by the initial amount of discrepencies in the list. Of course, sometimes removing a single discrepency can clear up a lot more and so that is why I say generally.

Now, this works great, but again it takes time for recursive algorithms to work. Also, any initial solution that is used should be as best as possible, and there may even be a better recursive solution to this problem using possibly increasing/decreasing trends in the data, or offset from desired position information, etc. However, while I've tried some rudimentary forms of many of these ideas in an iterative non branching approach to be used as initial solutions, none of them preform as well as the recursion in all cases and some preform many folds worse. On average, they do pretty well, and with a random data set generated solutions within an average of 2 worse than the recursive solution did in much less the time.

At this point, however, I come to you and ask if anyone knows if there is already an algorithm that solves this type of problem, or if anyone knows something I missed or could try or research more of in order to tune this algorithm. I already have the analysis tool written and have abstracted the program to such a level that it is easy to compile in a new initial solution or recursive section and then run it through with an input set and then run the analysis tool on the output. So, I can swap algorithms relatively quickly, which is nice.

Anyway, the source is all C - and I know this is a Java forum, but you all seemed very knowledgable, so I figured you could help! If anyone wants to see what I've done, I'd be willing to share a bit more.

Thanks for reading, I'm sure it's been an effort, but one I really appreciate.

Brian Gregg

Computer Sciences Corp.

[4370 byte] By [DreamWarriora] at [2007-9-29 7:40:35]
# 1

My take is that you are trying to find the inner join (intersection)

of the two sets is that right?

if so

put all elements of set A into a HashSet (HashMap or LinkedHashMap

depending on preference/ease). Then run through all elements of set B

calling setA.contains(setb[index]);

something like

int[] setA = //numbers in here

int[] setB = // numbers in here

HashSet hashedSetA = new HashSet();

HashSet hashedResult = new HashSet();

// populate the HashSet

for(int i=0;i<setA.length;i++)

{

hashedSetA.add(new Integer(setA[i]));

}

// perform the test

for(int i=0;i<setB.length;i++)

{

Integer tmp = new Integer(setB[i]);

if(hashedSetA.contains(tmp))

{

hashedResult.add(tmp);

}

}

// the intersection is now in hashedResults

matfud

>

matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 2

Opps sorry I just realised you want the outer join (ie all elements

that are not in either setA or setB) not the inner join (the elements

common to both sets)

You can do this in a similar manner though.

int[] setA = //numbers in here

int[] setB = // numbers in here

HashSet hashedSetA = new HashSet();

HashSet hashedResult = new HashSet();

// populate hashedSetA

for(int i=0;i<setA.length;i++)

{

hashedSetA.add(new Integer(setA[i]));

}

//populate hashedSetB

for(int i=0;i<setB.length;i++)

{

hasedSetB.add(new Integer(setB[i]));

}

// perform the test

for(int i=0;i<setB.length;i++)

{

Integer tmp = new Integer(setB[i]);

// add result to hashedResult if it is not in setA

if(!hashedSetA.contains(tmp))

{

hashedResult.add(tmp);

}

}

// hasedResult now contains the elements that are in B but not in A

// perform the test

//perform the same kind of loop again to find elements in A but not B

for(int i=0;i<setA.length;i++)

{

Integer tmp = new Integer(setA[i]);

// add result to hashedResult if it is not in setA

if(!hashedSetB.contains(tmp))

{

hashedResult.add(tmp);

}

}

// hashed result now contains the elements that are in A not B and in

B not A

Note this is general purpose and not optimised but should run in

O(N+M) where N and M are the sizes of set A and B respectivley.

You seem to have additional constraints that could be used to further

optimise the algorithm

matfud>

matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 3

that should have been:

Note this is general purpose and not optimised but should run in

an expected O(N+M) where N and M are the sizes of set A and B

respectivley. (Hashes are tricky blighters inserts and searches

can take a very long time in the worst case but if you have a good

hashing function they avarage constant (and quite low) time.

matfud

matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 4

> My take is that you are trying to find the inner join

> (intersection)

> of the two sets is that right?

> Opps sorry I just realised you want the outer join (ie

> all elements

> that are not in either setA or setB) not the inner

> join (the elements

> common to both sets)

Hummm...maybe I didn't state the problem clearly? I want neither of those...but thanks for the help.

Both sets will always contain the same members, the numbers from 1..n inclusive. Neither will contain duplicates, set C will always contain them in increasing order, set P will contain them in a random order (possibly ascending).

Here's an example:

Suppose set P is { 2 3 5 1 6 4 }

it contains numbers from 1..6 inclusive, randomly ordered.

This means that set C MUST contain the numbers from 1..6 in ascending order, so set C is {1 2 3 4 5 6 }

Now I want a set S, such that when I remove (from both C and P) the numbers in set S, C == P. One such example would be S = {1 5 6} because after removing 1, 5, and 6 from both P and C we get:

P is {2 3 4 } and C is {2 3 4}

Hopefully that clears things up a bit?

DreamWarriora at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 5
Sorry popped out for some beer.That means that the answer must be ascending but not necessarily consecutive right?still thinkingmatfud
matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 6
> Sorry popped out for some beer.> > That means that the answer must be ascending but not> necessarily > consecutive right?> > still thinking> > matfudYes. I guess that is another way of putting it.
DreamWarriora at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 7
I also take it that you are looking for the longest such run of numbers?(or one of the set of longest runs?, or all of the sets of longest runs?)matfud
matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 8

> I also take it that you are looking for the longest

> such run of numbers?

> (or one of the set of longest runs?, or all of the

> sets of longest

> runs?)

>

> matfud

Well, it isn't quite as easy as the longest set of acending numbers, because by removing various numbers a longer set may be created. By removing certain combinations of numbers even longer still sets can be created.

However, in essence, after removal, yes I want the longest set of remainaing ascending numbers possible (or the minimum set of numbers removed).

DreamWarriora at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 9

Sorry still pondering on this but:

since the first set of numbers is ascending and consecutive that

seems to imply that you are looking for the longest set of ascending

(non-consectutive) numbers in the P set. (I know that this isn't the

acutal result you want but if found it is then trivial to find the set

of numbers removed (S) from sets C and P.

I am thinking in terms of absolute longest set ( remove as many numbers

as is required to find the longest set).

matfud

matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 10

> Sorry still pondering on this but:

>

> since the first set of numbers is ascending and

> consecutive that

> seems to imply that you are looking for the longest

> set of ascending

> (non-consectutive) numbers in the P set. (I know that

> this isn't the

> acutal result you want but if found it is then trivial

> to find the set

> of numbers removed (S) from sets C and P.

> I am thinking in terms of absolute longest set (

> remove as many numbers

> as is required to find the longest set).

>

> matfud

Very true. I see your point, the problem is that when you do a trend analysis on set P you'll see something like + + + - - + + - +, etc. And doing something like removing the three minus points either a. doesn't work, b. doesn't yield the best ascending sequence because one of those - points may have been capable of being left in.

I think the biggest problem I'm having is trying to find this longest ascending trend with all the possible decending trends thrown in the midst of them.

DreamWarriora at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 11
I did this a few months ago. http://forum.java.sun.com/thread.jsp?forum=31&thread=362717Just a quick hack, but it may help.
bbrittaa at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 12

If that recursive solution does it (and in less time than the one I already wrote which does it) I'm going to be angry :D.

Actually, on my way home I thought of another way of doing it which is similar to his, but it takes out a few of the more obvious inconsistencies before wasting stack space to recur. But it does look like his may give at least similar performance to the one I'll be writing...I'll have to try it out.

Mine, in pseudocode, is thought out like this (pity after a week, I finally get something better and a day right lol...such is life for wanting to do it myself and not go get help from a bigger group of experts right away.)

1) Pick number from set (starting at first)

2) Remove all numbers behind it (that are not in order)

3) Find successive ascending path

4) If we've exhausted our list (the entire remaining path is ascending) then we've found a solution, other wise, recur with remaining numbers (including the one that ended the ascending path found in 3).

5) Repeat 1 for the next number in the set.

Of course, add steps in to prune if the removal set gets bigger than the best already found solution and things like this...but this may just do it and in tons of time less than my other one.

THANKS EVERYONE, I'll try it out at work again on Monday.

P.S. That programming team stuff is killer ain't it? I remember my Sophomore year of college I took 41st in our region. Personally I think I kicked some butt seeing as how I had 2 other members on my team that didn't know a thing about what was going on, but they did well giving me moral support and reminding me not to slam the keyboard on the ground when I got angry :D.

DreamWarriora at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 13

Umm...wait I just realized the other solution in the other thread didn't take into account ascending numbers in the middle.... However my above attempt removing everything < the target point up till the next target ascention point will probably work....

Well, we'll see Monday. If anyone has any more ideas I'd love to hear them...however I'm almost convinced that this can not be done 100% iteratively - there will have to be some branch decisions made to solve this minimally.

DreamWarriora at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 14

Its not normally too hard to convert a recursive algorithm into an

iterative one. All you need is your own stack implementation. Just

push and pop the current state onto and off your stack. Your

code then becomes somthing like

for(int i=0;i<setP.length;i++)

{

// do some stuff to stack here

while(!algorithm());

}

algorithm has access to the stack and therefore its state. the while

loop just keeps calling it until its processed all states.

I have to admit that this technique works well for recursive algorithms

with a limited (fixed) number of branches (they can be hardcoded). Not

sure how well it will work when you effectivly have n branches.

As for bbritt's algorithm...it does appear to work but does not scale

well.

Are you interested in a recursive algorithm that can process 1000

elements in about 35 secs (on a 266Mhz PII)?

it takes about 50 millisecs to process 100 elements.

matfud

>

matfuda at 2007-7-14 21:31:36 > top of Java-index,Other Topics,Algorithms...
# 15
That should have been 10,000 in 35 seconds not 1,000matfud
matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 16

I'd do it iteratively, traversing P maintaining a pool of candidate trees. tokenpool

nil{}

2{ (2 +) }

3{ (2 3 +) }

; replace (2 +) as all solutions beginning (2 +) have

; a longer one (2 3 +)

5{ (2 3 +) (2 3 5 +) }

1{ (2 3 +) (2 3 5 +) }

; the number of tokens left in S is less

; than the length of longest candidate

6{ (2 3 +) (2 3 5 6 +) }

4{ (2 3 5 6 +) }

; adding all remaining tokens won't make

; the first one best

The pool references the leaves of the candidates, who reference their parents.

Candidates are replaced in the pool if the token is the candidate's token +1, otherwise a new candidate is added.import java.util.Random;

class CandidateSet {

final CandidateSet parent_;

final int token_;

final int length_;

CandidateSet (int token, CandidateSet parent) {

parent_ = parent;

token_ = token;

length_ = (parent==null) ? 1 : parent.length_+1;

}

int[] toArray () {

int[] array = new int[length_];

CandidateSet yazz = this;

for (int i=length_-1; i>=0; i--) {

array[i] = yazz.token_;

yazz=yazz.parent_;

}

return array;

}

}

class CandidateRef {

CandidateSet candidate_;

CandidateRef next_;

CandidateRef (CandidateSet candidate, CandidateRef next) {

candidate_ = candidate;

next_ = next;

}

}

public class WierdNumberRemover {

// quality is in permill

public static CandidateSet unwierdify (int[] setP, int quality) {

CandidateRef pool = null;

int bestLength = 0;

int sizeOfP = setP.length;

CandidateSet bestCandidate = null;

int lastToken = -1;

CandidateSet bestOfLastToken = null;

for (int i=0; i<sizeOfP; i++) {

int token = setP[i];

// go through pool, trying to append this token

// to the candidates

CandidateRef ref = pool;

CandidateRef prev = null;

CandidateSet bestOfThisToken = null;

int bestLengthThisToken = 0;

int tokensRemaining = sizeOfP - i;

while (ref!=null) {

CandidateSet candidate = ref.candidate_;

int candidateToken = candidate.token_;

int candidateLength = candidate.length_;

// prune the candidate:

// * if adding all the remaining tokens to the candidate won't

//make it as long as the best one.

// * if it ends with the previous token, but it not the best one

//for that token.

if ((candidateLength + quality * tokensRemaining >< bestLength) ||

(candidateLength + (quality * (1 + sizeOfP - candidateToken))/1000 < bestLength) ||

((candidateToken==lastToken) && (candidate!=bestOfLastToken))) {

if (prev==null) {

pool = ref.next_;

} else {

prev.next_ = ref.next_;

}

} else {

// if the token can be appended to the candidate

if (candidateToken<token) {

CandidateSet newCandidate = null;

// if there are no tokens between the candidate's

// token and this one, replace the candidate

if (candidateToken == token-1) {

newCandidate = new CandidateSet(token, candidate);

ref.candidate_ = newCandidate;

// the best ending with this token

if (candidate.length_>bestLengthThisToken) {

bestOfThisToken = candidate;

bestLengthThisToken = candidate.length_;

}

} else if ((candidateLength + (quality * tokensRemaining)/1000 > bestLength) &&

(candidateLength + (quality * (1 + sizeOfP - token))/1000 > bestLength)) {

newCandidate = new CandidateSet(token, candidate);

pool = new CandidateRef(newCandidate, pool);

}

if (newCandidate!=null) {

// maintain the length of the best candidate

if (newCandidate.length_>bestLength) {

bestCandidate = newCandidate;

bestLength = newCandidate.length_;

}

// the best ending with this token

if (newCandidate.length_>bestLengthThisToken) {

bestOfThisToken = newCandidate;

bestLengthThisToken = newCandidate.length_;

}

}

}

// only update previous if we haven't removed ref

prev = ref;

}

// hoppity, hoppity, hop

ref = ref.next_;

}

// if we haven't appended this token to anything in the pool,

// and starting from here may create a token chain longer

// than the best one, start one here

if ((bestOfThisToken==null) &&

((quality * tokensRemaining) > bestLength * 1000) &&

((quality * (1 + sizeOfP - token)) > bestLength * 1000)) {

bestOfThisToken = new CandidateSet(token, null);

pool = new CandidateRef(bestOfThisToken, pool);

}

// keep reference to best candidate ending with

// the last token processed for pruning during the next

// pass through the pool

bestOfLastToken = bestOfThisToken;

lastToken = token;

/*

// debug: dump the pool

System.out.println("The pool after: "+token);

ref = pool;

while (ref!=null) {

System.out.println(" "+toString(ref.candidate_.toArray()));

ref = ref.next_;

}

System.out.println();*/

}

return bestCandidate;

}

// test

public static void main (String[] args) {

int[] P = parseIntArray(args);

System.out.println("P = "+toString(P));

P = unwierdify(P, 1000).toArray();

System.out.println("P'= "+toString(P));

int big = 10000;

P = new int[big];

Random random = new Random();

for (int i=0; i<big; i++) {

P[i] = i+1;

}

for (int i=0; i><big; i++) {

int rand = random.nextInt(big);

int swap = P[rand];

P[rand] = P[i];

P[i] = swap;

}

for (int quality = 1000; quality>0; quality -= quality>250 ? 250 : quality > 10 ? 240 : 1) {

long time = - System.currentTimeMillis();

CandidateSet cs = unwierdify(P, quality);

time += System.currentTimeMillis();

System.out.println("unwierdifying "+big+" at "+(quality/10)+"."+(quality%10)+"% took "+time+"ms to find a sequence "+cs.length_+" long");

}

}

// supporting gubbins

static int[] parseIntArray (String[] args) {

if (args.length==0) {

return new int[] {2, 3, 5, 1, 6, 4};

} else {

int[] array = new int[args.length];

for (int i=args.length-1; i>=0; i--) {

array[i] = Integer.parseInt(args[i]);

}

return array;

}

}

static String toString (int[] a) {

StringBuffer buffer = new StringBuffer();

buffer.append('{');

for (int i=0; i<a.length; i++) {

buffer.append(' ');

buffer.append(a[i]);

}

buffer.append(" }");

return buffer.toString();

}

}

Can't make meaningful speed comparisons as I'm on apple hardware, but it's still O(N^2) at 100% quality. You might want to reduce the number of high quality test cases on a slower machine.

If you want faster but less good, reducing the quality has some effect ;).

Pete>

pm_kirkhama at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 17

I think we have a winner.

pm_kirkhams code took alsmost exactly the same time to run as mine for

a selection of n (approx 35 secs for n=10000).

I then reaslised that my code was a bit over enthusiastic in its

optimisations under certain conditions (didn't always produce the

longest run of numbers) without that optimisation it scales less well

Unless I can figure out a more correct optimisation there is

no competition. Even if I did I think that pm_kirkhams code is

approaching optimal for this problem (an its also iterative...hurrah)

so your choice would then be between two algorithms that scale the same

and perform similarly. I know which one I'd choose (go itertation go).

Anyway here is bbritt's code and my modifications to it.

import java.util.*;

import java.io.*;

public class ArrayTest

{

private long testSeed = 5;

private int testSize = 100;

// holder for the dataset being processed

private int[] setP = null;

// repalcement for hashMap

private ArrayEntry[] procced= null;

// arrays to put the results in (these need to be swapped when a

// new larger result set is put in newValues) humm acutally they

// don't

private int[] oldValues = null;

private int[] newValues = null;

//stores the longest run found (contains the index of the longest run found)

// array length of longest run is actually longest+1

private int longest = -1;

/**

* constructor

*/

public ArrayTest(int testSize, long testSeed)

{

if(testSize>1) this.testSize = testSize;

if(testSeed>=0) this.testSeed = testSeed;

this.oldValues = new int[testSize];

this.newValues = new int[testSize];

this.procced = new ArrayEntry[testSize];

}

/**

* return the test size of this instance

*/

public int getTestSize()

{

return this.testSize;

}

/**

* return the random number generator seed in use

*/

public long getTestSeed()

{

return this.testSeed;

}

/**

* set the data set that this will process

*/

public void setDataSet(int[] ds)

{

this.setP = ds;

}

/**

* @param args the command line arguments

*/

public static void main(String[] args)

{

boolean alg = false;

File file = null;

BufferedWriter bw = null;

// print usage message

if(args.length<3||args.length>4)

{

System.out.println("Usage:\n\tjava TestArray algorithm size seed [filename]\n"

+ "\twhere:\n\talgorithm is bbritt or hashed\n"

+ "\tsize is the test size\n\tseed is the random number seed\n"

+ "\tfilename is the optional filename in which the P set numbers can be dumped" );

System.exit(1);

}

// parse args

if(args[0].equals("hashed")) alg = true;

int size = Integer.parseInt(args[1]);

long seed = Long.parseLong(args[2]);

// open file if argument is given

if(args.length==4)

{

try{

file = new File(args[3]);

bw = new BufferedWriter(new FileWriter(file,false));

}catch(IOException ioe)

{

ioe.printStackTrace();

bw=null;

}

}

ArrayTest at = new ArrayTest(size,seed);

// init setP (random consecutive numbers)

int[] P = initSetP(at.getTestSize(),at.getTestSeed());

// write file if requested

if(bw!=null)

{

try{

for(int i=0;i<P.length;i++)

{

bw.write(P[i]);

bw.write("\n");

}

bw.close();

}catch(IOException ioe)

{

ioe.printStackTrace();

}

}

System.out.println("set P has "+at.getTestSize()+ " elements and seed "+at.getTestSeed());

// set the data set to use

at.setDataSet(P);

// perform hashed method

if(alg)

{

System.out.println("hashed method");

at.algorithm(true);

}else

{ // try modified bbritta method

System.out.println("bbritta method");

at.algorithm(false);

}

}

/**

* create an fill an Object array with Integers (consecutive starting at 0)

*/

public static Integer[] createFilledArray(int size)

{

Integer[] tmp = new Integer[size];

for(int i=0;i<size;i++)

tmp[i]=new Integer(i);

return tmp;

}

/**

* create an Integer [] and shuffle it randomly. Bit ugly

* but it works

*/

public static int[] initSetP(int size, long testSeed)

{

Integer[] tmp = createFilledArray(size);

ArrayList list = new ArrayList();

//put all elements in a list

for(int i=0;i<tmp.length;i++)

list.add(i,tmp[i]);

// shuffle the list

Collections.shuffle(list, new Random(testSeed));

// get an array from the list and return it

tmp = (Integer[])list.toArray(new Integer[1]);

int[] result = new int[tmp.length];

for(int i=0;i<result.length;i++)

result[i] = tmp[i].intValue();

return result;

}

/**

* print out an array of Integers

*

* @param array to print

* @param number of elements to print

*/

public static void printIntegerArray(int[] array, int arrayLength)

{

int size = 0;

// ensure argument is not too big

size=(arrayLength>array.length)?array.length:arrayLength;

for(int i=0;i<size;i++)

{

System.out.print(array[i]+" ");

}

System.out.println();

}

/**

* the algorithm. Find the longest set of increasing numbers from an

* array of unsorted numbers. If more then one set has the largest number

* of elements this algorithm produces the first. That is the one whose

* path root is leftmost in the unsorted array.

*

* @param use hashed algorithm if true, bbritt's if false.

*/

public int algorithm( boolean fast)

{

int maxSizeFound =0;

Integer[] tmp = null;

int size = 0;

// set longest

longest = -1;

long startTime = System.currentTimeMillis();

for(int startVal = 0; startVal><setP.length;startVal++)

{

if(startVal+longest+1>setP.length) break;

// choose recursive algorithm to run

if(!fast)

recurse(startVal,0);

else

fastRecurse(startVal,0,startVal);

}

// print results

System.out.println("Size of array is "+(longest+1));

System.out.println("Time (ms) = "+(System.currentTimeMillis()-startTime));

// print best result set

printIntegerArray(newValues, longest+1);

return 0;

}

/**

* business end of the algorihtm. plagerised from bbritta and modified

* so that results are written just before the method exits (rather then

* as soon as it is called). This makes creating results alot easier as

* it no longer overwrites answers as it goes.

*/

public boolean recurse( int myIndex, int pathLength)

{

boolean writeOnReturn = false;

// find if Im one of the longest paths

if(pathLength>longest)

{

longest=pathLength;

// tell callers to write their value to result array

writeOnReturn = true;

}

// iterate over remaining elements in set P

for(int i=myIndex+1;i<setP.length;i++)

{

boolean retValue = false;

if(setP[i]>setP[myIndex])

{

retValue = recurse(i,pathLength+1);

if(retValue) // if callee tells me to write value I will honour it

writeOnReturn = true;

}

}

if(writeOnReturn)

{

newValues[pathLength] = setP[myIndex];

}

// tell caller to write its value or not (true,false respectivly)

return writeOnReturn;

}

/**

* faster business end of the algorihtm. Exactly the same as the modified

* bbritt algorithm above. It however has a number of further optimisations

* that can significantly reduce the scope of the search. This leads to

* a large performance increase. (no idea what the complexity is though)

*

* @param myIndex the index this should work on

* @param pathLength the length of path from startIndex to here

* @param startIndex the index in unsorted array that this recursion was

* initially called to process.

*/

public boolean fastRecurse(int myIndex,int pathLength, int startIndex )

{

boolean writeOnReturn = false;

// simple condition check

// is it possible to find a longer set

if(longest-pathLength>=setP.length-myIndex) return false;

// if the current element does not exist in the array

if(procced[myIndex]==null)

{

// insert the current element and the head of the path to it into the array

procced[myIndex] = new ArrayEntry(startIndex, pathLength);

}else{

// if the current element from setP is in the HashMap see if the element that is the

// root of the path to it is less then the value of the current element if so

// I cannot make a longer path starting here

if( (procced[myIndex].length>=pathLength) )

{

return false;

}else

{

procced[myIndex].length=pathLength;

procced[myIndex].startIndex = startIndex;

}

}

/////////////////////////////////////////////////////////////////////////////////

// from here algorithm is the same as modified bbritt algorithm above

/////////////////////////////////////////////////////////////////////////////////

// find if Im one of the longest paths

if(pathLength>longest)

{

longest=pathLength;

// tell callers to write their value to result array

writeOnReturn = true;

}

// iterate over remaining elements in set P

for(int i=myIndex+1;i<setP.length;i++)

{

boolean retValue = false;

if(setP[i]>setP[myIndex])

{

retValue = fastRecurse(i,pathLength+1,startIndex);

if(retValue) // if callee tells me to write value I will honour it

writeOnReturn = true;

}

}

if(writeOnReturn)

{

newValues[pathLength] = setP[myIndex];

// procced[myIndex][LENGTH] = pathLength;

}

// tell caller to write its value or not (true,false respectivly)

return writeOnReturn;

}

}

class ArrayEntry

{

protected int startIndex;

protected int length;

protected ArrayEntry(int startIndex, int length)

{

this.startIndex = startIndex;

this.length = length;

};

}

Interesting problem. I wonder why they don't occur here more often.

matfud

matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 18
> I'd do it iteratively, traversing P maintaining a pool of candidate trees. I'm impressed!
bbrittaa at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 19
> > I'd do it iteratively, traversing P maintaining a> pool of candidate trees. > > I'm impressed!I do appologise. I appear to have dropped the a of your tag a few timesin the above postings.I'm impressed as well.matfud
matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 20
Absolutely no problem. It has been mangled much worse.
bbrittaa at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 21

> I think we have a winner.

>

> pm_kirkhams code took alsmost exactly the same time to

> run as mine for

> a selection of n (approx 35 secs for n=10000).

putting your randomizer into my code so they run off the same data set with seed 0, I get:

java WierdNumberArrayTest hashed 10000 0

set P has 10000 elements and seed 0

hashed method

Size of array is 198

Time (ms) = 81741

101 188 .. 9687 9765 9913

java WierdNumberRemover

unwierdifying 10000 at 0.8% took 151ms to find a sequence 151 long

P'= { ... }

unwierdifying 10000 at 1.8% took 1334ms to find a sequence 197 long

P'= { 101 188 ... 9687 9765 }

If I put 'britta' as the algorithm, it doesn't complete.

Are you sure you posted the right code?

Pete

pm_kirkhama at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 22

Nope thats the correct code. The faster version I was talking about

sometimes produced incorret results so I took out the optimistaion that

caused the problem (there is still a comment in the src about it). It

may be possible to find a similar optimisation that produces correct

results but I preffer your iterative method and so can't be bothered to

investigate further.

congrats on a nice piece of code

matfud

matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 23

Oh and the bbritta code won't work for n greater then about 300 (well

I couldn't be bothered to wait for it). Remember it recursively

checks all possible sequences of incrementing numbers and can therefore

not be expected to scale up to anywhere near n=10000.

The optimisations in my modification just reduce (prune) the size of the search

space and therefore increase the efficiency of the algorihtm.

However the main optimisation didn't work out (or rather I'm too tired

to figure it out how to fix it) so I posted the working but

(significantly) slower version.

matfud

matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 24

Wow, you all have been busy. I, however, can not seem to get this java source to run on my machine. However, from looking at it, I can not tell if it solves the original problem or not.

It looks like you are all focusing on finding the longest path, not finding the longest path given that you can remove numbers from the problem set to concatenate paths. I'm not sure if this is true or not, however, since I can't get the java source to compile and run (sorry, I'm a java newb).

Anyway, if someone wants to show me some output for various input, maybe I can better see what's going on. Here's a sample output line from my recursive attempt.

53: [ 1 28 3 4 5 6 7 8 9 10 2 13 12 14 15 16 17 26 50 20 21 22 23 24 25 18 51 11 29 30 31 32 33 34 35 27 37 44 39 40 41 42 43 38 45 46 47 48 49 19 36 52 53 ]

+ 14 14 0.020229 Sol: 13: [ 2 11 13 18 19 26 27 28 36 38 44 50 51 ]

The problem set is 53 members (found within the first set of braces).

The solution found using my recursive algorithm which took .02 seconds and it found that the best solution was to remove 2, 11, 13, 18... etc (found between the second set of braces).

If we, therefore, follow that solution on the input set, we find that we're getting the following remaining sequence:

1 3 4 5 6 7 8 9 10 12 14 15 16 17 20 21 22 23 24 25 29 30 31 32 33 34 35 37 39 40 41 42 43 45 46 47 48 49 52 53

While that may not be the only solution, the longest sequence length found by my algorithm after removals was 40 members long. Anything greater than that would be incorrect.

If you all want to try this one and see what you come up with, I'd appreciate it.

DreamWarriora at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 25

All three algorithms find the longest run of increasing numbers

(removing as may as you wish to find it).

The modified bbritta code and my faster version use recursion to

find the longest such run. Mine is alot faster then the nieve resursion

but not as fast as the iteratative algorithm posted here.

The iterative algorithm (pm_kirkham) maintains a list potential paths

through the array (each path is also a list (therefore it maintains a

linkedlist of linkedlists). Then for every element in the original

array it checks each of the paths (lists) to see if the new element is

able to make these paths longer. all this happens inside the while

loop (cunning stuff).

matfud

matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 26

how big are the datasets that you are handling. If they are <100 elements

there is not a huge difference in performance between any of the algorithms

if they are larger the pm_kirkham method is by far the fastest.

have you tried processing 10,000 elements with your algorihtm?

or even 1,000?

The code for all of the algorihtms does compile (although cut and

paste from these forums tends not to work very well). If you

want help just ask.

matfud

matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 27

Most of the time the elements will be < 100, however I have run larger sequences through mine and it really depends more on how many discrepencies exist than the element count (for mine).

I'll try and do a code conversion from the java to c and try the algorithms posted. I can not use java for this anyway, so if they work it'll need to be done regardless. Besides, I'll need to change them around a bit to spit out the removals rather than the largest set, as technically that is what I want as my answer. So rather than take the longest set information and overlay that on the input set to find out what was removed, I'd rather just have the algorithms store it as they go so that tossing it out in the end is speedy.

As for the java compiling, it does seem to compile just fine, however when I try to execute it there are tons of errors about unable to find certain other classes and all kinds of other garbage. I'm sure it is probably something I'm doing wrong. Here's what happened:

First, I got an error about the file name and realized I had to name it WierdNumberRemover.java. Then I compiled it with the Java 1.2 compiler that HP-UX provided (javac) and it compiles fine. Then, since I didn't know how else to execute it, I made an html file and ran the appletviewer application on it. I get errors from that like: java.io.FileNotFoundException: /tmp_mnt/.../class.class (no such file or directory) followed by lots of at java.io.* errors and sun.applet.AppletClassLoader.* and other stuff....

DreamWarriora at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 28

to run it

just change to the directory the resulting WeirdNumberRemover.class

is in and type

java -cp . WierdNumberRemover

The performance of all of the algorithms (apart from bbritta's)

is partially dependent on the data set. The iterative method runs in

O(n^2) worst case I believe but will often produce better performaces.

(not sure I have not exmined the code in that level of detail).

I'm not sure how easy it will be to change it so that it spits out the

numbers removed. That would take a fair bit of studying the code.

especially when one of the first two posts I made show O(n) methods of

doing this.

however since you'd have to rewrite all of the algorithms in C (not

too hard but a pain) and if your data sets are in general quite small

you might as well just use the one you have. (after all its written

and the performance differences for small data sets are minimal)

matfud

matfuda at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 29

Thanks for showing me how to run it! :D, that worked very well...now I can actually start toying with the code to see how it works and see if I can get it to track removals while working.

It looks as if his iterative method is basically iterative recursion anyway, as he's tracking a list of candidates which will be checked in the next iteration...which is similar to passing each one of those candidates to a recursive routine that would check them to their end as well. However, I agree that any iterative routine would be better than recursive...and I'll investigate his code more.

THANKS EVERYONE! Hope you had fun playing with this...I know I did.

Rewriting the java in C won't be too bad...I understand java, I'm just very rusty and have never done anything serious in it. I'd use what I have, I just don't like recursive routines if there is an iterative alternative. Since there is a working iterative alternative thanks to you all, I will have to investigate it further and see what can be done to make it work as I wish it to.

DreamWarriora at 2007-7-19 4:42:20 > top of Java-index,Other Topics,Algorithms...
# 30

> If you all want to try this one and see what you come

> up with, I'd appreciate it.

Adding this method to CandidateSet to find the inverse of the solution:

int[] toInverseArray (int count) {

int[] array = new int[count - length_];

CandidateSet yazz = this;

int pIndex;

int aIndex = array.length - 1;

for (pIndex=count-1; pIndex>=0; pIndex--) {

if (pIndex + 1 == yazz.token_) {

yazz=yazz.parent_;

} else {

array[aIndex] = pIndex + 1;

aIndex--;

}

}

return array;

}

And changing main to run 10^9/(n^2) tests at various qualities:

public static void main (String[] args) {

int[] P = parseIntArray(args);

System.out.println("P= "+toString(P));

int TESTS = (int)(1000000000 / (((long)P.length)*P.length));

if (TESTS<1) {

TESTS = 1;

}

System.out.println("Averaging times on "+TESTS+" tests");

for (int quality = 1000; quality>0; quality -= quality>250 ? 250 : quality > 50 ? 50 : 5) {

long time = - System.currentTimeMillis();

CandidateSet cs = null;

for (int test=0; test<TESTS; test++) {

cs = unwierdify(P, quality);

}

time += System.currentTimeMillis();

time = (time*1000)/TESTS;

System.out.println();

System.out.println("at "+(quality/10)+"."+(quality%10)+"% took "+time+" nano seconds to find a sequence "+cs.length_+" long");

System.out.println("P'= "+toString(cs.toArray()));

System.out.println("Sol = "+toString(cs.toInverseArray(P.length)));

}

}

Running that on a twin 1.42 GHx G4 Mac (deleting the less interesting results):

java WierdNumberRemover 1 28 ... 52 53

P= { 1 28 ... 52 53 }

Averaging times on 355998 tests

at 100.0% took 37 nano seconds to find a sequence 40 long

P'= { 1 3 4 5 6 7 8 9 10 12 14 15 16 17 20 21 22 23 24 25 29 30 31 32 33 34 35 37 39 40 41 42 43 45 46 47 48 49 52 53 }

Sol = { 2 11 13 18 19 26 27 28 36 38 44 50 51 }

at 75.0% took 32 nano seconds to find a sequence 40 long

P'= { 1 3 4 5 6 7 8 9 10 12 14 15 16 17 20 21 22 23 24 25 29 30 31 32 33 34 35 37 39 40 41 42 43 45 46 47 48 49 52 53 }

Sol = { 2 11 13 18 19 26 27 28 36 38 44 50 51 }

at 50.0% took 26 nano seconds to find a sequence 40 long

P'= { 1 3 4 5 6 7 8 9 10 12 14 15 16 17 20 21 22 23 24 25 29 30 31 32 33 34 35 37 39 40 41 42 43 45 46 47 48 49 52 53 }

Sol = { 2 11 13 18 19 26 27 28 36 38 44 50 51 }

at 25.0% took 16 nano seconds to find a sequence 38 long

P'= { 1 3 4 5 6 7 8 9 10 12 14 15 16 17 20 21 22 23 24 25 29 30 31 32 33 34 35 37 39 40 41 42 43 45 46 47 48 49 }

Sol = { 2 11 13 18 19 26 27 28 36 38 44 50 51 52 53 }

...

at 10.0% took 8 nano seconds to find a sequence 34 long

P'= { 1 3 4 5 6 7 8 9 10 12 14 15 16 17 20 21 22 23 24 25 29 30 31 32 33 34 35 37 44 45 46 47 48 49 }

Sol = { 2 11 13 18 19 26 27 28 36 38 39 40 41 42 43 50 51 52 53 }

...

at 4.0% took 6 nano seconds to find a sequence 23 long

P'= { 1 3 4 5 6 7 8 9 10 13 14 15 16 17 26 29 30 31 32 33 34 35 36 }

Sol = { 2 11 12 18 19 20 21 22 23 24 25 27 28 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 }

...

at 2.0% took 4 nano seconds to find a sequence 10 long

P'= { 1 28 29 30 31 32 33 34 35 36 }

Sol = { 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 }

...

at 0.5% took 3 nano seconds to find a sequence 10 long

P'= { 1 28 29 30 31 32 33 34 35 36 }

Sol = { 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 }

You don't generally get much difference in the output for q above 25%; what you get below 1% depends on the dataset: if it's entriely increasing, you get the same whatever q is, if it's large and random, q<1% is still quite good.

The time units are not a typo.

Pete">

pm_kirkhama at 2007-7-19 4:42:24 > top of Java-index,Other Topics,Algorithms...
# 31
AAARGH!The're not a typo for milli but they are a typo for micro.I was profiling some C++ to find fibbonacci (100,000) earlier.That's my excuse anyway.Pete
pm_kirkhama at 2007-7-19 4:42:24 > top of Java-index,Other Topics,Algorithms...
# 32

pm_kirkham

Thanks for the inverse function, however I wanted to refrain from doing that sort of thing. Given that I have to port everything to C anyway, I'm just going to take the code and switch it to hold on to the last number added to the sequence and the sequence length, then rather than store a linked list of the numbers in the set, I'll hold on to a list of numbers out side of the set. This will be better for many reasons, but most of which is that the datasets that I'll be dealing with will more often have more numbers in order than out of order. Therefore keeping the removals will take up less memory space than holding on to the run sequence and less CPU time then having to call a function to obtain the removals based on that information which I don't need anyway.

Thanks for the algorithm though, I was very close to the same thing, I just left out the part where you add a duplicate set if the appended number is not tight against the last sequence number. It makes so much sense now why I wasn't always finding the best sequence with my non-recursive solutions of the past...and it was because of that little ommission.

DreamWarriora at 2007-7-19 4:42:24 > top of Java-index,Other Topics,Algorithms...
# 33

Remember the pool of sequences represented by:

{ (3 4 + ) (3 4 6 +) }

looks like this is memory:

[6 . [4 . [3 . null] ] ]

^^

[+ . [+ . null ] ]

Where '.' indicates the reference to the next in the list.

The static memory overhead is at worst 2N-1 objects (up to N temporary lists may be created then culled on each iteration - these should be pooled in the real implementation).

To attach a list of selections to each node in pool requires up to one operation per list element per node in the pool. For the above pool, if the next tokens are (2 5 1) then '2' has to be added to the list of rejections for both (3 4 6 +) and (3 4 +), but '5' only to the list for (3 4 6 +), then '1' to both, giving the rejection lists of (2 5 1) and (2 1). As neither the head or tails of the sequences generated are in general subsequences of each other, you can't use the same trick of referencing into the same list as above, so have to maintain up to ~(N^2) lists.

Constructing the removals list after the algorithm has finished costs O(N) time and the alogrithm uses ~3N objects.

Maintaining the removals lists during the algorithm costs O(N^2) time and ~(2N + (N-1)^2) objects.

Pete

pm_kirkhama at 2007-7-19 4:42:24 > top of Java-index,Other Topics,Algorithms...
# 34

pm_kirkham:

I'll have to investigate a bit further...but I think my final implementation works out best for the subsets of data that it will be generally handling.

It also seems like you are inferring that it's space utilitization will be greater because I'll have to keep track of both the sequence list and the removal list, which is not at all the case. In fact, I only need to know the last number "appended" to the list to use this algorithm. Therefore, I never bother constructing the "best sequence" list, only the removal list.

Since my data is going to generally going to be in order, keeping track of and maintaining a linked list or pool of sequences is less efficient then a simple store to the last element "appended." Now for the removal sequences, they'll typically, due to my data being most in order, be shorter than the in order sequences. While they may need to be updated a bit more often for some cases, their updating seems very little more costly (should I decide to use the pooling trick via inserting them in order) and more space efficient.

If, however, the removals would tend to be more often than the ordered sequences, I'd agree that storing them would be useless. However, since in my case this is not true (on average), I think I made a better choice by storing removals and replacing the sequences with a single element giving me the last "appended" number.

DreamWarriora at 2007-7-19 4:42:24 > top of Java-index,Other Topics,Algorithms...
# 35

here is a bit of code that might be better for relatively short sequences consisting mainly of consecutive integers

it will run out of memory and be very slow for other inputs

asjf

import java.util.*;

class Min {

static Map range2lS = new HashMap();

static int [] input = new int [] { 1 ,28 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,2 ,13 ,12 ,14 ,15 ,16 ,17 ,26 ,50 ,20 ,21 ,22 ,23 ,24 ,25 ,18 ,51 ,11 ,29 ,30 ,31 ,32 ,33 ,34 ,35 ,27 ,37 ,44 ,39 ,40 ,41 ,42 ,43 ,38 ,45 ,46 ,47 ,48 ,49 ,19 ,36 ,52 ,53 };

static class Range {

int b,t;

Set children;

boolean hasParent;

Range(int x,int y){b=x;t=y;children=new HashSet();hasParent=false;}

public String toString() {return "r["+input[b]+".."+input[t]+"]";}

}

static List constructRangeDAG(int [] p) { // fails for p.length < 3

List result = new LinkedList();

for(int b=0,t=0; t<p.length; b=t) {

do {

t++;

} while(t><p.length && p[t-1]==p[t]-1);

Range r = new Range(b,t-1);

for(int i=0; i><result.size(); i++) {

Range pred = (Range) result.get(i);

if(p[pred.t]><p[b]) {

pred.children.add(r);

r.hasParent = true;

}

}

result.add(r);

}

return result;

}

public static void main(String [] arg) {

long start = -System.currentTimeMillis();

List ll = getMaxAscSeq(constructRangeDAG(input));

System.out.println("\n"+ll+" "+ll.size());

System.out.println(start+System.currentTimeMillis());

}

static List asList(int [] a) {

List result = new LinkedList();

for(int i=0; i><a.length; i++)

result.add(new Integer(a[i]));

return result;

}

static List longestSequence(Range r) {

if(!range2lS.containsKey(r)) {

List longest = new ArrayList();

for(Iterator i = r.children.iterator(); i.hasNext(); ) {

Range child = (Range) i.next();

List l = longestSequence(child);

if(l.size()>longest.size()) longest = l;

}

longest = new LinkedList(longest);

for(int i=input[r.t]; i>=input[r.b]; i--)

longest.add(0, new Integer(i));

range2lS.put(r,longest);

}

return (List) range2lS.get(r);

}

static List getMaxAscSeq(List rangeGraph) {

List result = new ArrayList();

for(Iterator i = rangeGraph.iterator(); i.hasNext(); ) {

Range root = (Range) i.next();

if(!root.hasParent) {

List l = longestSequence(root);

if(l.size()>result.size()) result = l;

}

}

return result;

}

}

asjfa at 2007-7-19 4:42:24 > top of Java-index,Other Topics,Algorithms...