Hard

Hi everybody!

I'm developing an application where there is a need to sort queries depending on how general they are.

Basically, consider a big N*M matrix, where every entry (A_j,k ) can contain an int, 0 or 1. (Implemented as boolean, but ints are easier to read here)

Now i want to extract every ROW where certain criterias are met, for example that "column zero==1 && column one==0, others irrelevant".

For this, i've got an algorithm that takes a query and returns the indicies of the rows that matches these criterias.

The query is a 1xM array with M constraints, one for each column.

The constraints are:

publicstaticfinalint MUST_BE_ZERO =0

publicstaticfinalint MUST_BE_ONE = 1

publicstaticfinalint IRRELEVANT = 2

Consider the following matrix:

0100

1001

1010

0111

1100

The queryint[] query =newint[]{MUST_BE_ONE, MUST_BE_ZERO, IRRELEVANT, IRRELEVANT}

would match rows 1 & 2 here (first row: row #0).

So, now to the real problem (phew!):

I need to sort these queries so that the most loose queries are top priority. I will need a list that starts with the queries that are most loose, and end with the ones that are the most narrow.

As a quick hack, I created all possible queries and then sorted them depending on the numbers of IRRELEVANT in each query.

However, as the matrix grows this becomes highly innefficient, and I need an algoritm that creates this list. Below is an example of a correctly sorted list where N=2:

22 (most loose query, matches all columns)

20

21

02

12

00--|

01 | -- These four are most "strict"

10 |

11--|

I got pretty close to solving this by creating a tree with 0:s, 1:s and 2:s, and printing it in preorder, but that was a coincidence and only worked for N<=4 (if i remember correctly).

Thank you for you time, any help is greatly appreciated. Please let me know if I should clarify anything.

Regards

Alex

[2635 byte] By [ArneWeisea] at [2007-10-1 17:59:44]
# 1
I think I would just create a Comparator that compared on the number of irrelevants and store the queries in a TreeSet.
RadcliffePikea at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 2

Yes, that would undoubtedly be much more efficient than using Collections.sort(). But since i'm making the list myself, it would feel better to generate it correctly from the beginning, rather than generate it in the wrong order and sorting.

Just so I'm getting it right, is this the way you were thinking?

* Generate all possible combinations in an arbitary order, and add these to a priority queue as we go along.

What about HashSet? This has complexity O(1), but TreeSet would be O(N log N), wouldn't it?

But isn't there any kind of intiutive way to generate this list direcly?

Alex

ArneWeisea at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 3

> Yes, that would undoubtedly be much more efficient

> than using Collections.sort(). But since i'm making

> the list myself, it would feel better to generate it

> correctly from the beginning, rather than generate it

> in the wrong order and sorting.

Oh I see, I misunderstood your problem. Using a TreeSet wouldn't give any advantage if you need to generate all queries and sort them. I'm sure there is a way to map generate your patterns in order, I'll give it some thought if I get some time.

> What about HashSet? This has complexity O(1), but

> TreeSet would be O(N log N), wouldn't it?

Just a quick note:

HashSet is O(1) for each insertion, it would be O(n) overall, and it wouldn't be sorted in any particular order. TreeSet would be O(log n) for each insertion giving an overall complexity of O(n log n).

RadcliffePikea at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 4

> Oh I see, I misunderstood your problem. Using a

> TreeSet wouldn't give any advantage if you need to

> generate all queries and sort them. I'm sure

> there is a way to map generate your patterns in

> order, I'll give it some thought if I get some time.

>

That would be great! Thanks.

To clarify a bit: This will be used as a part of an algorithm that minimizes truth tables in digital electronics. Every possible combination of constraints should be in the list, and the list should be "sorted" by the number of two:s in it.

The list will then be used to find the minimal boolean expression that realizes a desirable output.

> Just a quick note:

> HashSet is O(1) for each insertion, it would be O(n)

> overall, and it wouldn't be sorted in any particular

> order. TreeSet would be O(log n) for each insertion

> giving an overall complexity of O(n log n).

...And I misunderstood you ;) Thanks for clairifying.

Regards

ArneWeisea at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 5

Your queries are essentially numbers in base 3. You showed all the queries with 2 elements in sort order and there were just what you would expect, 3^2 or 9 elements. The number of possible queries for N columns is 3^N. This is clearly exponential in N. You will NOT be able to generate a list and fit it into your gig or so of memory unless you have something less than 20 columns.

Why exactly is it useful to create a list of all possible queries in sorted order rather than dealing with the easier problem of sorting a much shorter list of actual queries?

In general, when you find yourself wanting to generate all of some combinatorial set, you are attempting to generate an exponentially large list and are doomed from the outset.

What is it that you actually want to do?

marlin314a at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 6

Even though I suspect that you don't really need a list of all queries in order, I found the problem interesting so I hacked out some code to generate them in order.

The code actually decodes an integer, n, into the nth item on the sorted list. So if you have 5 cols in your queries, your sorted list would have 3^5 = 243 elements on it and you can get the 119th element directly by calling arrange(5, 119)

If you can convince me that there really is a reason for this, I could document the code, otherwise I will just leave it in the undocumented spew that it was.

static int p(int a, int b){

if(a == 0 || a == b) return 1;

return p(a,b-1) + p(a-1,b-1);

}

static String out(int a, int b, int n, int foo){

String res = "";

if(a == 0){for(int i = 0; i<b; i++){res+="2";} return res;}

if(a == b){

for(int i = 0; i><b; i++){

res+=foo%2; foo/=2;

}

return res;}

int t = p(a,b-1);

if(n ><= t) return "2" + out(a,b-1,n, foo);

String d = ""+foo%2; foo/=2;

return d + out(a-1,b-1,n-t, foo);

}

static int p3(int n){

int t = n;

int res = 1; while(t-->0){res *=3;}

return res;

}

static String arrange(int b, int n){

int max = p3(b);

if(n <= max){

int a = 0;

int p2 = 1;

int t = 1;

while(n>t){

n -=t;

a++;

p2 *= 2;

t = p2*p(a,b);

}

return out(a,b,((n-1)/p2)+1, (n-1)%p2);

} else {

return "past limit";

}

}

static void showAll(int n){

int lim = p3(n);

for(int i = 1; i<=lim; i++){

System.out.println(arrange(n,i));

}

}

public static void main(String args[]){

int colCount = 3;

showAll(colCount);

System.out.println("");

System.out.println(arrange(17, 120989));

}

output is

222

220

221

202

212

022

122

200

210

201

211

020

120

021

121

002

102

012

112

000

100

010

110

001

101

011

111

22122022022112222

Process terminated with exit code 0

marlin314a at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 7

If I'm not mistaken, it doesn't matter in which position the 'IRRELEVANT'

tag is present, i.e. MUST_BE_ONE, IRRELEVANT is as 'strict' as is

IRRELEVANT, MUST_BE_ZERO. If this is correct, the only thing you

have to do is build a Rank class:public class Rank implements Comparable {

int nof_irrelevants;

int nof_zeros_or_ones;

// ctor stuff here ...

//

// compare two Ranks

public int compareTo(Object obj) {

Rank that= (Rank)obj;

if (this.nof_irrelevants > that.nof_irrelevants) return 1;

if (this.nof_irrelevants < that.nof_irrelevants) return -1;

if (this.nof_zeros_or_ones) > that.nof_zeros_or_ones) return 1;

if (this.nof_zeros_or_ones) < that.nof_zeros_or_ones) return -1;

return 0;

}

}

The higher the rank, the less 'strict' the query.

kind regards,

Jos

JosAHa at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 8

Thanks for your replies marlin!

> Your queries are essentially numbers in base 3. You

> showed all the queries with 2 elements in sort order

> and there were just what you would expect, 3^2 or 9

> elements. The number of possible queries for N

> columns is 3^N. This is clearly exponential in N. You

> will NOT be able to generate a list and fit it into

> your gig or so of memory unless you have something

> less than 20 columns.

This is a problem. However, based on the fact that minimization of truth tables is an NP problem (from what i've heard, any other opinion?), I guess i'll have to live with that.

>

> Why exactly is it useful to create a list of all

> possible queries in sorted order rather than dealing

> with the easier problem of sorting a much shorter

> list of actual queries?

This problem is completely equal to karnaugh maps minimization.

In order to do minimization, I'll have to try ALL possible combinations. Hence, worst case, i need 3^N operations.

However, since there is a good chance of solving the problem before all combinations are tried, it would be a performance gain to avoid storing the list in memory, and just executing the queries one by one.

>

> In general, when you find yourself wanting to

> generate all of some combinatorial set, you are

> attempting to generate an exponentially large list

> and are doomed from the outset.

>

> What is it that you actually want to do?

Right now I would like to go for a swim. ;)

Regards

ArneWeisea at 2007-7-11 12:35:39 > top of Java-index,Other Topics,Algorithms...
# 9
The code you provided was very nice. If you could please just explain the basic principle to me I'd be happy. I've been thinking of this for so long now, got to know the answer ;)Alex
ArneWeisea at 2007-7-11 12:35:40 > top of Java-index,Other Topics,Algorithms...
# 10

Hi JosAH

> If I'm not mistaken, it doesn't matter in which

> position the 'IRRELEVANT'

> tag is present, i.e. MUST_BE_ONE, IRRELEVANT is as

> 'strict' as is

> IRRELEVANT, MUST_BE_ZERO. If this is correct, the

> only thing you

> have to do is build a Rank class

This is correct, however see the previous posts regarding this. I need to generate the list in the correct order, not really sort it. Sorting is just a workaround.

Regards

Alex

ArneWeisea at 2007-7-11 12:35:40 > top of Java-index,Other Topics,Algorithms...
# 11
Hi,Why don't sum up things? Build sums over each queries using IRRELEVANT = 0, MUSTBEONE=+1, MUSTBEZERO=+1Then simply sort by those sums.Peter
Soylenta at 2007-7-11 12:35:40 > top of Java-index,Other Topics,Algorithms...
# 12

> > If I'm not mistaken, it doesn't matter in which position the 'IRRELEVANT'

> > tag is present, i.e. MUST_BE_ONE, IRRELEVANT is as 'strict' as is

> > IRRELEVANT, MUST_BE_ZERO. If this is correct, the only thing you

> > have to do is build a Rank class

>

> This is correct, however see the previous posts regarding this. I need to generate the list in the

> correct order, not really sort it. Sorting is just a workaround.

That's what I stated: you don't have to generate a list in any correct order.

All you want is a least and most 'strict' query and anything in between.

My previous reply showed how you could easily derive a PTO (Partial

Total Ordering) over your queries.

kind regards,

Jos

JosAHa at 2007-7-11 12:35:40 > top of Java-index,Other Topics,Algorithms...
# 13

> That's what I stated: you don't have to generate a

> list in any correct order.

> All you want is a least and most 'strict' query and

> anything in between.

> My previous reply showed how you could easily derive

> a PTO (Partial

> Total Ordering) over your queries.

Sorry, but I'm not really into math definitions of orderings. I belive I got the picture however. I can verify that there is not one unique correct ordering of the queries. As I stated, only the 'strictness' is important, and therefore such a thing as doing a sort by rank works like a charm.

But, what I don't understand how this would speed thing up. As I see it, it would require creating the complete list and then sort it by rank, which is exactly what I am doing at the moment.

Marlin314:s solution would give access to any index at any time, without building the list first, wich saves a LOT of memory.

Please correct me if I'm wrong, quite inexperienced in algorithms.

Regards

Alex

ArneWeisea at 2007-7-11 12:35:40 > top of Java-index,Other Topics,Algorithms...
# 14
BTW: What about the time complexity of your code Marlin? Think it'll work well with small tables, or should I switch between two methods?RegardsAlex
ArneWeisea at 2007-7-11 12:35:40 > top of Java-index,Other Topics,Algorithms...
# 15

my code runs at "blinding electronic speed" :)

actually the implementation sucks primarily due to my lazy use of a recursive routine to compute the binomial coefficients, that would be the p(a,b) routine.

The method of operation is fairly simple. Suppose you are dealing with N queries. First you will have 1 querry that is all 2's then you will have a block that is all 2s except for one position, ... until you have the section with no 2s at all.

supposed you are in the block that has 15 2's out of 20 possible slots. How big will that block be? Well, there are p(15,20) different ways to arrange the 2's within the block of 20 slots, and there are 5 remaining slots that will hold the ones and zeros.

Thus for any single fixed arrangement of the 15 2s there will be 2^5 possible ways of filling in the ones and zeros. The total size of the block (containing 15 twos) will be the product of the way to arrange the twos times the number of ways to fill in the ones and zeros for any single arrangement, i.e. p(15,20)*2^5

in more generality, the block that has A 2s spread between B slots has p(A,B)*2^(B-A) elements.

Once you know that, given any n, (N is the number of querie variables, n is the index of a particular query in the sorted list) you can tell what block it belongs to by reducing n by the size of each block. i.e. you subtract the size of the first block from n and see if it went negative, if not you subtract the size of the next block and so on. When it goes negative, you know it is in that block. You can back up by one and now you have reduced n to a new number that tells you how far along you are in that particular block.

Now you do the same basic trick to decode this new n into a particular arrangement of 2s and the remaining arrangement of ones and zeros.

The one single block which had p(A,B)*2^(B-A) elements can be arranged in the order where the first 2^(B-A) are the first arrangement of 2s, the second 2^(B-A) are the second arrangement of 2s etc. I called this power of 2, p2 in the code.

Thus n/p2 is a single number encoding which particular arrangement of 2's you will be using and n%p2 is the particular arrangement of ones and zeros that you need to scatter between the twos.

Arranging the 2's in order is the same basic process. Suppose you are placing 15 2s in 20 slots. There are two basic ways to do this. You can have a single 2 in the first slot and then you distribute (15-1) 2s in the remaining (20-1) slots, OR you do NOT put a 2 in the first slot and you must distribute all 15 across the remaining (20-1) slots.

You decode the n, representing the arrangement of 2s the same way. You know that the first block (the one with a one or zero in the front) has exactly p(A,B-1) elements so if n is less than that, you drop in either a one or a zero - otherwise you drop in a 2, reduce n and recurse to drop the remaining elements into the remaining slots.

Unfortunately, as you can see from the undocumented code, it is much easier to write the code than it is to describe how it works.

This whole decoding process is not terribly slow, except for the calculation of p(a,b) which in its recursive form is exponential in runtime. It could be made much faster, essentially constant for the small numbers that you are dealing with. Since b can not get bigger than about 20 without overflowing the int, a 20by20 array of cached p(a,b) values will make it constant. or you can go the the definition of the binomial coefficient in terms of factorials if you are careful to not overflow your ints. In any case, with work one can vastly improve the speed of p(a,b).

Other than that, the decoding of an n is on the same order of magnitude as what it takes to decode an integer into its ascii representation. However, it is of course limited by the size of the integer. If N gets up to about 20, 3^20 is about 3 billion and that is too big to fit in a lowly 32 bit integer so you will not be able to decode those n values.

I leave it as an exercise to the reader to rewrite p(a,b) and to convert all the ints to longs. I still contend that doing something that essentially requires 3^N steps is a doomed business.

Hope that helps.

marlin314a at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 16

>I still

> contend that doing something that essentially

> requires 3^N steps is a doomed business.

I agree with you on that. Just cannot come up with any other way of solving the problem... Anyway, will think about it some more... It's not that important however, N is rarely bigger than 5.

>

> Hope that helps.

Actually, that was more than helpful! Thanks for making the effort of writing that text. And by the way, smart thinking!

I, "the reader", will improve a bit on the implementation.

Since you stressed that this is a doomed business, do you have any other suggestions?

Regards

Alex

ArneWeisea at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 17

> >I still

> > contend that doing something that essentially

> > requires 3^N steps is a doomed business.

>

> I agree with you on that. Just cannot come up with

> any other way of solving the problem...

I disagree on that. Have a look:public static final int IRRELEVANT= 0;

public static final int MUST_BE_ZERO= 1;

public static final int MUST_BE_ONE= 1;

private int[] qry1= { MUST_BE_ONE, IRRELEVANT, MUST_BE_ZERO };

private int[] qry2= { IRRELEVANT, MUST_BE_ONE, IRRELEVANT };

public int strength(int[] query) {

int s= 0;

for (int j= 0; j < query.length; j++)

s+= query[j];

return s;

}

See? No matrix, no nothing, just a simple 'strength()' method doing its job.

kind regards,

Jos

JosAHa at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 18

OK - so let me get this straight.

You actually have a truth table in something like 5 variables (or sometimes more) and what you really want to do is produce a "minimal" boolean expression in disjunctive normal form that encodes that truth table.

The list of queries you wanted generated are not really queries (like in a database sense) but rather are the terms in that disjunctive form. You plan to run this list looking for terms that are consistant with the truth table and when you find one, you reduce the truth table by that form (meaning eliminate the 1s in the table that are explained by the term) and continue until you have your normal form which you will then claim is minimal.

Is that correct? I mean that what you are trying to do is to quickly go from a truth table to a minimal disjunctive normal form (or possibly some other normal form)?

marlin314a at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 19

Yup, completely correct!

I could not find any algorithms that supported minimization, so i tried to reverse engineer the Karnaugh map algorithm. However, I was a little short on time and fell for the easiest algorithm to implement. I also came up with that the karnaugh map algo has a complexity of (3^N)/N, and brute force 3^N, so i was a little bit lazy....

I guess there must be some way of decreasing the time complexity, but since there always is a need of pairing the prime implicants, I'm not sure there really is a way of making it more effective. Maybe you could reach a good result more quickly, if you could skip the demand of getting a perfect result, and finding a method of evaluating possible solutions?

Do you have any ideas?

Regards

ArneWeisea at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 20

I will give it some thought, but before I do, I'd like to be sure that disjunctive normal form really is what you want. i.e. disjunctive normal form is easy to understand and looks all neat and mathematical, but it may not actually be optimal from a circuit perspective. (I am just guessing here)

For example. consider the three variable boolean expression

P and (Q or R)

this has two operators but is not disjunctive normal. the equivalent DN form would be PQ or PR which might require 3 gates rather than 2.

I say "might" because I really know absolutly nothing about electronic circuits, I only guess about them from having heard about them all my life and from having been on the other side of the wall writing software. Your comment about Karnaugh maps threw me, because for some reason they don't teach that in grad level mathematical logic courses so I had to go look it up.

So, before I work on an circuit optimization algorithm, I want to know what is really optimal.

And so as not to be particularly coy, I'll tell you a similar problem that I had to solve some time ago that may be somewhat relevant. Back in the stone ages of monochromatic displays I was designing the Raster Ops for the BitBlt function for Windows. The purpose of the raster op was to take three bits of input, one from a pattern bitmap, one from a source bitmap and one from the distination bitmap and combine them to produce a single bit that would go to the destination bitmap.

Since there are 3 possible inputs, there are 2^3 possible input patterns. A single function is defined by the pattern of 1s and 0s that you assign to each of those 8 input patterns. Thus are exactly 2^8 or 256 possible binary functions that have 3 bits of input. We figured that rather than choose a small selection of these as the only allowed rasterops, we would just allow them all (potentially a litte more work for the software implementations but actually fairly easy for hardware implementations)

The definition of the raster op fits nicely into a single byte, but in a software implementation of the raster op, you need to actually come up with a boolean function made up of operations commonly found on processors like NOT, AND, OR, XOR, EQUALS, PUSH, POP each of which has some number of clock ticks.

So I wrote a routine to crank through the 255 possible functions and generate the optimal code snippet (optimal by clock ticks on the ancient 8088 processor that were in use at the time) for each one. When I was done I had a little reverse polish string of code that would evaluate the desired raster op.

Finally, since we were packing the raster op into a 32bit word, we used 8 of the bits to encode the definition of the function that made sense for the hardware guys and used the remaining 24 bits to encode the little RPN string for the software implementations. So on a sortware implemenation, bitblt, grabs the raster op, runs a little mini interpreter that cracks the RPN into a little chunk of machine code, which it embeds in the middle of the blt routine and for any blt of anything more than 20 bits, the cost of running the little interpreter first is gained back by haveing optimal code deep in the inner loop of your blt.

If you have ever happened to look at the rasterop definitions in Windows and wondered what those wierd long numbers are, they are those encodeing of optimal rpn execution strings.

You are essentially doing the same sort of thing, trying to find an optimal encoding of a boolean function, the only difference being 1) that I precomputed my encodings for the single set of three input boolean functions and you appear to want to do things on the fly. and 2) you may have a different notion of what constitutes an optimal circuit.

So if you could give me a bit of an education on what actually constitutes an optimal circuit, I might be able to generate them a little more directly and efficiently than the querry based method that you were thinking of. Or perhaps, my description of the rasterop problem has given you a notion of how to proceed differently.

Enjoy!

marlin314a at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 21

reduced expression is now

ac + aB + aC + ab

legal substitutions are now

ac + aC = a

aB + ab = a

expression is now

a + a

duplications can always be removed

we reduce to

a

We are almost there. In order to do these reductions properly, you need to combine terms but you don't want the combinatorial explosion caused by trying to combine every term with every other term. Consider that when you are doing a reduction like abc + abC, there is one and only one variable that is different between the two expressions that can be combined. We would like to bring all these together in one pass.

If we wrote a proper hash routine that would perfectly hash a single term, we could force that hash routine to cause collisions ONLY for the terms that properly combine. Then in a single pass through all the terms, we move each term to its proper hash slot and ONLY when two terms properly combine do they end up in the same slot.

This is actually easy to accomplish. If you encoded the terms with two bits for each variable, set to 01 for a positive variable present in the term, 10 for a negative variable present in the term, and 00 for a variable that is absent from the term, you can pack a single term (of up to 16 variables) into an integer, and if you just mask the one variable that you think might be eliminated, then two terms that are otherwise identical will have the same bit pattern. This is your hash function.

so in psuedo code -

create an empty GrowingList

for each variable v{

clear the hash table

for each term t {

hash t by v and look in the table

if the table slot is already occupied by a term {

mark the term in the table as no longer needed

mark the term t that just collided with it as no longer needed

add the reduced term (which is essentially the hash code) to the GrowingList

} otherwise the hash slot was empty so just add this term t to the hash table

}

}

// finally combine the results

for each term t in the original expression{

if t has been marked as no longer needed toss it otherwise add to GrowingList

}

Total runtime I am guessing is bounded by N*2^N where N is the number of boolean variable that you are dealing with (2^N) is the possible number of terms in the original list and my guts tell me that you can at most do at most elimination passes since on each one you eliminate one of N variables. So if you have 10 variables, you are looking at some thousand slots in your truth table, potentially a thousand terms in your original expression, you'll need about a million slots in your hash table and you will need about 10 Million operations to reduce it. Not even enough time to go for coffee. But is it really optimal? Who knows, you tell me.

All in all a much more comprehensible solution that groveling about with Karnaugh maps and strings in base 3.

BTW - do let me know if it happens to work. I won't be terribly surprised, but I would be pleased. Lastly, I would be even more pleased to hear, "this is totally bogus - Knuth solved this problem much more directly years ago and it is in monograph form waiting for edition 4 of 'the Art'."

Enjoy!

marlin314a at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 22

Sorry for the previous post - the initial portion got cut off. Here is what was intended.

I submit for your consideration the following algorithm

if we name the variables in your boolean system abcdefg... and for typing convenience let A stand for not a and B stand for not b etc. you can describe any single one in your truth table by a string that a) contains every letter exactly once, and b) it is either in upper case or in lower but not both. Thus a single term in a "raw" disjunctive normal expression will look something like this abCdE and the entire expression could look something like this:

abC + AbC + ABc + aBc

You can create this raw disjunctive normal expression rather directly from your truth table, one term for each bit that is one.

Now we wish to simply the above expression. The one primary simplification looks like this: Given two terms that are exactly alike except for a single variable (for example in the above expression the first two terms abC + AbC have bC in common with the difference in the variable a) you can combine them into a single term that keeps only the common portion. In boolean algebra this is just observing that

abC + AbC = (a + A)bC = (1)bC = bC

If we look at the first expression I wrote the legal combinations are

abC + AbC = bC

ABc + aBc = Bc

by replacing equal parts the entire expression can be simplified to

bC + Bc

Clearly any combination of two terms will simplify the expression, and the more combination that happen, the merrier.

However the example I gave does not deal with the whole problem. There may be overlapping expressions. For example consider

abc + abC + AbC

the legal substitutions are

abc + abC = ab

abC + AbC = bC

which one do you make? The answer is: make them both! It is fairly easy to see that

ab + AbC

abc + bC

ab + bC

are all equal boolean expressions.

(ab + AbC) = (a + AC)b = (a + C)b = ab + bC

The first substitution eliminated terms 1 & 2 the second eliminated 2 & 3. It does not hurt and in fact it helps to "eliminate the middle term" twice.

Now I claim, without proof, that if you make all these binary simplifications all simultaneously using this one single replacement rule, and then you do it again on the resulting expression, you will in fact get a minimal disjunctive normal form expression.

Another example:

abc + aBc + abC + aBC

legal substitutions are

abc + aBc = ac

aBc + aBC = aB

abC + aBC = aC

abC + abc = ab

reduced expression is now

ac + aB + aC + ab

legal substitutions are now

ac + aC = a

aB + ab = a

expression is now

a + a

duplications can always be removed

we reduce to

a

We are almost there. In order to do these reductions properly, you need to combine terms but you don't want the combinatorial explosion caused by trying to combine every term with every other term. Consider that when you are doing a reduction like abc + abC, there is one and only one variable that is different between the two expressions that can be combined. We would like to bring all these together in one pass.

If we wrote a proper hash routine that would perfectly hash a single term, we could force that hash routine to cause collisions ONLY for the terms that properly combine. Then in a single pass through all the terms, we move each term to its proper hash slot and ONLY when two terms properly combine do they end up in the same slot.

This is actually easy to accomplish. If you encoded the terms with two bits for each variable, set to 01 for a positive variable present in the term, 10 for a negative variable present in the term, and 00 for a variable that is absent from the term, you can pack a single term (of up to 16 variables) into an integer, and if you just mask the one variable that you think might be eliminated, then two terms that are otherwise identical will have the same bit pattern. This is your hash function.

so in psuedo code -

create an empty GrowingList

for each variable v{

clear the hash table

for each term t {

hash t by v and look in the table

if the table slot is already occupied by a term {

mark the term in the table as no longer needed

mark the term t that just collided with it as no longer needed

add the reduced term (which is essentially the hash code) to the GrowingList

} otherwise the hash slot was empty so just add this term t to the hash table

}

}

// finally combine the results

for each term t in the original expression{

if t has been marked as no longer needed toss it otherwise add to GrowingList

}

Total runtime I am guessing is bounded by N*2^N where N is the number of boolean variable that you are dealing with (2^N) is the possible number of terms in the original list and my guts tell me that you can at most do at most elimination passes since on each one you eliminate one of N variables. So if you have 10 variables, you are looking at some thousand slots in your truth table, potentially a thousand terms in your original expression, you'll need about a million slots in your hash table and you will need about 10 Million operations to reduce it. Not even enough time to go for coffee. But is it really optimal? Who knows, you tell me.

All in all a much more comprehensible solution that groveling about with Karnaugh maps and strings in base 3.

BTW - do let me know if it happens to work. I won't be terribly surprised, but I would be pleased. Lastly, I would be even more pleased to hear, "this is totally bogus - Knuth solved this problem much more directly years ago and it is in monograph form waiting for edition 4 of 'the Art'."

Enjoy!

marlin314a at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 23

One should always be suspicious of a post that has the words, "claim without proof"

I coded up my algorithm and sho 'nuff the experssions generated while reduced were not actually minimal. My single reduction rule is not sufficient. There was another problem. I would occasionally end up with epressions like

ac + bC + ab

My single reduction does not reduce that expression but it can be reduced because the ab term is not necessary. ab adds only a subset of the bits already added by the expression (ac + bC) as can be seen by the fact that

ab(ac + bC) = (abc + abC) = ab(c + C) = ab(1) = ab

So I need an additional reduction that eliminates terms that are redundant due to the presence of earlier terms.

The additional reduction is that if two terms share exactly one variable that occurs with different sign in the two expression, then you can eliminate the variable and union what's left. For example:

(aBc + aCD) combines to eliminate the cC and gives aBD

and what the reduction means is that if aBc and aCD are already terms in an expression, then you don't need to include the term aBD in the expression as well.

This adds an N^2 pass (where N is the number of terms in an already reduced expression) to the algorithm, where you look at all pairs of terms and eliminate those that are shadowed by two previous terms.

Ah well - I implemented that reduction, it simplifies the expressions even more. Are they minimal, I believe so, but am no longer willing to claim them to be so without proof.

So it goes.

marlin314a at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 24

Hi Marlin!

Been away for a while, glad to see soo many post! I've been searching a bit more for some solutions. Sorry to say, i've found several, although they do not seem to be public domain...

The karnaugh map equiv algo is called QM, see http://staff.science.uva.nl/~jesshope/web-site-08123/Downloads/minimising-boolean-eqns.pdf. (See proofs there also, quite similar to your thinking)

There is also a couple of algos from berkley university, called espresso. I havent found that much info on how to implement it, just been reading some documents briefly. This is not optimal, but has a complexity of N*logN!! Seem interesting!

I think your idea is good, but I have not yet understood how you deal with overlappings that are not complete (when two terms do not collide completely, maybe I need to read everything once again...).

Will get back next week, will take a holiday now.

Thanks a million for you time, will try to be back here as soon as possible!

Regards

Alex

ArneWeisea at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...
# 25
Hi, I'm doing a project for college on using java code to implement a karnaugh map, but I've only just started java programming...has any one found code on this so i can try an understand what I'm doing?...Thanking you,Navi
Navia at 2007-7-20 9:55:45 > top of Java-index,Other Topics,Algorithms...