array oriented question / ideas sought...

I am investigating ways to perform the following (currently array based) problem efficiently.

Here is the problem:

1. There are two same-sized arrays (some other data structure may work as well) which store 0s and 1s at each index location.

E.g.,int finalArray[ ] =newint[ 2601 ];

int changeArray[ ] =newint[ 2601 ];

finalArray[ 201 ] = 1;

changeArray[ 201 ] = 0;

2. The problem is, given these two arrays, how many times does EITHER index location (e.g., out of the 2601) contain a 1 value,

E.g.,if( finalArray[ 201 ] == 1 && changeArray[ 201 ] == 0)

indexContainingAtLeastOne1++;

and how many times do BOTH index locations contain a 1

E.g.,if( finalArray[ 201 ] == 1 && changeArray[ 201 ] == 1)

indexContainingBoth1s++;

This process is currently done iteratively, pseudocode:

for each index location out of 2601 (in this case)

increment indexContainingAtLeastOne if either array stores a 1 at this index location

increment indexContainingBoth1s if both arrays store a 1 at this index location

when done, print totals

Question: Is there a faster way? As these are essentially bit values, is there a quick way to do it with bits? To not even use arrays? I am currently investigating these possibilities and others, but would appreciate any help. Any answer/suggestion that leads to a faster solution will be awarded Duke Dollars.

[1731 byte] By [larryraisanen] at [2007-9-27 21:00:19]
# 1

Well, the basics of the algorithm will not allow you (at least i think it) to do it in a better perfon鐁mance of O(n), so the only better ways to improve it are chaging your comparation operations.

First of all you can try the XOR ( ^ ) and AND ( & ) that can be used with integers.

the XOR operation is like:

0 ^ 0 = 0

0 ^ 1 = 1

1 ^ 0 = 1

1 ^ 1 = 0

and the And is:

0 ^ 0 = 0

0 ^ 1 = 0

1 ^ 0 = 0

1 ^ 1 = 0

i think that using that you can rewirte your code as:

for (int i = 0 ; i < 2601 ; i++) {

if (finalArray[i] ^ changeArray[i] == 1) {

indexContainingAtLeastOne1++;

}

if (finalArray[i] & changeArray[i] == 1) {

indexContainingBoth1s++;

}

}

As you can see the operations in the if statement are less (in my code there are two ops ((^ and ==) in the first if and (& and ==) in the second one) while in your code you have 3 (two == and one && for example).

I do not know exactly how it would improve the loop, but you should try it to see if it is better.

a better way to do it would be:

for (int i = 0 ; i < 2601 ; i++) {

indexContainingAtLeastOne1 += finalArray[i] ^ changeArray[i];

indexContainingBoth1s += finalArray[i] & changeArray[i];

}

In this example you avoid asking if the result is 1, because you add the result (if it is 1 it will add 1, and if the result is 0 it will add nothing. I think this should speed up your loop.

If you really need to get more speed you can try to store your bits into the integer. One integer can store 32 bits, and the & and ^operators make the AND and XOR of every bit. Then you can count the bits that are 1 in the result of the operation and add them to the total.

I will make a pseudo code approach:

int[] matrix = new int[2^32];

public void initializeMatrix() {

for (int i = firstIntegerThatIsPossible ;

i < lastIntegerThatIsPossible ; i++) {

matrix[i] = numberOf1InTheInteger_i;

}

}

and you loop is:

for (int i = 0 ; i < 2061 / 32 ; i++) {

indexContainingAtLeastOne1 += matrix[finalArray[i] ^ changeArray[i]];

indexContainingBoth1s += matrix[finalArray[i] & changeArray[i]];

}

The first method should be used only once in your application (when it starts, and this values should be used in your app. The loop should go around 32 times faster. The problem is that you do not have space for 2^32 integers in memory. Maybe you should try to use another kind of type like byte. It would only be 8 times faster but it only requires 2^8 integers. Or maybe a 16 bit type.

Well, I hope the response is clear enough (it will not work inmediately but i hope you catch the point). If you do not and you are interested in more details (maybe you have not understood anything) you can write to me (zerjio79@hotmail.com) and i will try to explain it in more detail.

Zerjillo

Zerjio at 2007-7-7 2:41:08 > top of Java-index,Other Topics,Algorithms...
# 2

To Zerjio:

The operators ( ^ and & ) did make a noticeable improvement in the program... although I actually needed ( | and & ), because I needed to count instances of ( 1-0, 0-1, and 1-1 ) for the indexContainingAtLeastOne1.

While this code should be sufficient, I am always after faster code, as the number of times this section of code needs to both iterate and be called during the course of the program is very large indeed.

I'm not sure I fully understood the other suggestion you made... and would appreciate any clarification. Thanks for your help already, and if there are no other replies, I will reward you the rest of the Duke Dollars in a few days. Thanks again.

larryraisanen at 2007-7-7 2:41:08 > top of Java-index,Other Topics,Algorithms...
# 3

ok, sorry for the XOR / OR mistake.

I will try to explain to you the last idea with and example. Supposse you have a type which consists of two bits. Lets call it VeryShortInt.

My idea is to make a matrix which contains the number of 1 in every possible number:

matrix[0] = 0;0 = 00

matrix[1] = 1;1 = 01

matrix[2] = 1;2 = 10

matrix[3] = 2;3 = 11

Then suppose that | and & make the OR and AND bit a bit in your VeryShortInt, for example:

01 | 10 = 11-> 1 | 2 = 3

00 | 11 = 11-> 0 | 3 = 3

01 | 00 = 01-> 1 | 0 = 1

00 | 10 = 10-> 0 | 2 = 2

or with &:

01 & 10 = 00-> 1 | 2 = 0

00 & 11 = 00-> 0 | 3 = 0

01 & 00 = 00-> 1 | 0 = 0

00 & 10 = 00-> 0 | 2 = 0

so if your bits are stored in pairs using the VeryShortInt type you can make 2600 / 2 iterations to solve your problem.

You don't have to increment the sum with 1, but with the number of 1 in the solution of the | or & operation:

for (int i = 0 ; i < 2061 / 32 ; i++) {

indexContainingAtLeastOne1 += matrix[finalArray[i]|changeArray[i]];

indexContainingBoth1s += matrix[finalArray[i] & changeArray[i]];

}

Well, in Java there is no 2 bit type, but "byte" has 8 (so your matrix should look like:

matrix[0] = 000000000

matrix[1] = 100000001

matrix[2] = 100000010

matrix[3] = 200000011

matrix[4] = 100000100

...

matrix[254] = 711111110

matrix[255] = 811111111

or maybe an integer, which has 32 bits. The problem with the 32 bits is that you do not have enough space for a matrix with 2^32 entries. But you can try some other solutions in which you operate with integers but only need a table of 255 entries (the byte one), using the >> and << operators. This is more complicated, so i would initially try the byte approax.

Hope this clarifies a little bit the previous post. If doesn't, let me know, and I will try to make some code for you.

Zerjillo

Zerjio at 2007-7-7 2:41:08 > top of Java-index,Other Topics,Algorithms...
# 4

Zerjio... the code looks like this currently:

for( int i = 0; i < totalRTP; i++ )

{

// increments cover total if either value (or both) is 1 ( uses bit or )

totalRTPCovered += whichMastsCoveredFinal[ i ] | whichMastsCoverWhichRTParrayDynamic[ i ];

// increments overlap total if both values are 1 (uses bit and )

totalOverlap += whichMastsCoveredFinal[ i ] & whichMastsCoverWhichRTParrayDynamic[ i ];

}

The totalRTP can get as large as 900,000... will the second approach still work? And more importantly, do you think it will be faster? And this current code only takes 3 lines, which seems efficient in and of itself, and easy to understand. I do appreciate your effort, but do you think it would be feasible to pursue the second approach?

larryraisanen at 2007-7-7 2:41:08 > top of Java-index,Other Topics,Algorithms...
# 5

Here's my code for a boolean array class, that I wrote when I wanted to store 8 bits per byte instead of 1. (I needed some *big* arrays of booleans)

public class BoolArray implements java.io.Serializable {

byte[] b;

int size;

int type=0;

public static final int FALSE=0;

public static final int TRUE=1;

public static final int ERROR=2;

public int getType() {

return type;

}

public void setType(int t) {

type = t;

}

public BoolArray(int n) {

b = new byte[n/8+1];

size = n;

}

public boolean get(int i) {

try {

return ((b[i/8] & (1 << (i%8))) != 0);

} catch (ArrayIndexOutOfBoundsException e) {

if (type == ERROR) {

throw new ArrayIndexOutOfBoundsException(i);

} else if (type == TRUE) {

return true;

} else {

return false;

}

}

}

public int getSize() {

return size;

}

public void set(int i) {

try {

b[i/8] |= (1 << (i%8));

} catch (ArrayIndexOutOfBoundsException e) {

if (type == ERROR) {

throw new ArrayIndexOutOfBoundsException(i);

}

}

}

public void unset(int i) {

try {

b[i/8] &= ((byte)(0xff) - (1 << (i%8)));

} catch (ArrayIndexOutOfBoundsException e) {

if (type == ERROR) {

throw new ArrayIndexOutOfBoundsException(i);

}

}

}

public void setFalse(int i) {

unset(i);

}

public void setTrue(int i) {

set(i);

}

public void setAllTrue() {

for (int i=0; i<b.length; i++) {

b[i] = (byte)(0xff);

}

}

public void setAllFalse() {

for (int i=0; i><b.length; i++) {

b[i] = (byte)(0x00);

}

}

public void setAll() {

setAllTrue();

}

public void unsetAll() {

setAllFalse();

}

}

Then, your code would become:

BoolArray final = new BoolArray(2601);

BoolArray change = new BoolArray(2601);

final.unsetAll();

change.unsetAll();

final.set(201);

To solve your problem, I'd add a couple of methods:

public BoolArray(BoolArray bb) {

// creates a clone

}

public void or(BoolArray bb) {

// to do: check for errors if bb and this have different sizes

for (int i=0; i><b.length; i++) {

this.b[i] |= bb.b[i];

}

}

public void and(BoolArray bb) {

// to do: check for errors if bb and this have different sizes

for (int i=0; i><b.length; i++) {

this.b[i] &= bb.b[i];

}

}

public int bitCount() {

int sum=0;

// todo: should I bother to check for if b.size is not a multiple of 8? naah...

for (int i=0; i><b.length; i++) {

switch (b[i]) { // ugly but fast code:

case 0: break;

case 0x01: case 0x02: case 0x04: case 0x08: case 0x10: case 0x20: case 0x40: case 0x80: sum++; break;

case 0x03: case 0x05: case 0x09: ... case 0xc0: sum+=2; break;

...

case 0xfe: case 0xfd: ... case 0xef: sum+=7; break;

case 0xff: sum+=8; break;

}

}

return sum;

}

Then, you could solve your problem like so:

BoolArray workspace = new BoolArray(change);

workspace.or(final);

System.out.println(workspace.bitCount());

workspace = new BoolArray(change);

workspace.and(final);

System.out.println(workspace.bitCount());

Ok, this is not an efficient way to get *both* answers, but it's faster than any method that stores only 1 bit per array entry.

Hope that helps. Focus here is speed, not beauty, as requested.

Yours, Mike H...

>

Mike40033 at 2007-7-7 2:41:09 > top of Java-index,Other Topics,Algorithms...
# 6
Oops - it seems that in my java code above, all my references to entry i of array b have become b<i>....!For b<i> read b (open square bracket)i(close square bracket)Thanks!
Mike40033 at 2007-7-7 2:41:09 > top of Java-index,Other Topics,Algorithms...
# 7

To Mike H.:

It is going to take a while for me to digest and implement this code, but the ideas in and of themselves are worth a few Duke Dollars. Thanks for your help and for taking the time and effort to respond. When I get things working, I will post back to report any improvements in speed.

Regards.

larryraisanen at 2007-7-7 2:41:09 > top of Java-index,Other Topics,Algorithms...
# 8

Ok, here is the implementation i make for my ideas.

Test it and tell me if it is not faster. You can try to modify it to use "short" instead of "byte" and it will be faster, but your "matrix" will be 65536 integers long (not a very big deal i think).

Hope this helps.

Zerjillo

public class Example {

private static int[] matrix;

private static int totalRPT;

private static byte[] whichMastsCoveredFinal;

private static byte[] whichMastsCoverWhichRTParrayDynamic;

private static int arraysLength;

public static void initializeMatrix() {

matrix = new int[256];

for (int i = 0 ; i < 256 ; i++) {

int mask = 0x0001;

for (int b = 0 ; b < 8 ; b++) {

if ((i & mask) != 0) {

matrix[i]++;

}

mask <<= 1;

}

//System.out.println(matrix[i]);

}

}

public static void initializeArrays() {

if (totalRPT % 8 != 0) {

arraysLength = totalRPT / 8 + 1;

} else {

arraysLength = totalRPT / 8;

}

whichMastsCoveredFinal = new byte[arraysLength]; // Every byte has 8 of YOUR bits

whichMastsCoverWhichRTParrayDynamic = new byte[arraysLength];

}

public static void fillArraysWithData() {

// Here you should put your bits 8 in every byte of the array

// If the total amount of bit in your array is not divisible by 8

// You should fill the last byte with your last bits and leave

// 0 the rest of them. I will initialize them randomly

for (int i = 0 ; i < arraysLength ; i++) {

whichMastsCoveredFinal[i] = (byte)((int)(Math.random() * 255) - 128);

//System.out.println(whichMastsCoveredFinal[i]);

whichMastsCoverWhichRTParrayDynamic[i] = (byte)((int)(Math.random() * 255) - 128);

}

}

public static void compare() {// Still 3 lines of loop!!

int totalRTPCovered = 0;

int totalOverlap = 0;

for( int i = 0; i < arraysLength; i++ ) { // The loop is 8!! times shorter

// increments cover total if either value (or both) is 1 ( uses bit or )

totalRTPCovered += matrix[(whichMastsCoveredFinal[ i ] | whichMastsCoverWhichRTParrayDynamic[ i ]) + 128];

// increments overlap total if both values are 1 (uses bit and )

totalOverlap += matrix[(whichMastsCoveredFinal[ i ] & whichMastsCoverWhichRTParrayDynamic[ i ]) + 128];

}

System.out.println("TotalRTPCovered: " + totalRTPCovered);

System.out.println("TotalOverlad: " + totalOverlap);

}

public static void main(String[] args) {

initializeMatrix();// The 4 first lines sohuld be called only

// Once in your application.

totalRPT = 900000;//

initializeArrays();//

fillArraysWithData(); //

for (int i = 0 ; i < 1000 ; i++) {

compare(); // This can be called as many times as you want

}

}

}

Zerjio at 2007-7-7 2:41:09 > top of Java-index,Other Topics,Algorithms...
# 9
To Zerjillo:I will definitely try! I have a few important matters to take care of before I can implement and test. Your code looks excellent, and I am excited to give it a try! I will post performance improvemements once I can get things up and tested.
larryraisanen at 2007-7-7 2:41:10 > top of Java-index,Other Topics,Algorithms...
# 10

You should also try java.util.BitSet.

Your code would be :

BitSet finalSet = new BitSet(2601);

BitSet changeSet = new BitSet(2601);

//modifications

BitSet tmp = finalSet.clone().or(changeSet);

indexContainingAtLeastOne1 = tmp.cardinality();

tmp = finalSet.clone().and(changeSet);

indexContainingBoth1s = tmp.cardinality();

Internally an array of longs is used to stock the Bits.

However, cardinality is a 1.4 feature and the cloning may make it slow.

But you can always look at the code to get the number of bits in a long used by SUN in this class.

phohmeyer at 2007-7-7 2:41:10 > top of Java-index,Other Topics,Algorithms...
# 11

To phohmeyer: I had tried a regular set, and it was very slow. I will give the bit set a try as the code should be the same.

To Zerjillo: I think I understand most of the code, but I am having a bit (no pun intended) of trouble with the "FillArraysWithData" method. Again, I have not used bits regularly, so am very green at this type of coding.

Here is my trouble: Let's assume that totalRTP is 2601. This means that the arraySize would end up being 326. This provides enough room to store either a 1 or a 0 for each of my 2601 rtps... plus an additional 7 spaces that need to be filled with 0s.

Now, I need to be able to fill a specific rtp with a 1 ( and have the rest of default of 0 ). For instance, in the 0 array, I may only need to turn on number 4.

0 1 2 3 4 5 6 7 (rtps)

0 0 0 0 0 1 0 0 0

In the next I may need to turn on a few differnt,

8 9 10 11 12 13 14 15 (rtps)

1 0 0 11110 0

16 17 18 19 20 21 22 23 (rtps)

2 00111000

...

2601 2602 2603 2604 2605 2606 2607 2608 (rtps)

32610000000

The question is: The code you have just puts in a byte from 0-127 into the array. How am I to store / translate ( or enter for that matter) these 0s and 1s into a similar format as the one you used? I simply don't see how I am able to place a 1 at rtp 11, and have this translated into a number. Any help appreciated.

larryraisanen at 2007-7-7 2:41:10 > top of Java-index,Other Topics,Algorithms...
# 12

Well... I was tired last night, and rushed, so the previous question is a bit muddled. In short, when creating the whichMastsCoverWhichRTParrayDynamic[ ], I currently only have to do the following:

whichMastsCoverWhichRTParrayDynamic[ whichPoint ] = 1;

For example, if whichPoint = 222,

whichMastsCoverWhichRTParrayDyamic[ 222 ] = 1;

It is that simple. With the code above, how would I translate THAT into one of the byte arrays? It would be 222 / 8 = 27, so I would go to whichMastsCoverWhichRTParrayDynamic[ 27 ], and "move over" from the right "bit" 7 places and put a 1: 01000000 . But how is that done? And then used by the program?

larryraisanen at 2007-7-7 2:41:10 > top of Java-index,Other Topics,Algorithms...
# 13

there are several ways of putting a concrete bit to 1.

I will show you some code that can clarify you how to use bits:

byte b = 1;// This is 00000001

byte b2 = 88; // This is 01011000

byte b3;

b3 = (byte)(b | b2); // This would be 01011001 (00000001 | 01011000)

b3 = (byte)((b << 1) | b2); //01011010 (00000010 | 01011000)

b3 = (byte)((b << 3) | b2); //01011000 (00001000 | 01011000)

b3 = (byte)((b << 7) | b2); //11011000 (10000000 | 01011000)

The operator "<< n" shifts the bits in one byte or int n places to the left (imagine what ">> n" does).

So if you want a bit set to 1 in your byte in pos 4:

7 6 5 4 3 2 1 0

0 0 1 0 1 1 0 1-> This are your actual bits

|-> You want here one additional 1

you should only do:

yourByte = (byte)((1 << 4) | yourByte);

The casting to byte is because Java makes an automatic casting to int when you use | or & or ^.

Hope this helps.

Zerjillo

Zerjio at 2007-7-7 2:41:10 > top of Java-index,Other Topics,Algorithms...
# 14
Thank you. Yes, that makes it clear. Sorry to use so much of your time.
larryraisanen at 2007-7-7 2:41:10 > top of Java-index,Other Topics,Algorithms...
# 15

In the end, the second approach did not work. I needed to be able to add together two dynamicArrays to form a finalArray. With the first approach, this was a simple matter of:

for( i = 0; i < totalRTP; i++ )

{whichMastsCoveredFinal[ i ] = whichMastsCoveredFinal[ i ] |

whichMastsCoverWhichRTParrayDynamic[ i ];}

With information stored "within" the bytes, this does not work with the second approach, and when I tried to make it work, it needed to do as many iterations as in the first approach and be more complicated to boot.

If there was only ONE ever final array and ONE ever dynamic array, the second approach would be faster, but with the need for the "addition" of arrays, I don't think it can succeed.

Thanks everyone for their ideas. They were still worth investigation.

larryraisanena at 2007-7-18 15:58:31 > top of Java-index,Other Topics,Algorithms...
# 16

Hi,

Hey, guys! Does anyone still use Java API to do something? :)) For bit operations on set of bits I see the only way: to use java.util.BitSet. 1st of all it is OOP way to do bit operations. 2nd, it is optimized for memory (for your information, array of 10 integers could take around 60 bytes (this depends on JVM you use)...) So, here is approximative code:

BitSet

set1,// you 1st set of bits...

set2;// another set

// this will solve your 1st problem

int count1 = set1.xor(set2).cardinality();

// .. and this - your 2nd problem...

int count2 = set1.and(set2).cardinality();

That's all, enjoy the Java, man, best of OOP languages! :)

cebanencoa at 2007-7-18 15:58:31 > top of Java-index,Other Topics,Algorithms...
# 17

if addition is only the | operation i don't know why does not work that approach. What is the difference between using bits or bytes in this case? the | op is the same. Of course if you need some different kind of addition are you doing... just try to do it with bytes...

Don't worry about the time i have spended answering you, it was a great pleasure.

Zerjillo

Zerjioa at 2007-7-18 15:58:31 > top of Java-index,Other Topics,Algorithms...
# 18

> The totalRTP can get as large as 900,000... will the

> second approach still work? And more importantly, do

> you think it will be faster? And this current code

> only takes 3 lines, which seems efficient in and of

> itself, and easy to understand. I do appreciate your

> effort, but do you think it would be feasible to

> pursue the second approach?

Do you want it to be "fast" or "fast in java". If the former and given some constraints on how the data changes then I would suppose that using a fixed buffer and JNI implementation is going to be the fastest.

Even presuming JIT compiling I would guess that just accessing the data in the array would be costly in java compared to C.

jschella at 2007-7-18 15:58:31 > top of Java-index,Other Topics,Algorithms...
# 19
to Zerjio:Sorry it took me so long, but I got it to work with your code, and it REALLY moves! I would estimate a 50% increase in speedat the very least!Thanks so much.
larryraisanena at 2007-7-18 15:58:31 > top of Java-index,Other Topics,Algorithms...
# 20
:-) :-) :-) :-) :-)Zerjillo
Zerjioa at 2007-7-18 15:58:31 > top of Java-index,Other Topics,Algorithms...