Fast dual-integer hash algorithm

Hello all - Thanks for reading my question.

Here's my problem: I am looking for a faster hashing algorithm to generate a unique hash based on two Integers (TestNumber and SubTestNumber). The current (funcitonal) algorithm I'm using is to take the hashcode of the String: "[TestNumber] [SubTestNumber]". This works, but I would like it to be faster. Does anyone have any suggestions? Here's the code I use:

StringBuffer sb = new StringBuffer(16);

sb.append(getTestNumber());

sb.append(" ");

sb.append(getSubTestNumber());

return sb.toString().hashCode();

Thanks for your help in advance!

[638 byte] By [zmaranelloa] at [2007-9-29 22:28:41]
# 1
> generate a unique hash based on two> Integers (TestNumber and SubTestNumber). int hashCode = TestNumber ^ SubTestNumber;
tschodta at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 2
Thanks for the reply, however, it is possible that SubTestNumber is zero, which eliminates the possibility of exponents. Any other suggestions?
zmaranelloa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 3
Bear with me, It's been a long day. I could add one and then that would take care of the 'zero case'. I'll give that a try. Thanks!
zmaranelloa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 4
Actually, this won't work for us. Here's the problem:TestNumber: 4SubTestNumber: 1Hash = 4TestNumber: 2SubTestNumber: 2Hash = 4This doesn't produce a unique key, but thanks for your help. Any other suggestions?
zmaranelloa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 5
> This doesn't produce a unique key, but thanks for your> help. Any other suggestions?You can't produce a unique int from two ints. Think about it... how many pairs of ints are there? Fortunately for you, hashCode() doesn't require uniqueness.
DrClapa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 6
int int1 = 123;int int2 = 345;hashcode:int hash = (int) ((int1 << 32) + int2);or:int hash = (int1 + "xxx" + int2).hashCode();But, I don't know how reliable it is...
lexzeusa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 7
>You can't produce a unique int from two ints. Think about it... how many pairs of ints are there? Fortunately for you, hashCode() doesn't require uniqueness. Yes, he is right ... there is equals(), and HashSet, HashMap, and HashTable uses this along with hashCode().
lexzeusa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 8

The bitwise XOR operator looks a lot like a mathematical exponent operator, but it is not the same!!!

TestNumber: 4

SubTestNumber: 1

Hash = 4 Wrong!!! The right answer is 5!

TestNumber: 2

SubTestNumber: 2

Hash = 4Wrong! The right answer is 0!

The problem with a simple xor (^) is if the two numbers are equal (or related):

7^7 = 0

3^3 = 0

and:

10 ^ 2 = 8

11 ^ 3 = 8

12 ^ 4 = 8

Ragnvald_id2a at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 9

> int hash = (int) ((int1 << 32) + int2);

isn't always int1<<32 == 0 ?

>

> int hash = (int1 + "xxx" + int2).hashCode();

It is very reliable but is just a bit slow.

The documentation for String's hashCode() implementation is:

Returns a hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

So what I did was that I just copied this algorithm to int[] (which is the general case of 2 ints) instead of char[]

public int hashCode(){

//same algorithm as for String

int hash = 0;

for (int i = 0; i < data.length; i++) {

hash = 31*hash + data[i];

}

return hash;

}

Gil

gilroittoa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 10

> generate a unique hash based on two

> Integers (TestNumber and SubTestNumber).int hashCode = TestNumber ^ SubTestNumber;

or if the two numbers are likely to be "close"int hashCode = ((TestNumber<<16)|(TestNumber>>>16)) ^ SubTestNumber;

tschodta at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 11

>isn't always int1<<32 == 0 ?

sorry, I forgot to typecast it to long ... (long) int1 << 32

>The documentation for String's hashCode() implementation is:

yes, I know that.

>So what I did was that I just copied this algorithm to int[] (which is the general case of 2 ints) instead of char[]

You're right.. But "123xxx456".hashCode() should generate more unique hashcode rather int[] because it takes more computations/iterations... perhaps..

lexzeusa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 12

> You're right.. But "123xxx456".hashCode() should

> generate more unique hashcode rather int[] because it

> takes more computations/iterations... perhaps..

I truly think that the described hashCode() should be unique enough. And to create a string is very slow. More computations or iterations are by no means any garantue of more unique hashCode.

Gil

gilroittoa at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 13
The question is what do you expect to gain by using a combination of two integer values?
rkippena at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 14

The only way you can get a unique hash from two ints is if you know the given ints will be in the range of short.

Then you can just shift one to 16 bits and add them.

Otherwise, you cannot get a unique hashcode. If this is for something other than implementing hashcode as defined in Object, you could generate a unique long from any two ints.

dubwaia at 2007-7-16 2:50:07 > top of Java-index,Other Topics,Algorithms...
# 15

> The only way you can get a unique hash from two ints

> is if you know the given ints will be in the range of

> short.

>

> Then you can just shift one to 16 bits and add them.

>

> Otherwise, you cannot get a unique hashcode. If this

> is for something other than implementing hashcode as

> defined in Object, you could generate a unique long

> from any two ints.

re-implement the same thing for

long getLongHash() ?

rewrite the classes to use long as a hash?

Mordana at 2007-7-19 18:38:04 > top of Java-index,Other Topics,Algorithms...
# 16
> re-implement the same thing for > > long getLongHash() ?> > rewrite the classes to use long as a hash? I don't know about re-implementing anything. It really wouldn't be a hash. It would be an ID.
dubwaia at 2007-7-19 18:38:04 > top of Java-index,Other Topics,Algorithms...
# 17

I think re-implementing the hash using long is rather useless. If you take a look on HashMap source code, it uses 2 dimmensional Object[][] to store the values. It uses hash code to look up and then equals() to traverse + find the correct value. So, creating unique long hash makes sense only if you subclass HashMap/Hashtable to use 1 dimension array Object[Long.MAX_VALUE] and use the hash as index ... It is fast, but it won't compile.

lexzeusa at 2007-7-19 18:38:04 > top of Java-index,Other Topics,Algorithms...
# 18

The whole point of what I was saying is that if you aren't using this number as a hashcode in the HashMap sense you might want to use a long if you really need unique values from two ints. All of these assumptions are quite dubious and the OP probably doesn't need a unique value in the first place.

dubwaia at 2007-7-19 18:38:05 > top of Java-index,Other Topics,Algorithms...
# 19

> Thanks for the reply, however, it is possible that

> SubTestNumber is zero, which eliminates the

> possibility of exponents. Any other suggestions?

The ^ operator does not do exponentation, it does a bitwise exclusive-or. See the Java Language Reference for deatails.

pmuurray@bigpond.coma at 2007-7-19 18:38:05 > top of Java-index,Other Topics,Algorithms...
# 20

> Hello all - Thanks for reading my question.

>

> Here's my problem: I am looking for a faster hashing

> algorithm to generate a unique hash based on two

> Integers

It is not possible to get a unique 32-bit hash from 64 bits of data. This is a problem that admits of a "pidgeonhole" solution. If you have 100 pidgeonholes and 101 items of mail, at least one of the pidgeonholes must have more than one item of mail, ok? There are 2^32 possible hash values, and 2^64 possible combinations of 2 ints. No matter what you do, you are going to have to double up.

However, I think you misunderstand the idea of a hashcode. Doubling up is ok, so long as (in general) the various possible pidgeonholes (hashcodes) each get roughly the same amount of mail.

If your incoming data was such that doubles (34,34), (89,89) etc were common, then yes doing an xor would be a bad choice of hash, because pidgeonhole 0 would get a disproportionate amount. Same thing if your ints tend to lie within a certain range (0-10,000 or whatever).

A ^ (((B >> 16) & 0x0000FFFF) | ((B<<16) & 0xFFFF0000))

might be a better choice than a straight xor, or even

A ^ ( ((B >> 16) & 0x000000FF)

| ((B >> 8) & 0x0000FF00)

| ((B << 8) & 0x00FF0000)

| ((B << 16) & 0xFF000000))

pmuurray@bigpond.coma at 2007-7-19 18:38:05 > top of Java-index,Other Topics,Algorithms...
# 21

Then again, there is this....

int a = 2212;

int b = 123456;

float decimalB = 0.123456;

float hash = a + decimalB;

So now the hash is 2212.123456, let float take care of iteself.

Andrew

awaddia at 2007-7-19 18:38:05 > top of Java-index,Other Topics,Algorithms...
# 22
hash returns int, not float... anyway, there's lots of suggestions.. so I think the poster had gotten the answer already .. ... unsubscribe silently ...
lexzeusa at 2007-7-19 18:38:05 > top of Java-index,Other Topics,Algorithms...
# 23
hehe, like Float.floatToIntBits( floatHash )was too much of a leap!?
awaddia at 2007-7-19 18:38:05 > top of Java-index,Other Topics,Algorithms...