Fast dual-integer hash algorithm
Hello all - Thanks for reading my question.
Here's my problem: I am looking for a faster hashing algorithm to generate a unique hash based on two Integers (TestNumber and SubTestNumber). The current (funcitonal) algorithm I'm using is to take the hashcode of the String: "[TestNumber] [SubTestNumber]". This works, but I would like it to be faster. Does anyone have any suggestions? Here's the code I use:
StringBuffer sb = new StringBuffer(16);
sb.append(getTestNumber());
sb.append(" ");
sb.append(getSubTestNumber());
return sb.toString().hashCode();
Thanks for your help in advance!
> generate a unique hash based on two> Integers (TestNumber and SubTestNumber). int hashCode = TestNumber ^ SubTestNumber;
Thanks for the reply, however, it is possible that SubTestNumber is zero, which eliminates the possibility of exponents. Any other suggestions?
Bear with me, It's been a long day. I could add one and then that would take care of the 'zero case'. I'll give that a try. Thanks!
Actually, this won't work for us. Here's the problem:TestNumber: 4SubTestNumber: 1Hash = 4TestNumber: 2SubTestNumber: 2Hash = 4This doesn't produce a unique key, but thanks for your help. Any other suggestions?
> This doesn't produce a unique key, but thanks for your> help. Any other suggestions?You can't produce a unique int from two ints. Think about it... how many pairs of ints are there? Fortunately for you, hashCode() doesn't require uniqueness.
int int1 = 123;int int2 = 345;hashcode:int hash = (int) ((int1 << 32) + int2);or:int hash = (int1 + "xxx" + int2).hashCode();But, I don't know how reliable it is...
>You can't produce a unique int from two ints. Think about it... how many pairs of ints are there? Fortunately for you, hashCode() doesn't require uniqueness. Yes, he is right ... there is equals(), and HashSet, HashMap, and HashTable uses this along with hashCode().
The bitwise XOR operator looks a lot like a mathematical exponent operator, but it is not the same!!!
TestNumber: 4
SubTestNumber: 1
Hash = 4 Wrong!!! The right answer is 5!
TestNumber: 2
SubTestNumber: 2
Hash = 4Wrong! The right answer is 0!
The problem with a simple xor (^) is if the two numbers are equal (or related):
7^7 = 0
3^3 = 0
and:
10 ^ 2 = 8
11 ^ 3 = 8
12 ^ 4 = 8
> int hash = (int) ((int1 << 32) + int2);
isn't always int1<<32 == 0 ?
>
> int hash = (int1 + "xxx" + int2).hashCode();
It is very reliable but is just a bit slow.
The documentation for String's hashCode() implementation is:
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
So what I did was that I just copied this algorithm to int[] (which is the general case of 2 ints) instead of char[]
public int hashCode(){
//same algorithm as for String
int hash = 0;
for (int i = 0; i < data.length; i++) {
hash = 31*hash + data[i];
}
return hash;
}
Gil
> generate a unique hash based on two
> Integers (TestNumber and SubTestNumber).int hashCode = TestNumber ^ SubTestNumber;
or if the two numbers are likely to be "close"int hashCode = ((TestNumber<<16)|(TestNumber>>>16)) ^ SubTestNumber;
>isn't always int1<<32 == 0 ?
sorry, I forgot to typecast it to long ... (long) int1 << 32
>The documentation for String's hashCode() implementation is:
yes, I know that.
>So what I did was that I just copied this algorithm to int[] (which is the general case of 2 ints) instead of char[]
You're right.. But "123xxx456".hashCode() should generate more unique hashcode rather int[] because it takes more computations/iterations... perhaps..
> You're right.. But "123xxx456".hashCode() should
> generate more unique hashcode rather int[] because it
> takes more computations/iterations... perhaps..
I truly think that the described hashCode() should be unique enough. And to create a string is very slow. More computations or iterations are by no means any garantue of more unique hashCode.
Gil
The question is what do you expect to gain by using a combination of two integer values?
The only way you can get a unique hash from two ints is if you know the given ints will be in the range of short.
Then you can just shift one to 16 bits and add them.
Otherwise, you cannot get a unique hashcode. If this is for something other than implementing hashcode as defined in Object, you could generate a unique long from any two ints.
> The only way you can get a unique hash from two ints
> is if you know the given ints will be in the range of
> short.
>
> Then you can just shift one to 16 bits and add them.
>
> Otherwise, you cannot get a unique hashcode. If this
> is for something other than implementing hashcode as
> defined in Object, you could generate a unique long
> from any two ints.
re-implement the same thing for
long getLongHash() ?
rewrite the classes to use long as a hash?
> re-implement the same thing for > > long getLongHash() ?> > rewrite the classes to use long as a hash? I don't know about re-implementing anything. It really wouldn't be a hash. It would be an ID.
I think re-implementing the hash using long is rather useless. If you take a look on HashMap source code, it uses 2 dimmensional Object[][] to store the values. It uses hash code to look up and then equals() to traverse + find the correct value. So, creating unique long hash makes sense only if you subclass HashMap/Hashtable to use 1 dimension array Object[Long.MAX_VALUE] and use the hash as index ... It is fast, but it won't compile.
The whole point of what I was saying is that if you aren't using this number as a hashcode in the HashMap sense you might want to use a long if you really need unique values from two ints. All of these assumptions are quite dubious and the OP probably doesn't need a unique value in the first place.
> Thanks for the reply, however, it is possible that
> SubTestNumber is zero, which eliminates the
> possibility of exponents. Any other suggestions?
The ^ operator does not do exponentation, it does a bitwise exclusive-or. See the Java Language Reference for deatails.
> Hello all - Thanks for reading my question.
>
> Here's my problem: I am looking for a faster hashing
> algorithm to generate a unique hash based on two
> Integers
It is not possible to get a unique 32-bit hash from 64 bits of data. This is a problem that admits of a "pidgeonhole" solution. If you have 100 pidgeonholes and 101 items of mail, at least one of the pidgeonholes must have more than one item of mail, ok? There are 2^32 possible hash values, and 2^64 possible combinations of 2 ints. No matter what you do, you are going to have to double up.
However, I think you misunderstand the idea of a hashcode. Doubling up is ok, so long as (in general) the various possible pidgeonholes (hashcodes) each get roughly the same amount of mail.
If your incoming data was such that doubles (34,34), (89,89) etc were common, then yes doing an xor would be a bad choice of hash, because pidgeonhole 0 would get a disproportionate amount. Same thing if your ints tend to lie within a certain range (0-10,000 or whatever).
A ^ (((B >> 16) & 0x0000FFFF) | ((B<<16) & 0xFFFF0000))
might be a better choice than a straight xor, or even
A ^ ( ((B >> 16) & 0x000000FF)
| ((B >> 8) & 0x0000FF00)
| ((B << 8) & 0x00FF0000)
| ((B << 16) & 0xFF000000))
Then again, there is this....
int a = 2212;
int b = 123456;
float decimalB = 0.123456;
float hash = a + decimalB;
So now the hash is 2212.123456, let float take care of iteself.
Andrew
hash returns int, not float... anyway, there's lots of suggestions.. so I think the poster had gotten the answer already .. ... unsubscribe silently ...
hehe, like Float.floatToIntBits( floatHash )was too much of a leap!?