String equals implementation

It seems like atleast as of JDK1.5 implementation of java.lang.String class's equals method (public boolean equals(Object anObject)), in case the two strings are of same length, entire content is compared.

Won't it be worthwhile to also ensure that - if their hash is available, the hash matches before comparing the entire string character arrays? Seems like that would offer a significant performance boost.

-Viral

[438 byte] By [vir123a] at [2007-11-26 17:49:53]
# 1

> Won't it be worthwhile to also ensure that - if their

> hash is available, the hash matches before comparing

> the entire string character arrays? Seems like that

> would offer a significant performance boost.

Definitely not. Heres the thing: A strings hashcode is calculated using some mathematical formula that depends on all of the letters. Doing a string comparison, if the first 2 characters don't match, you can return false right there. So calculating hashcodes first could definitely make performance a lot worse and wouldn't make it a lot better.

tjacobs01a at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 2
It is true that you may not compute hash if it is not known.But my argument is if hash is already available (computed earlier eg an earlier invocation of hashCode()), then it must be used as there is no point in comparing all characters when hash is known to be different.
vir123a at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 3

On the other hand, since String is immutable, it is possible that its hashCode() method just calculates the hash code once, the first time it is requested, and then caches the result.

I don't know whether this is the case or not. But the answer to "Is X faster than Y" is often best determined by trying it, rather than speculating how things might be implemented.

DrClapa at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 4

> On the other hand, since String is immutable, it is

> possible that its hashCode() method just calculates

> the hash code once, the first time it is requested,

> and then caches the result.

Yes hashcode is is calculated just once and stored in the hash member. Refer hashCode() implementation.

vir123a at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 5

The trouble with deciding what's best is that a common class like String is used in countless contexts.

For example, you could be comparing similar URL strings, where the first 88 characters are often the same, or you could be comparing dictionary words where the first characters are usually different. It's hard to generalize -- what's the benchmark?

DrLaszloJamfa at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 6

Oh good. Another "save me every last nanosecond out of my silly app" discussion. Is this Martin Hilpert in disguise?

Well guess what, if you're so concerned about it, Java is not for you. You'd better write your all-important speedy app in machine language so you can control every tidbit of execution.

warnerjaa at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 7

> Yes hashcode is is calculated just once and stored in

> the hash member. Refer hashCode() implementation.

That's c%l, but notice that the API doesn't mention caching,

so there's no compunction on other implementation to cache, too.

I remember reading somewhere that originally the hashCode wasn't computed based on all the characters in the string -- it only sampled a few for efficiency sake. Here's the API for version 1.0.2 -- there's no mention of what algorithm is used. Hmmm:

http://www.aquaphoenix.com/ref/jdk1.0.2_api/java.lang.String.html#8782

So implementations, high-level algorithms, every API specs change over time. YMMV.

DrLaszloJamfa at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 8
> For example, you could be comparing similar URL> strings, where the first 88 characters are often the> same...Like the keys on a piano, baby.Yawmark (<-- Ain't that grand?)~
yawmarka at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 9

> I remember reading somewhere that originally the

> hashCode wasn't computed based on all the characters

> in the string -- it only sampled a few for efficiency

> sake. Here's the API for version 1.0.2 -- there's no

> mention of what algorithm is used. Hmmm:

> http://www.aquaphoenix.com/ref/jdk1.0.2_api/java.lang.

> String.html#8782

>

> So implementations, high-level algorithms, every API

> specs change over time. YMMV.

However, remember the hashCode contract: equal objects must have equal hash codes. Refer: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Therefore, it is reasonable to infer that the hashcodes, if present and not equal means that the two strings are not equal. Content is checked only when:

a) Length is equal and

b) Hash codes are absent/equal

vir123a at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 10

> Therefore, it is reasonable to infer that the hashcodes, if present and not equal means that the

> two strings are not equal. Content is checked only when:

>

> a) Length is equal and

> b) Hash codes are absent/equal

I agree that different hash codes imply nonequal objects. The "content is checked only when" part is your addition. Feel free say this is a reasonable thing to infer, but reasonable is not a legal contract.

DrLaszloJamfa at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 11

> I agree that different hash codes imply nonequal objects.

It should do more than merely imply. According to the contract of hashCode(), equal objects must have the same hashCode. Unequal objects may also have the same hashCode, but different hash codes means inequality.

You're probably saying the same thing. I'm just trying to add weight to the whole different hash codes == unequal objects thing... :o)

~

yawmarka at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...
# 12

>

> You're probably saying the same thing. I'm just

> trying to add weight to the whole different hash

> codes == unequal objects thing... :o)

>

> ~

Yes, we are trying to same the same thing here. If hash codes are equal, then no doubt content must be checked and verified to ensure that two strings are indeed same - which covers the above usecase.

The point is that String comparision is a vital part of many applications and any effort to ensure its performance is vital. It is not easy to punt away String class in favor of a "more suitable" class for your application as String is pervasive throughout the apis.

vir123a at 2007-7-9 5:02:23 > top of Java-index,Java Essentials,Java Programming...