HashSet doesnt remove duplicates

Hello

I am trying to remove duplicates from a list using this code

Set set =new HashSet();

ListIterator li = list.listIterator();

while (li.hasNext())

{

A pt = (A) li.next();

duplicate = pt.getID();//duplicate is declared globally

if (!set.add(pt))

{

System.out.println("Duplicate found: " + pt.getID());

}

}

but it isnt removing duplicates. I read somewhere that I should override the equals() and hashCode() methods. But I am a bit confused as to how to write my equals method. The list contains objects of class A, which has many getters and setters. The elements are considered to be duplicate when getID() getter returns same value for two objects. Can some one tell me how should i go about writing the equals method. And why the default equals method for objects isnt working in this case?

This is how i tried to write my equals() method but it isnt working

publicboolean equals(Object o)

{

if (o ==this)

returntrue;

if (!(oinstanceof A))

returnfalse;

A pn = (A) o;

return pn.getID() == duplicate;// duplicate has been set before calling add

}

[1916 byte] By [bhaarat_javaa] at [2007-10-3 3:18:52]
# 1

I have tried to override hashcode as well.

public int hashCode() {

int result = 17;

result = 37*result + Integer.parseInt(duplicate);

System.out.println("Came here with result = " + result);

return result;

}

bhaarat_javaa at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 2
Try changing the last line in equals() method fromreturn pn.getID() == duplicate; toreturn pn.getID() == this.getID();
Vchutkia at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 3
HashSet ref = new HashSet( list ); // create a HashSetabove line creates a hashset from a list which doesnot contain any duplicates
Vchutkia at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 4

Since you are trying to parse the ID in your hashCode() method, I'm going to assume it is a String. Reference types (such as Strings) should be compared using equals(), not ==.

For references, the == indicates object identity, i.e. the reference on the left-hand side and the reference on the right hand side point to the exact same object (in other words: both references have the same value).

equals() indicates object equality, i.e. the left-hand side and right-hand side references point to objects that are logically/functionally equal.

The default implementation of equals() in Object uses ==, which is why it is recommend to override equals() in your own classes. When you override equals() you should also override hashCode().

Herko_ter_Horsta at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 5

> Since you are trying to parse the ID in your

> hashCode() method, I'm going to assume it is a

> String. Reference types (such as Strings) should be

> compared using equals(), not ==.

>

you are absolutley right. I thought that making the following change inside the equals method would fix the problem.

return pn.getID().equals(duplicate);

...but still no luck. If you notice in my hashCode method i have a System.out statement. That statement is never printed. Does that tell me that the hashCode() method is never happening? I never actualy CALL the hashCode method or equals method. I think they are being called implicitly in the following line of code in the loop ( am i right?)

if (!set.add(pt))

Vchutki:

HashSet ref = new HashSet( list ); // create a HashSet

thats doesnt seem to work. I tried that the very first time since it looks so temptingly easy. But for that to work equals and hashCode methods should also be overriden. The problem is getting equals and hashCode methods right

:(....

I am just thinking about sorting the list myself and forget calling Set and HashSet stuff. The list is already sorted so it shouldnt be that hard. But still I'd like to solve this puzzle.

bhaarat_javaa at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 6

Check this ..it works

package com.test;

import java.util.ArrayList;

import java.util.HashSet;

public class Test {

public static void main(String[] args){

ArrayList list = new ArrayList();

//list.add(new Integer (6));

//list.add(new Integer (2));

//list.add(new Integer (3));

list.add(new Obj (6));

list.add(new Obj (6));

list.add(new Obj(2));

list.add(new Obj(2));

list.add(new Obj(3));

list.add(new Obj(3));

HashSet ref = new HashSet( list );

Object[] result= ref.toArray();

System.out.println(result[0]+""+result[1]+""+result[2]);

}

}

class Obj{

int ID;

Obj(int id){

this.ID = id;

}

public String toString(){

return String.valueOf(ID);

}

public boolean equals(Object o)

{

if (o == this)

return true;

if (!(o instanceof Obj))

return false;

Obj pn = (Obj) o;

System.out.println(pn.getID() +"==="+this.ID);

System.out.println(pn.getID() == this.ID);

return pn.getID() == this.ID; // duplicate has been set before calling add

}

public int getID() {

return ID;

}

public void setID(int id) {

ID = id;

}

}

Vchutkia at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 7

You have to Properly implement hashCode and equals Methods.

Be very vary of hashCode().. This is the called first to get the hash bucket in which the newly inserting element is checked.

equals is then called if there is an in-bucket collision

public static class TestSupport {

private int t;

public TestSupport(int t) {

this.t = t;

}

public boolean equals(Object c) {

if (c == this)

return true;

if (!(c instanceof TestSupport))

return false;

return (t == ((TestSupport)c).getT()) ;

}

public int hashCode() {

return t;

}

public int getT() {

return t;

}

public String toString() {

return t+"";

}

}

}

if you donot have a LOT of elements, You can also just make hashcode return a integer constant (This will create only one Bucket, in efficient)

like public int hashCode() {

return 37;

}

sar.ha at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 8

Looking at the code, if you are sure that the object A can be identified by its ID, and objects with identical ID's will be identical, I would use a HashMap.

Traverse your list and put the each object into the HashMap using the ID as the key. Any duplicates will overwrite the last instance that was encountered but if they're all the same who cares. You can then create the set of objects by traversing the HashMap (or just use the HashMap since I'm betting you'll be needing the identifier reference sometime in the future to identify the object again, and it works better than traversing the list again looking at each object.)

tusa at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 9

Let's go back to basics.

You want to put instances of class A in a HashSet. This means class A needs to contains a proper implementation of equals() and hashCode().public class A {

public String getID() {

... // return the ID

}

public boolean equals(Object other) {

boolean result = this == other;

if(!result && other instanceof A) {

A otherA = (A)other;

result = getID().equals(otherA.getID());

}

return result;

}

public int hashCode() {

return getID().hashCode();

}

}

Using this code will result in objects of type A that can be put in a HashSet that won't accept duplicates.

Herko_ter_Horsta at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 10

Lets really go back to basics...

What does it mean for an object to be equal to another?

Remember the implementation cannot know the content, so that leaves us with only the object as a whole.

As an example, one measure of equality would be an identical size in bits. However, it is possible, however improbable, for two different objects to have the same size but different values.

There are ways around the equal size problem, a checksum is the most obvious, but for a general purpose implementation, that covers every possibility, how do you implement equality? Without the context of the object you will always be at risk unless you are comparing objects that can be considered atomic, and are not subject to interpretation (i.e. 1 declared as an int is pretty well defined.)

It is reasonably simple to build something that looks into the context of the object and implement your own equality methods instead of trying to guess what the object has for its rules.

tusa at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 11

tus, sorry, but you are not talking about basics.

You are posing a philosophical question and you are illustrating it with a contrived, confusing example that cannot be implemented in Java.

An object should know whether or not it is equal to another regardless of context. If context is an issue, use an external Comparator.

Herko_ter_Horsta at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 12

Sorry if I got confusing about what I was trying to say. I was actually trying to expand on your earlier post, where you mentioned that the class needs the proper implementation of equals() had hashCode().

I went over this with one of my programmers fairly recently. He was having a hard time understanding why he was getting a false when the contents of the object were the same. Of course he did not override the equals() method and was doing an object comparison, and not one on the contents of the objects.

Maybe the following examples will better illustrate what I was trying to say.

Without equals:

public class EqualityExample {

public EqualityExample()

{

}

/**

* @param args

*/

public static void main(String[] args) {

EqualityExample ee = new EqualityExample();

// TODO Auto-generated method stub

DataClass testObject1 = ee.new DataClass("A String");

DataClass testObject2 = ee.new DataClass("A String");

if (testObject1.equals(testObject2))

{

System.out.println("are equal");

}

else

{

System.out.println("not equal");

}

}

public class DataClass

{

private String someVal;

public DataClass(String someVal)

{

this.someVal = someVal;

}

}

}

With equals:

public class EqualityExample {

public EqualityExample()

{

}

/**

* @param args

*/

public static void main(String[] args) {

EqualityExample ee = new EqualityExample();

// TODO Auto-generated method stub

DataClass testObject1 = ee.new DataClass("A String");

DataClass testObject2 = ee.new DataClass("A String");

if (testObject1.equals(testObject2))

{

System.out.println("are equal");

}

else

{

System.out.println("not equal");

}

}

public class DataClass

{

private String someVal;

public DataClass(String someVal)

{

this.someVal = someVal;

}

public boolean equals(DataClass object)

{

if (object.someVal == this.someVal)

{

return true;

}

else

{

return false;

}

}

}

}

tusa at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 13
Did you also tell "your programmers" how to correctly override the equals() method so thata) it actually does override Object.equals()b) it correctly obeys the contract of equals()
dannyyatesa at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 14

> As an example, one measure of equality would be an

> identical size in bits. However, it is possible,

> however improbable, for two different objects to have

> the same size but different values.

In addition to what Herko responded to this, I'd point out that it's very highly probably that two different objects will have the same size but different values.

(Not that we can even test for that in Java.)

jverda at 2007-7-14 21:10:42 > top of Java-index,Core,Core APIs...
# 15

> With equals:

> > public boolean equals(DataClass object)

> {

> if (object.someVal == this.someVal)

> {

> return true;

> }

> else

> {

> return false;

> }

> }

> }

> }

>

This method doesn't override Object.equals() and it won't work even if it did: the Strings should be compared using equals() as well, not using ==

You just keep piling on the confusion, don't you?

Herko_ter_Horsta at 2007-7-21 10:08:46 > top of Java-index,Core,Core APIs...
# 16

I'll say it again... I am not trying to create more confustion, and to those who I am let me apologize.

I will also be the first to admit that I do not know everything and there are times when I am wrong. Since there are those that are saying I'm wrong here, could someone please tell me where the problems are so I know better in the future?

Looking at things, I will agree that the string comparison I have in the method should be done with equals() and not the == operator.

The strange thing is when I run the example with the equals method, and change the values being passed in, I get the behavior I expected. If I'm not doing what I think I am could someone tell me what is actually happening?

tusa at 2007-7-21 10:08:46 > top of Java-index,Core,Core APIs...
# 17

1) The correct signature for equals is: public boolean equals(Object obj) - in particular, it is a very common mistake to use your class as the parameter type, instead of Object, but this will not override Object.equals(), it will simply overload it which is not what you want.

2) equals() should accept null and return false. Your implementation will throw a NullPointerException

3) equals() should be reflexive - that is a==b should mean b==a. The implication of this is that the two objects (this, and the obj parameter) should be of the same class.

A typical implementation would be:

public boolean equals(Object obj) {

if (obj == this) return true;

if (obj == null) return false;

if (obj.getClass() != this.getClass()) return false;

MyType that = (MyType)obj;

// compare contents of this and that here...

}

Note: an implementation of equals() which uses instanceof is plain wrong - regardless of how vociferously anybody who appears to be authoritative might tell you otherwise.

Oh, and don't forget to override hashCode() if you're overriding equals().

dannyyatesa at 2007-7-21 10:08:46 > top of Java-index,Core,Core APIs...
# 18

> Note: an implementation of equals() which uses

> instanceof is plain wrong - regardless of how

> vociferously anybody who appears to be authoritative

> might tell you otherwise.

No.

There are cases when you must use instanceof. For example, every Map, Set and List implementaton, in order to meet those interfaces' respective equals contracts, must use instanceof Map/List/Set. In List's API doc for equals, it says, Returns true if and only if the specified object is also a list,.... This means you have to use instanceof, not getClass.

You use instanceof when you want different implementations to be able to be equal. An ArrayList can equal a LinkedList, if they both contain the same elements in the same order. So ArrayList, LinkedList, and every other class that implements List, must use instanceof List.

jverda at 2007-7-21 10:08:46 > top of Java-index,Core,Core APIs...
# 19

Well, OK, I over simplified a bit.

I like to think of the difference being between a data structure and a data object. Naturally, what you said is right for a data structure which has a well-defined equality relationship between different implementations of the same data structure interface.

But most people don't implement equals() in these sorts of scenarios (in fact, I'm not sure I've ever done it in anger in 11 years of writing Java code), and hence, I would contend, using instanceof is wrong in 99.9% of cases.

dannyyatesa at 2007-7-21 10:08:46 > top of Java-index,Core,Core APIs...