Sets
Howdy.
I need to use the concept of a basic mathematical set in a project I'm working on. I've viewed the two set implementations that Java supplies (TreeSet and HashSet), and have found neither suitable for my purposes.
IMHO, a Set object should work with the following features:
- All objects in the set are unique.
i.e. :
class RC implements Comparable
{
public int r; public int c;
public RC(int r,int c) {this.r=r; this.c=c;}
public boolean equals(RC other) {...}
public boolean compareTo(Object other) {...}
}
...
SomeSetImplementation s = new SomeSetImplementation();
s.add( new RC(1,2) );
s.add( new RC(1,7) );
s.add( new RC(2,2) );
s.add( new RC(1,2) ); // duplicate, don't add to set
...
The resulting set 's' should contain { (1,2), (1,7), (2,2) }.
- Iterator cycles through objects in a random order.
i.e. :
SomeSetImplementation s = new SomeSetImplementation();
s.add( new RC(1,2) );
s.add( new RC(1,7) );
s.add( new RC(2,2) );
s.add( new RC(1,2) );
...
for(Iterator i=s.iterator(); i.hasNext();)
{
System.out.println(s.next()); // Could print elements in any order
}
Neither the HashSet nor the TreeSet abide both of these rules. The HashSet would allow two occurrences of (1,2) to exist in the set (because it's comparing memory locations instead of values? I have a c++ background :\ ), and TreeSet's iterator orders the resulting RC pairs according to the compareTo method (which must be deterministic [non-pseudo-random either] for speeding it's algorithm).
Consequently, I built my own Set called LogicSet which extends TreeSet. I added a few methods such as union and intersection (really just rewording of addAll and retainAll TreeSet methods). Finally, I attempted to override the iterator() method to return a new type of iterator - LogicSetIterator, which I built as an inner class to LogicSet. Right now, the method LogicSet.iterator() returns a new LogicSetIterator.
I have not written the code for LogicSetIterator, and am having some problems coming up with an elegant solution. Hence, my question is two-fold:
First, what would be a decent way of returning the (sorted) elements in random order. I've thrown around ideas of keeping a whole copy of the set in the iterator object, but this seems wasteful to me. Then again, it may be the only solution. I've never written an iterator before, so suggestions would be helpful.
Or second, am I approaching this problem incorrectly? Is there a better design pattern for what I'm trying to accomplish?
Thanks for reading this far, and even more thanks if you try to help me out :]
[2807 byte] By [
iamiqrk] at [2007-9-30 15:22:28]

For the first problem with HashSet, couldn't you overload the equals method in your class to something like
boolean equals(Object o) {
if (o instanceof YourThing) {
return ((YourThing)o).x == this.x && ((YourThing)o).y == this.y;
}
return false;
Then you could use HashSet.
Now for the second problem, if speed to lookup isn't a problem, could you use a ArrayList to store your set? Then for the iterator method, it returns an iterator with a permutation of an array of ints from 0 to size-1. Then use that to reference items in your set. Just call permutate on the array everytime you try to get an iterator.
Hope that helps.
The answer to "how do I implement LogicSetIterator" is "it depends heavily on how you have implemented LogicSet".
If LogicSet is backed by, for instance, an float array ( float[] data ), then it is quite straight forward to implement the iterator: Just add an offset field, and cause next() to return data[offset++] until offset == data.length.
For sorting, try to google for words like "bubble sort", "insertion sort", "quicksort", "merge sort", or just "sort algorithms".
> The HashSet would allow two occurrences
> of (1,2) to exist in the set (because it's comparing
> memory locations instead of values?
This isn't true. HashSet uses the kind of equality comparison you want, not memory locations. It implements Set and if you look at the javadoc for the add() method of Set, it says the following:
"adds the specified element, o, to this set if this set contains no element e such that (o==null ? e==null : o.equals(e)). If this set already contains the specified element, the call leaves this set unchanged and returns false."
if HashSet compared by memory location (which I take to mean that Hashset uses o==e, rather than o.equals(e)), then HashSet would not uphold the contract of Set.
Also, what do you mean by 'random' order? The order of Iterators returned by HashSet can change over time, if that's what you want. I believe that the iterator will order the elements based on the hash bucket they fall into (i.e. first hash bucket first, then the second, ... then the last bucket), this will not normally be the same order that you put the entries into the set. If you consider that to be random then you're good.
Guys, thanks for some great, quick responses. Let me attempt to address each response:
octobernight
I thought that was how hash sets worked too (it's logical), but look at the following code:
import java.util.HashSet;
public class MyHashTest
{
public class Pair
{
public int x,y;
public Pair(int x, int y)
{
this.x = x;
this.y = y;
}
public boolean equals(Object other)
{
System.out.println("called: " + this + "::equals(" + other + ")");
if(other instanceof Pair)
{
return (this.x == ((Pair)other).x && this.y == ((Pair)other).y);
}
return false;
}
public String toString()
{
return "["+x+","+y+"]";
}
}
public static void main(String args[])
{
new MyHashTest();
}
public MyHashTest()
{
HashSet hs = new HashSet();
hs.add( new Pair(1,2) );
hs.add( new Pair(6,1) );
hs.add( new Pair(7,9) );
hs.add( new Pair(1,2) ); // Duplicate
hs.add( new Pair(3,5) );
System.out.println("hs = " + hs);
}
}
The output is: "hs = [[1,2], [1,2], [3,5], [7,9], [6,1]]"
Notice that it has [1,2] twice! Not only that, but the System.out.println was never called in the RC.equals(Object) method!
As speed is not absolutely critical, the permutation on an ArrayList seems like a good idea if there are no better design ways to solve this problem. I will use this methodology if I can't get a java standard solution.
Ivar_Svendsen
I realized after-the-fact that I didn't make this entirely clear, but my LogicSet is simply a subclass of TreeSet ("Consequently, I built my own Set called LogicSet which extends TreeSet."). Since I want it entirely unsorted, I think I'm going to have to play with octobernight's idea of a random permutation of indicies.
stokes_david
David, see the code above for octobernight. Hopefully I'm not making some juvenille syntax mistake here :\. I'm using java version 1.4.2_04-b05, btw - hope that doesn't matter. As for the randomization, that's a very good point. I think that even if I got HashSet to work I'd create a subclass that overrides the iterator() method to make sure it's much more (pseudo) random, as I'm doing probability (and expected value) calculations off the results.
David,Also, in my test code above, all add method calls return true :\
I guess you could overload the add and contains method to correctly call the equals method. This code fixes it.
import java.util.HashSet;
import java.util.Iterator;
public class MyHashSet extends HashSet {
public class Pair {
public int x, y;
public Pair(int x, int y) {
this.x = x;
this.y = y;
}
public boolean equals(Object other) {
System.out.println("called: " + this +"::equals(" + other + ")");
if (other instanceof Pair) {
return (
this.x == ((Pair) other).x && this.y == ((Pair) other).y);
}
return false;
}
public String toString() {
return "[" + x + "," + y + "]";
}
}
public static void main(String args[]) {
new MyHashSet();
}
public MyHashSet() {
super();
add(new Pair(1, 2));
add(new Pair(6, 1));
add(new Pair(7, 9));
add(new Pair(1, 2)); // Duplicate
add(new Pair(3, 5));
System.out.println("hs = " + this);
}
/* (non-Javadoc)
* @see java.util.Collection#add(java.lang.Object)
*/
public boolean add(Object o) {
if (contains(o)) {
System.out.println("Sucker!");
return false;
}
return super.add(o);
}
public boolean contains(Object o) {
Iterator iter = super.iterator();
while (iter.hasNext()) {
if (iter.next().equals(o)) {
return true;
}
}
return false;
}
}
The correct solution is to override both equals(Object) and hashCode() in such a way that equal objects always have equal hashcodes, as described in the documentation for Object (but not mentioned in Set or HashSet). As for random order, why does it need to be random, rather than simply unspecified as is the case with HashSet?
nasch_:
I guess if you're doing statistical analysis or probability in a set, you would want the iterator to return random numbers. For example, you might want to find the expected number of consecutive numbers in a set of 100.
iamiqrk:
It's been a while since I took stocastic simulations, but I think the algorithm to permutate a list of numbers is easy to implement. Someone correct me if I'm wrong, because I remember there was a method that everyone thought worked well to permutate but it didn't generate all possible permutations with equal probability.
is it:
for (i = 0; i < array.length; i++) {
x=random number between i and array.length
swap(i,x)
}
Anyway, iamiqrk, if you don't mind, what is your project? I love combining math and Java.
> I guess if you're doing statistical analysis or probability in a set, you would want the iterator to
> return random numbers.
I'm afraid you're over-complicating matters; nasch_ is right: the hashCode method needs to be
overridden to get the stuff working. The OP didn't do that, so the Object.hashCode gets called,
which in turn will never return equal hash values for equal but not identical objects.
kind regards,
Jos
JosAH at 2007-7-5 22:27:26 >

From the Object javadoc: Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes
Even after you have correctly overridden equals() and hashCode() for your Pair class (BTW, are you aware of java.awt.Point which looks similar?) HashSet will not return items in random order, just in an unspecified order (but very likely the same order every time - at least as long as the contents don't change). If you want random behaviour, you will still need to implement this yourself.
My recommendation would be:
- make sure the items that go into the set correctly override equals() and hashCode()
- extend HashSet with the (random) functionality you need (instead of completely reimplementing it):
public Object getRandom();
public Itererator getRandomIterator();
Ah, you guys are gurus. How do you have the entire java javadoc api memorized? :]
nasch_
Overriding hashCode() worked stupendously! I'm not particularly mathematically inclined, so I had to borrow a hashing algorithm from an old professor of mine, but it looks like it's doing the trick.
octobernight
As this isn't a make-it-to-pluto-or-get-sucked-into-a-black-hole type of mission, I'm going to go with your permutation algorithm. I googled "random permutation," and apparently the incorrect version has 'x' being a random number between zero and array.length (determining why this leads to a non-linear distribution is beyond my neglected statistic abilities).
As for what this project is: I'm with a group of new CS hires, and when the boss man told us to refresh our java skills, we decided to do so by building AIs to a game we devised and pit them against each other. The game is similar to an evolutionary 'Game of Life' where you try to dominate the other AI's by expanding, attacking, and defending and such. Sets came in handy for expressing different situations and for certain algorithms. Also, given a set of adjacents to a friendly piece, I didn't want to expand, say, always clockwise as this may be something my opponent could pick up on - hence the randomness.
Anyway, I got quite sidetracked, and am now trying to build a formalized logic package and have some ideas for a general AI package. We'll see how those best laid plans go.
All...
Thanks for your help. I've decided to keep a subclass of HashSet (LogicSet) and override the iterator() method to return in a randomly permuted order. Additionally, as per Herko_ter_Horst et al., I'll verify that my items are uniquely identified.
Thanks for all your help guys :]
> Ah, you guys are gurus. How do you have the entire
> java javadoc api memorized? :]
I don't, I just ran into that exact same problem myself some time ago. Now you, like me, will
remember the solution. :-)
> nasch_
> Overriding hashCode() worked stupendously!
Glad to hear it!
Can I just suggest that you consider using composition rather than extending HashSet?Just look into it a bit before you go ahead. If you own a copy of Effective Java, or can get your hands on one, there's a good discussion there.
That's an excellent question to think about. In this case I think that
the thing you're making really is a HashSet, with some different behavior.
If that is true, IMO extending HashSet is appropriate. If it is not true, extending
HashSet is definitely inappropriate - no opinion about it! :-) Some say extending
is always bad, but I'm not in that camp. Actually I know of only one person in that camp...
stokes_david and nasch_
Hrm. This is probably a topic that comes up often in this forum, as choosing between Composition and Inheritence seems to be rather sticky. I gave some thought to your objections and came up with the following dilema:
A Java 'Set' object contains the very similar functionality of a math set, but uses a different nomenclature.
I.e. the following methods would be implemented identically (Listed as math over Java):
union
addAll
intersection
retainAll
difference
removeAll
cardinality
size
exists
contains
isNull
isEmpty
Engineers don't seem to put much design emphasis on nomenclature (as long as you retain consistency), however it seems important here for something beyond readability. It's as if a math model of a set is a different type of object than a java set even though they would have similar internals. Or maybe I've been thinking about this problem too long.
So far, this seems like an argument for Composition. However, what about the almighty is-a rule? Isn't a LogicSet a Set? LogicSet doesn't have-a Set.
I think there's no question that what you're making is a Set, therefore it should directly or indirectly implement the Set interface. That still leaves open the question of whether you should use inheritence or composition WRT the HashSet class. It depends whether you want to advertise that your class is a type of HashSet. Whatever you do, don't make the decision based on what seems easier to code (doesn't sound like you're doing that). As for functionality/nomenclature, I agree (except in math isn't it called the empty set?). "As implied by its name, this interface models the mathematical set abstraction."
Well, since I don't *need* to advertise it as a HashSet, I *should not* advertise it as a HashSet.
Does it make sense to specifically not implement the java Set interface and reword the method names? Or should I implement Set AND have a rewording of the method names? Either situation seems pretty ugly. Should I just drop the rewording idea?
>
> So far, this seems like an argument for Composition.
> However, what about the almighty is-a rule? Isn't a
> LogicSet a Set? LogicSet doesn't have-a Set.
Your LogicSet is a set (logically speaking :-); but I don't think it's necessarily the case that it IS-A java.util.Set.
Unless you really want to ensure that anywhere someone can use a java.util.Set (which is also a java.util.Collection), you could also use a LogicSet instance; and any methods supported by Set or Collection are also supported by LogicSet. So anybody can addAll(anyCollection) to your LogicSet, or somebody could use the Collections.xxx(Collection) methods and pass an instance of your LogicSet. This may be exactly what you want, or it may be something that you want to avoid.
I imagine that you don't really care about having an IS-A relationship with java.util.Set; what you want is a mathematical set with the methods you've shown and that's all you really need. I think you would prefer to use your method names rather than those defined by Set -- your class is going to look pretty ugly if you try to do both. So I, personally, would not extend HashSet and I would not implement Set. LOTS of people here would disagree with this choice though, and I suppose I shouldn't try to speak for them but just say that there are lots of threads where this is kicked around.
> Well, since I don't *need* to advertise it as a
> HashSet, I *should not* advertise it as a HashSet.
>
> Does it make sense to specifically not implement the
> java Set interface and reword the method names? Or
> should I implement Set AND have a rewording of the
> method names? Either situation seems pretty ugly.
> Should I just drop the rewording idea?
I would implement Set, but not extend HashSet. Your LogicSet IS-A Set, but ISN'T(-NECESSARILY)-A HashSet.
If you really want to use the math terms, I guess you could add methods that delegate to their Java equivalents, but presumably anybody using this set in a mathemtical context will be aware of the math terms, the Java methods, and the mapping between them (you will have Javadocs, right?), so I don't see it as a real source of confusion.
JMAO
¶
jverda at 2007-7-19 21:32:49 >

> Your LogicSet is a set (logically speaking :-); but I
> don't think it's necessarily the case that it IS-A
> java.util.Set.
> Unless you really want to ensure that anywhere someone
> can use a java.util.Set
Good point.
@OP: I was assuming he wanted it to BE-A java.util.Set, and to use it in contexts where a Set would be used. If that's the case, then extend Set. If not--if it really is just a mathematical set whose implementation makes use of what Java already provides (but doesn't need to BE-A Set), then you could skip the "implements Set," and still keep the composition, and whatever choice of method names makes the most sense to you.
If you're not sure, you might want to start of having it not implement Set. Then as you're using it, if you find yourself thinking "****, I wish I could use this method that takes a Set" or if you find yourself doing something like Set set = myLogicSet.getBackingSet();
to use the contents of your LogicSet in a java.util.Set context, then you probabaly should change it to implement Set. It's probably easier to go that direction than to first implement it and then later decide not to.
Again, JMAO.
¶
jverda at 2007-7-19 21:32:49 >

YOU CANNOT IMPLEMNTS EQUALS LIKE THAT !!
> public boolean equals(Object other)
> {
> System.out.println("called: " + this + "::equals(" +
> other + ")");
> if(other instanceof Pair)
> {
> return (this.x == ((Pair)other).x && this.y ==
> ((Pair)other).y);
> }
> return false;
> }
> public String toString()
> {
> return "["+x+","+y+"]";
> }
> }
the line "if(other instanceof Pair)" is breaking the equals() specifications !
You MUST test the real type :
if(other.getClass()==getClass())
Many discutions on many forums remember you what is specified in Javadoc :
It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
But if you use "instanceof", your "equals" is not symetric : assume you have a class A and a class B that extends A. let aA be a A instance, an aB be a B instance. Assume they have the same attribute values. aA.equals(aB) will return true beause aB is instance of B, and so instance of A (because B extends A). But aB.equals(aA) will return false because aA is not an instance of B, it is only an instance of A.
So we have aA.equals(aB) != aB.equals(aA). It is a WRONG BEHAVIOUR according java specs !!!!
So use getClass() and not instanceof !
This really sounds like bad advice. Can you show us the code for classes A and B?
> So use getClass() and not instanceof !
There are times when getClass is appropriate.
There are times when instanceof is appropriate. All List, Map, and Set implementations shoud use instanceof, for example.
I'm not sure which applies to the OP here, but he should learn when one or the other is appropriate (if he dosn't already understand) and then use whichever one makes sense for what he's trying to do.
jverda at 2007-7-19 21:32:49 >

Stupid horiz. scroll. Resposting for readability.
> So use getClass() and not instanceof !
There are times when getClass is appropriate.
There are times when instanceof is appropriate. All List, Map, and
Set implementations shoud use instanceof, for example.
I'm not sure which applies to the OP here, but he should learn when
one or the other is appropriate (if he dosn't already understand) and then
use whichever one makes sense for what he's trying to do.
jverda at 2007-7-19 21:32:49 >

Can you give an example where it's better to use getClass()?
> Can you give an example where it's better to use> getClass()?If you want instances to only be equal to other instances of the class, but not to have parent and subclasses be equal to each other.
jverda at 2007-7-19 21:32:50 >

> > Can you give an example where it's better to use
> > getClass()?
>
> If you want instances to only be equal to other
> instances of the class, but not to have parent and
> subclasses be equal to each other.
To me this breaks the concept of the IS-A relationship.
Suppose I have Point and an extension of it called ColoredPoint.
I test for equality between: Point(1,2) and ColoredPoint(1,2, green).
If these things are not equal, then how can I say that a ColoredPoint IS-A Point?
I think that if they're not equal then ColoredPoint should not be declared as an extension of Point.
If a ColoredPoint IS-A Point, and an instance is at the same location as an instance of Point, then they've got to be equal.
> > > Can you give an example where it's better to use
> > > getClass()?
> >
> > If you want instances to only be equal to other
> > instances of the class, but not to have parent and
> > subclasses be equal to each other.
>
> To me this breaks the concept of the IS-A
> relationship.
I don't think so. IS-A doesn't have to mean "can be equal to a".
> Suppose I have Point and an extension of it called
> ColoredPoint.
> I test for equality between: Point(1,2) and
> ColoredPoint(1,2, green).
> If these things are not equal, then how can I say that
> a ColoredPoint IS-A Point?
Easy. "ColoredPoint IS-A Point." :-)
Seriously, IS-A doesn't have anything to do with equality of "values." It just means you can use an instance of ColoredPoint where a Point is expected.
Do you want ColoredPoint(1,2, blue) to be equal to ColoredPoint(1,2, green)? If so, then maybe you want to implement equals only at the Point level, and check for instanceof. But if not, then you also can't have ColoredPoint(1,2) equal to ColoredPoint(1,2, green).
Look at it this way: EVERYTHING IS-A object, but you can't have a String or a List or an Integer that's equal to an instance of plain old Object.
> If a ColoredPoint IS-A Point, and an instance is at
> the same location as an instance of Point, then
> they've got to be equal.
Maybe yes, maybe no. Depends on the semantics of your particular case. Either way is valid.
jverda at 2007-7-19 21:32:50 >

> Do you want ColoredPoint(1,2, blue) to be equal to
> ColoredPoint(1,2, green)? If so, then maybe you want
> to implement equals only at the Point level, and check
> for instanceof. But if not, then you also can't have
> ColoredPoint(1,2) equal to ColoredPoint(1,2, green).
This is the real problem, I think. If you implement equals at both levels and use instanceof then you run the risk of breaking transitivity. But if I want the blue point to be not equal to the green point, then I also want it to be not equal to a (non-colored) point at the same location. To me that means that ColoredPoint is not really a Point. -- but enough about points, I don't even know if I have one anymore :-)
I still can't think of a real world case where I want to do this. I have always used instanceof in my equals() methods (I think), and I don't remember ever having a problem. If I were in a situation where I had to use getClass() to implement equals(), then I would start re-examining my inheritance hierarchy.
I looked this up and found a couple articles -- actually, this seems to be a fairly contentious point, Bloch advocates instanceof and lots of people have attacked him. One article used a completely spurious example of overriding the Employee.equals() with Manager.equals() (Employee.equals() should obvoiusly be final). But none of them offered any good, compelling examples. And I still can't think of one.
> > Do you want ColoredPoint(1,2, blue) to be equal to
> > ColoredPoint(1,2, green)? If so, then maybe you want
> > to implement equals only at the Point level, and
> check
> > for instanceof. But if not, then you also can't have
> > ColoredPoint(1,2) equal to ColoredPoint(1,2, green).
> This is the real problem, I think. If you implement
> equals at both levels and use instanceof then you run
> the risk of breaking transitivity. But if I want the
> blue point to be not equal to the green point, then I
> also want it to be not equal to a (non-colored) point
> at the same location.
My point exactly. (No pun intended.)
> To me that means that
> ColoredPoint is not really a Point.
I don't apply that interpretation.
There might be some general OO principle that says that if A is-a B, then an A should be able to be equal to a B, but I've never heard it. (Not that I've done much formal OO study.)
> I still can't think of a real world case where I want
> to do this.
I can't say that I know of a case where you'd only want objects at a given level in a hierarchy to be able to be equal to other objects at the same level, but neither can I think of a reason to say that should never be the case.
I did think that the Point example was actually a pretty good one. I could see implementing it that way, and having CP only be able to be equal to other CP (where x, y, and color determine equality). Of course, P then has to use getClass and not instanceof, else symmetry breaks.
It may be that that's a poor OO model anyway though, for reasons other than equality problems.
> I have always used instanceof in my
> equals() methods (I think), and I don't remember ever
> having a problem.
As long as all the classes do instanceof to the same parent class or interface, it's fine.
> If I were in a situation where I
> had to use getClass() to implement equals(), then I
> would start re-examining my inheritance hierarchy.
I think I sometimes use getClass when I'm not sure what I want, because I know that using getClass everywhere, or getClass in the parent class and instanceof in the child classes will be "safe." That's not really an argument in favor of getClass--just a case where it's convenient and safer to use when you still have holes in your design to fill in.
jverda at 2007-7-19 21:32:54 >

Reposted w/o annoying horiz. scroll
> > Do you want ColoredPoint(1,2, blue) to be equal to
> > ColoredPoint(1,2, green)? If so, then maybe you want
> > to implement equals only at the Point level, and
> check
> > for instanceof. But if not, then you also can't have
> > ColoredPoint(1,2) equal to ColoredPoint(1,2, green).
> This is the real problem, I think. If you implement
> equals at both levels and use instanceof then you run
> the risk of breaking transitivity. But if I want the
> blue point to be not equal to the green point, then I
> also want it to be not equal to a (non-colored) point
> at the same location.
My point exactly. (No pun intended.)
> To me that means that
> ColoredPoint is not really a Point.
I don't apply that interpretation.
There might be some general OO principle that says that if A is-a B,
then an A should be able to be equal to a B, but I've never heard it.
(Not that I've done much formal OO study.)
> I still can't think of a real world case where I want
> to do this.
I can't say that I know of a case where you'd only want objects at a given
level in a hierarchy to be able to be equal to other objects at the same level,
but neither can I think of a reason to say that should never be the case.
I did think that the Point example was actually a pretty good one. I could see
implementing it that way, and having CP only be able to be equal to other CP
(where x, y, and color determine equality). Of course, P then has to use getClass
and not instanceof, else symmetry breaks.
It may be that that's a poor OO model anyway though, for reasons other than equality
problems.
> I have always used instanceof in my
> equals() methods (I think), and I don't remember ever
> having a problem.
As long as all the classes do instanceof to the same parent class or
interface, it's fine.
> If I were in a situation where I
> had to use getClass() to implement equals(), then I
> would start re-examining my inheritance hierarchy.
I think I sometimes use getClass when I'm not sure what I want, because
I know that using getClass everywhere, or getClass in the parent class and
instanceof in the child classes will be "safe." That's not really an argument in
favor of getClass--just a case where it's convenient and safer to use when you
still have holes in your design to fill in.
jverda at 2007-7-19 21:32:54 >

And if it appears I'm waffling relative to when I said it's sometimes appropriate to use getClass,
I should point out that my main point there was that it's not true that you should always
use getClass. That contrary to what one poster said, there are at least some cases where
instanceof is appropriate.
I do think there are probably cases where getClass is appropriate, but I'm not interested in
proving that. It may be that instanceof is always the way to go, and I can live with that.
> > So use getClass() and not instanceof !
>
> There are times when getClass is appropriate.
>
> There are times when instanceof is appropriate. All
> List, Map, and
> Set implementations shoud use instanceof, for example.
>
> I'm not sure which applies to the OP here, but he
> should learn when
> one or the other is appropriate (if he dosn't already
> understand) and then
> use whichever one makes sense for what he's trying to
> do.
jverda at 2007-7-19 21:32:54 >

The rule of thumb I was given by Angelica Langer (http://www.langer.camelot.de) is that the default situation is to test for the correct type using getClass(). This ensures fully transitive equality comparisons and disallows equality across hierarchy levels.
Where there is a hierarchy where the subclass state is not significantly different with regards to equality (e.g. functional extension rather than data extension) and there is a need to compare equality across levels, instanceof may be necessary despite lack of transitivity. Of course, if the class is declared final (there can be no subclasses), there isn't a problem with instanceof.
In the colored point situation, the question is whether the color of the point is, or is likely to be, a significant part of its state. If so, getClass() should be the choice for equality and some other means provided for comparison of superclass with subclass.
These situations can be often be clarified by ensuring that the superclass is abstract (some would argue this should always be the case), and, making another subclass of Point called, say, NoColorPoint or PlainPoint. This makes the type difference more explicit, and the decision becomes whether to implement equals() in the superclass with instanceof and not override it in the subclasses (i.e. subclass data isn't significant), or to implement equals() in each subclass using getClass() (i.e. subclass data is significant).
If it's necessary that slices of subclasses are allowed to be equal to instances of superclasses, then the equality comparison needs to be extended so that it fails if other is not the same class or a subclass of this and this is not a subclass of other, and succeeds if this is a subclass of other and the slices compare equal. This should give a transitive equality for the slices...
Just my 2 penn'orth...
> Where there is a hierarchy where the subclass state is
> not significantly different with regards to equality
> (e.g. functional extension rather than data extension)
> and there is a need to compare equality across
> levels, instanceof may be necessary despite lack of
> transitivity.
The only thing I disagree with here is breaking transitivity.
Assume C derives from B derives from A, etc.
A -- getClass
B -- getClass
C -- instanceof C
D -- instanceof C
etc.
As long as everything at the "highest" (closest to Object) instance does instanceof that class or interface (here everything C and below does instanceof C) transitivity and symmetry are preserved.
I can't think of any good reason why you'd insert a getClass below one of those instanceofs, or do an instanceof SomethingElse further down the hierarchy.
jverda at 2007-7-19 21:32:54 >

Agreed; in the case where the superclass and subclass both test for instanceof the superclass, the equality will be transitive.
For the record: if you implement hashCode() in a reasonable way, your class will work fine with HashSet.
HashSet is, as the name suggests and the documentation states, backed by a hash table, that is, and hence depending on the proper hashCode().
A side note: the provided equals() method is okay, it does not break symmetry, though it is possible for other classes to break the symmetry, but you can not blame this class for that.
java MyHashTest
called: [1,2]::equals([1,2])
hs = [[3,5], [6,1], [1,2], [7,9]]
import java.util.HashSet;
public class MyHashTest
{
public class Pair
{
public int x,y;
public Pair(int x, int y)
{
this.x = x;
this.y = y;
}
public int hashCode() {
return x << 16 + y;
}
public boolean equals(Object other)
{
System.out.println("called: " + this + "::equals(" + other + ")");
if(other instanceof Pair)
{
return (this.x == ((Pair)other).x && this.y == ((Pair)other).y);
}
return false;
}
public String toString()
{
return "["+x+","+y+"]";
}
}
public static void main(String args[])
{
new MyHashTest();
}
public MyHashTest()
{
HashSet hs = new HashSet();
hs.add( new Pair(1,2) );
hs.add( new Pair(6,1) );
hs.add( new Pair(7,9) );
hs.add( new Pair(1,2) );
hs.add( new Pair(3,5) );
System.out.println("hs = " + hs);
}
}
For those interested, my point-like object ended up looking like this:
public class RC implements Comparable
{
public int r, c;
public RC(int row, int col) {
r = row; c = col;
}
public RC(RC other) {
this.r = other.r; this.c = other.c;
}
public void setRC(int row, int col) {
r = row; c = col;
}
public boolean equals(Object other) {
if(other instanceof RC) {
return (this.r == ((RC)other).r && this.c == ((RC)other).c);
}
return false;
}
public int hashCode() {
String sKey = r+"_"+c;
int A = 1952786893;
int B = 367253;
int v = B;
for(int j=0; j < sKey.length(); j++) {
char c = sKey.charAt(j);
v = A * (v + (int)c + j) + B;
}
if (v < 0) v = -v;
return v;
}
public int compareTo(Object oOther) throws ClassCastException {
final int SAME = 0; final int GRTR = 1; final int LESS = -1;
RC other = (RC) oOther;
if (r == other.r && c == other.c) {
return SAME;
}
else {
if( r > other.r)
return GRTR;
else {
if(r == other.r) {
if( c > other.c ) {
return GRTR;
}
else {
return LESS;
}
}
else {
return LESS;
}
}
}
}
public String toString() {
return "[" + r + "," + c + "]";
}
}
and my set looks something like:
public class LogicSet
{
Set _ts;
public class LogicSetIterator implements Iterator
{
private ArrayList _arState;
private int _index;
public LogicSetIterator()
{
/* Obtain the LogicSet's iterator and create a copy of it */
_arState = new ArrayList();
for(Iterator i=_ts.iterator(); i.hasNext();)
{
_arState.add(i.next());
}
int j;
Object objI;
Object objJ;
for(int i=0; i<_arState.size()-1; i++)
{
j = (int)(Math.random() * (_arState.size()-i))+i;
objI = _arState.get(i);
objJ = _arState.get(j);
_arState.set(j, objI);
_arState.set(i, objJ);
}
_index = 0;
}
public boolean hasNext ()
{
return (_index < _arState.size());
}
public Object next ()
{
if(hasNext())
{
return(_arState.get(_index++));
}
throw new NoSuchElementException();
}
public void remove ()
{
throw new UnsupportedOperationException();
}
}
public LogicSet()
{
_ts = new TreeSet();
}
public LogicSet union (LogicSet other)
{
LogicSet r = new LogicSet ();
r._ts.addAll (_ts);
r._ts.addAll (other._ts);
return r;
}
public LogicSet union (Object o)
{
LogicSet r = new LogicSet ();
r._ts.addAll (_ts);
r.addElt(o);
return r;
}
public LogicSet intersection (LogicSet other)
{
LogicSet r = new LogicSet ();
r._ts.addAll (_ts);
r._ts.retainAll (other._ts);
return r;
}
public LogicSet difference (LogicSet other)
{
LogicSet r = new LogicSet ();
r._ts.addAll (_ts);
r._ts.removeAll (other._ts);
return r;
}
public LogicSet difference(Object o)
{
LogicSet r = new LogicSet ();
r._ts.addAll (_ts);
r.removeElt(o);
return r;
}
public void setTo (LogicSet other)
{
this._ts.clear ();
_ts.addAll (other._ts);
}
public Iterator iterator()
{
return new LogicSetIterator();
}
public int cardinality()
{
return _ts.size();
}
public boolean addElt(Object o)
{
return _ts.add(o);
}
public boolean removeElt(Object o)
{
return _ts.remove(o);
}
public String toString()
{
String r = "[";
for(Iterator i=iterator(); i.hasNext(); )
{
r = r + i.next();
}
r = r + "]";
return r;
}
}
Ignoring the fact that I used inner-classes (project constraint), I think it looks decent. A few points:
- All, after reading through your posts for several lunch-breaks in a row, I think I understand the
debate over instanceof vs. getClass() equality. I really dislike making classes final, but I don't foresee
instanceof being a problem in this case (I wont take a stance in the general case).
- BIJ001, interesting hash code. If I'm understanding you correctly, you're assuming 'y' will be less
than or equal to a 16 bit integer value. The hash code I use (borrowed) works for arbitrarily sized numbers,
but pays for it with a theta(# of digits) run time.
- While I am stoked that this LogicSetIterator works, it seems pretty inefficient (both space and timewise).
It has-an ArrayList which contains more information than the iterator itself would ever return. This seems
somewhat ugly to me, although I guess it's the same information an old-fashioned array would have.
Also, putting the shuffling code in the constuctor seems like it could lead to problems, although I have
trouble coming up with any decent examples (creating a copy of the iterator returns a different order than
the original? Shouldn't it? If you walk half way through the first iterator, copy it to a second and finish
walking through there's a likely chance of doubles? Why would you ever do that? Will this spark more
warring over possible transitivity and symmetry issues? :o) ) Any suggestions?
- Yes, I realize I could have written compareTo slightly shorter, but this way adds... clarity. Sorta.
Btw, please forgive the extrusion. Please continue the discussion on inheritence equality, I just wanted to,
crassly, fork-exec a bit (although I guess this is a thread and not a process :/ ).
Er --
After staring at "if (v < 0) v = -v;" for a little while in my hash code, it dawned on me that
the person originally put that in so that if the integer ever overflowed to negative it would
be mapped back in to the positive space, which could possibly duplicate. Hence, this
algorithm apparently suffers from the same shortcomings of integer size as yours, in
that it halves the positive integer space, albeit without expressly denoting the supposed
max size of an integer.
Then again, I may not know what I'm talking about.
I haven't looked at the forum in a while, but I just saw your posted classes and had a few comments if you're still interested.
RC: why not make it immutable? private int r, c; no set method; once it's created the row and column can't change.
Why implement Comparable? You've chosen row-major as the natural ordering, but you could also order by column first. You could instead implement a pair of static final Compartor instances within the class. Also, pick a better name :-) RC is not too informative.
LogicSet: Declare the delegate Set as private, nobody needs to know what you're using. Also, I think you chose the member name, _ts, because you are using a TreeSet. If you change this later, that name won't make much sense so just call it mySet or something.
Couldn't you implement the Iterator just by delegating to an iterator returned from a shuffled ArrayList? like this:
public class LogicSetIterator implements Iterator {
private Iterator myIterator;
public LogicSetIterator() { // does this need to be public?
super();
myIterator = Collections.shuffle(new ArrayList(mySet)).iterator();
}
public boolean hasNext() { return myIterator.hasNext(); }
// etc.
Don't worry about the performance until you find that you have a performance problem (unless you've got some time on your hands and nothing better to do).
The methods for LogicSet don't seem orthogonal to me. Right now some change the set and others return a new set. If you decide that LogicSet should be mutable (i.e. you've got methods like addElement() that change the set), why not make all the member methods (like union, intersection, and difference) mutators? If you want to also have a method for creating a new LogicSet that is a union of two other sets, then you can declare it like this:
public static final LogicSet createUnionOf(LogicSet setA, LogicSet setB) { ... }
Also, you could add a constructor to build a LogicSet with just one element and then drop the duplicate methods like union(Object obj).
Finally, I think that LogicSets only contain instances of RC, right? So why not declare that as the parameter type for methods that take a single element? Like addElement(RC element), rather than addElement(Object obj). Take advantage of the compiler!
