Unsynchronized Collections
We have an rmi server that keeps data in Hashtabels.
Many treads access this server, some do not change the data and some do.
Hashtable is synchronized and therefor is tread safe, but we do not need tread safety
and data integrity.
So we thought we'l replace the Hashtables with HashMaps that is not synchronized
with hope that this will improve performance.
But the problem is that if one tread is iterating over the HashMap with an Iterator
and another thread is structurally changing the HashMap, the Iterator will throw
a 'ConcurrentModificationException' and will fail.
Using a synchronizedMap is the same as using a Hashtable and is not a good solution for us.
We would like to use HashMaps but without the Iterator's 'ConcurrentModificationException' .
Any way to overcome this problem will be appreciated.
[892 byte] By [
shlomika] at [2007-9-27 9:53:09]

One way to get around the issue is to create a Clone() of the original structure then return an Iterator on that object. If needed, you may need to override the clone() method to perform a deep copy of the objects. This way, the Iterator will belong to a unique structure and any changes to the original structure will not affect the Iterator.
Would it be possible for you to use an ArrayList in this case. I believe it's an Unsynchronized Collection. Though, I'm not sure if it would give you the same issue.
Disregard that last post. I had a brain ****. :-/
I thought about it but this server can serve hundreds of clients making thousands of requests per minute and the HashMaps can be thousands of entries, coping a HashMap for each request will cause unacceptable performance.
Have you looked in to using the synchronizeMap() method of the Collections class? http://java.sun.com/j2se/1.3/docs/api/java/util/Collections.html
1. Do some timings. Do you really think, if you are using RMI and a hashed structure (Hashtable or HashMap), that the bottleneck would be hashing, versus IPC?
2. Do you really want to forgo data integrity? As in leaving your HashMap in an inconsistent state, or losing entries? HashMap makes no claims as to how bad it blows up!
3. If the answer is yes to both 1 and 2, take a look at the source code for AbstractMap and HashMap. Unfortunately, the inner classes used in HashMap are private. The simplest way to remove the concurrent modification testing is create MyUnsafeMap.java, paste HashMap's code in there and cut out the testing code.
Perhaps want you want is for all the Map operations apart from the iterating process to be thread-safe, and iteration, while not detecting concurrent modification, to preserve its data invariants. (You may see stale data, for example.) This sounds like a good multithreading exercise!
--Nax
I disagree that you do not need thread safety and data integrity. If that was the case then you would not even be looking at the data. Sure maybe you dont mind if their are multiple readers and only 1 writer. But you indicated that their could be multiple writers. In this case you do need thread safety or else you could easily loose 1/2 your data.
Your risking an index out of bounds exception if you don't do synchronization or you don't disallow concurrent modification.
To PaulFMendler
Thanks for your replay.
The answers are yes.
This server I'm talking about in the major bottleneck in our system. we are trying
to improve it in any way we can every millisecond is important. so of course IPC is
a big part of the bottleneck but hashing is also.
And yes we are giving up data integrity, this is an application level decision.
I did the trick with copying HashMap's code,it works fine. this is currently the
best solution I have, although not a very clean one, if we upgrade to jdk1.3,
what happens? we loos the sun enhancements and bug fix to HashMap?
or we do the trick again?.
we didn't decide yet what to use.
To dnoyeB
We don't need data integriry.We don't care if one writer is changing the data
and a reader is reading the old data, this won't damage the application flow.
the next time this reader needs the same data it will get the new data and it's enough.
But I don't understand why am I risking an index out of bounds exception?
> To dnoyeB
> We don't need data integriry.We don't care if one
> writer is changing the data
> and a reader is reading the old data, this won't
> damage the application flow.
> the next time this reader needs the same data it will
> get the new data and it's enough.
>
> But I don't understand why am I risking an index out
> of bounds exception?
Since you copied the class you undoubtidly realize that iterators dont really copy the objects but use them in the array they are already in. so if the iterator starts to iterate, its already determined the total number of elements. Then you go and remove one. The iterator is not lost as to which element was removed and wether or not it has already show it. Plus the number the iterator has for a total is no longer valid but it does not know what the real number is. Thus, ArrayOutOfBoundsException.
But that wont happen because the iterators are fail-fast.
I am now running into a similar problem. I do need the synch because I do concurrent modification, but the synch is not on when an iterator is doing its thing. in other words, iterators are not part of the synchronization. I may be facing having to make my own Map class because of this If I can't find an accepable way of changing my philosophy.
>
> Perhaps want you want is for all the Map operations
> apart from the iterating process to be thread-safe,
> and iteration, while not detecting concurrent
> modification, to preserve its data invariants. (You
> may see stale data, for example.) This sounds like a
> good multithreading exercise!
>
> --Nax
Iterators don't detect concurrent modification for the fun of it :D Its to prevent them from blowing up, not to flush stale data. Remember, iterators could care less if you replace the data in the map, they only care when you change the number of elements in the Collection.
> Iterators don't detect concurrent modification for the
> fun of it :D Its to prevent them from blowing up, not
> to flush stale data. Remember, iterators could care
> less if you replace the data in the map, they only
> care when you change the number of elements in
> the Collection.
Let me rephase. Here is an example of what can go wrong if iteration (while not raising an exception or currupting the data invariant).
If a new key is put into a HashMap, the underlying array could grow, causing all the keys to be rehashed, and thus all the (key, value) pairs to be reordered, wrt iteration. If this were to occur during a naive iteration, you could visit some entries twice while skipping others!
I've only glanced at the source code for HashMap, and its inner class HashIterator. Because it uses this concurrent modification flag, they didn't consider that someone may want to iterate without synchronizing.
Never the less, one can imagine a hashed implementation of Map where the iterator, while having the sort of problem I mentioned above, does not corrupt its own or the map's data invariants. (THis may require brief synchronization blocks?) The advantage here, and this is what the original poster is going for, I think, is that you're willing to have ephemeral glitches in exchange for fewer long-term locks on the map. Another place where people tolerate this is in repainting. If a window is occasionally misdrawn for an instant, no big woop.
--Nax
> Let me rephase. Here is an example of what can go
> wrong if iteration (while not raising an exception or
> currupting the data invariant).
>
> If a new key is put into a HashMap, the underlying
> array could grow, causing all the keys to be rehashed,
> and thus all the (key, value) pairs to be reordered,
> wrt iteration. If this were to occur during a naive
> iteration, you could visit some entries twice while
> skipping others!
>
> I've only glanced at the source code for HashMap, and
> its inner class HashIterator. Because it uses this
> concurrent modification flag, they didn't consider
> that someone may want to iterate without
> synchronizing.
>
> Never the less, one can imagine a hashed
> implementation of Map where the iterator, while having
> the sort of problem I mentioned above, does not
> corrupt its own or the map's data invariants. (THis
> may require brief synchronization blocks?) The
> advantage here, and this is what the original poster
> is going for, I think, is that you're willing to have
> ephemeral glitches in exchange for fewer long-term
> locks on the map. Another place where people tolerate
> this is in repainting. If a window is occasionally
> misdrawn for an instant, no big woop.
>
> --Nax
I know what your thinking, and this thing is causing me to thing some strange things too. Like what is a synchronized collection? All Collections synchronized or not must use synchronization WRT the iterators. Remember, iterators do not have their own array of values, they use the array of the underneath Collection. So they can not work independently of the underlying Collection. You only mentioned what happens in your example if a new key is put into the hash, what if a key is removed?
Sure one can imagine the type of Collection - iterator relationship you propose if the iterator
A. copies all references into its own array and thus has its own personal non-externally modifyable set of values.
B. Is synchronized and locks the underlying array every iteration such that what it thinks is a legal index remains a legal index at least as long as the iteration requires it.
C. Is relatively unsynchronized and checks to see if the size of the array has changes since its started its iterating, and if so throws a ConcurrentModificationException.
Sun chose C.
When you said "the iterator does not corrupt its own" You seem to be assuming A. This of course would be much slower.
B. is an alternate solution that safely allows Concurrent Modification, but it also has its drawbacks in that normal operation will be slowed. The fastest solution will be C with a try catch loop on the concurrent modifications, depending on how often the modifications occur.
Yes, this is an interesting problem for distributed computing.
I just realized you said RMI...
The inherent synchronization used by the iterators IS going to cause you a lot of network traffic. The iteration routine will individually retrieve each reference from the remote Map underneath. The only thing you have locally is a reference to an iteration. You need to secure more data locally like the Map and even the objects that are Mapped themselves.
If you are going to iterate across the whole Map then I do suggest you do a deep clone and copy even the objects themselves since they will be serialized as they come across the wire anyway and you have no desire for synchronization on that level.
This should improve your speed. in otherwords do this.
1. Retrieve a deep clone() of the Map for the Client.
2. Get an iterator from your local copy of the Map( don't get the iterator from the remote Map, get it after you copy the Map else each iteration will hit the wire)
3. Safely iterate over the elements since this is the clients own personal copy of the Map. yes you can use HashMap too because this requires no synchronization.
In fairness I should note that PaulFMendler pointed out earlier that IPC will be your bottleneck, and I am agreeing except I think you have an RMI efficient usage issue too.
>Sure one can imagine the type of Collection - iterator relationship you propose if the iterator
>A. copies all references into its own array and thus has its own personal non-externally >modifyable set of values.
>B. Is synchronized and locks the underlying array every iteration such that what it thinks is a legal >index remains a legal index at least as long as the iteration requires it.
>C. Is relatively unsynchronized and checks to see if the size of the array has changes since its >started its iterating, and if so throws a ConcurrentModificationException.
I'm thinking of a 4th way. Here's some code for a simpler example (this is untested, so tolerant!)
import java.util.*;
public class SimpleCol {
private Object[] data;//data.length >= size
private int size;
public SimpleCol() {
data = new Object[8];
size = 0;
}
public synchronized Object get(int offset) {
if (offset < 0 || offset >= size)
throw new IndexOutOfBoundsException();
return data[offset];
}
public synchronized void set(int offset, Object o) {
if (offset < 0 || offset >= size)
throw new IndexOutOfBoundsException();
data[offset] = o;
}
public synchronized void add(Object o) {
if (size == data.length) {
Object[] longer = new Object[2*size];
System.arraycopy(data, 0, longer, 0, size);
data = longer;
}
data[size++] = o;
}
public synchronized Object remove(int offset) {
if (offset < 0 || offset >= size)
throw new IndexOutOfBoundsException();
Object result = data[offset];
System.arraycopy(data, offset+1, data, offset, size-offset-1);
return result;
}
private static class It implements Iterator {
private Object[] data;
private int size;
private int nextOffset;
public It(Object[] data, int size) {
this.data = data;
this.size = size;
}
public synchronized boolean hasNext() {
return nextOffset < size;
}
public synchronized Object next() {
if (!hasNext())
throw new NoSuchElementException();
return data[nextOffset++];
}
public void remove() {
throw new UnsupportedOperationException();
}
}
public synchronized Iterator iterator() {
return new It(data, size);
}
}
What's up with my iterator? It has a copy of a reference to the array initially used by the collection, so there is no "deep copying". If the collection expands to a longer array, the iterator is left holding a reference to a stale array. If the collection has items removed, iteration may visit the same item twice, because of shifting. There's no quick and dirty way to get the iterator to have a correct remove method, but I'll bet the original poster wasn't thinking of removing data via an iterator... Again, what this design is guaranteeing is that the collection is thread-safe, even if iterators talk rot.
On another point, the original poster didn't give any details on how RMI interacts with HashMap, but I assumed that the hash map interactions were implementation details within individual RMI calls: no iterators are being marshalled by RMI. At least I hope that's the case! Remote iteration would be a big mistake.
-nax
That looks like a good targeted solution. add/remove will be slow, but iteration will be just as fast without any ConcurrentModification exceptions thrown.
I did not get the impression that he was aware of the nature of his objects with respect to RMI. I think he is doing remote iteration.
> Since you copied the class you undoubtidly realize
> that iterators dont really copy the objects but use
> them in the array they are already in. so if the
> iterator starts to iterate, its already determined the
> total number of elements. Then you go and remove one.
> The iterator is not lost as to which element was
> removed and wether or not it has already show it.
> Plus the number the iterator has for a total is no
> o longer valid but it does not know what the real
> number is. Thus, ArrayOutOfBoundsException.
>
> But that wont happen because the iterators are
> fail-fast.
>
> I am now running into a similar problem. I do need
> the synch because I do concurrent modification, but
> the synch is not on when an iterator is doing its
> thing. in other words, iterators are not part of the
> synchronization. I may be facing having to make my
> own Map class because of this If I can't find an
> accepable way of changing my philosophy.
Your right.
When the underling table size changes and the Iterator
does not throw a ConcurrentModificationException:
1)
you may loos data if the table is rehashed.
lets say the the table contains three entries: A,B and C
one thread starts iterating over the map while another thread
inserts 1000 entries more.
my belief was that I will always get A,B and C with the iterator
and that was OK.
now I understand that I may loos those entries if the table is
rehashed and that's unacceptable.
2)
you may get an ArrayOutOfBoundsException if the table will shrink
on removes(something that does not happen because the table never shrinks even if you remove all the elements,look at the code).
But let's look at Hashtable.
The Enumerator returned by Hashtable.keys() works exactly like
the HashMap Iterator except the ConcurrentModificationException
I'v tested it and Hashtable Enumerator is not thread safe.
Using Hashtable Enumerator will have the same problem on
concurrent modifications.
And I thought Hashtable is thread safe.
Conclusion:not using the ConcurrentModificationException mechanism
is never good for me.
still searching for a solution.
> I just realized you said RMI...
>
> The inherent synchronization used by the iterators IS
> going to cause you a lot of network traffic. The
> iteration routine will individually retrieve each
> reference from the remote Map underneath. The only
> thing you have locally is a reference to an iteration.
> You need to secure more data locally like the Map and
> even the objects that are Mapped themselves.
>
> If you are going to iterate across the whole Map then
> I do suggest you do a deep clone and copy even the
> objects themselves since they will be serialized as
> they come across the wire anyway and you have no
> desire for synchronization on that level.
>
> This should improve your speed. in otherwords do
> this.
>
> 1. Retrieve a deep clone() of the Map for the
> Client.
> 2. Get an iterator from your local copy of the Map(
> don't get the iterator from the remote Map, get it
> after you copy the Map else each iteration will hit
> the wire)
> 3. Safely iterate over the elements since this is the
> clients own personal copy of the Map. yes you can use
> HashMap too because this requires no synchronization.
Yes, I'm using RMI but the network traffic is another subject.
All map operations are done internally in the server side
and a result is collected into vectors or strings and
returned to clients.
I'm now concentrating on the Map operations.
> What's up with my iterator? It has a copy of a
> reference to the array initially used by the
> collection, so there is no "deep copying". If the
> collection expands to a longer array, the iterator is
> left holding a reference to a stale array. If the
> collection has items removed, iteration may visit the
> same item twice, because of shifting. There's no quick
> and dirty way to get the iterator to have a correct
> remove method, but I'll bet the original poster wasn't
> thinking of removing data via an iterator... Again,
> what this design is guaranteeing is that the
> collection is thread-safe, even if iterators talk
> rot.
>
> On another point, the original poster didn't give any
> details on how RMI interacts with HashMap, but I
> assumed that the hash map interactions were
> implementation details within individual RMI calls: no
> iterators are being marshalled by RMI. At least I hope
> that's the case! Remote iteration would be a big
> mistake.
> -nax
Ok, I see your point,I think.
There is no reordering of elements. elements are added to the
array or removed from the array.
You loos les data with the iterator.
But it's still syncronized!
And what about hashing. with a hashed map you must reorder
the elements when incresing the array.
And yes your right: no iterators are being marshalled by RMI.
I will sharpen me desire.
Lets say we have a map of some kind with the items: A ,B C and D
Thread 1 start iterating over the map with an Iterator.
Thread 2 removes B and adds E, F, G, H....
in thread 1 i always want to see at least A, C and D.
With HashMap or Hashtable if the table is rehashed during the add operations of thread 2 i may loos one of those entries (A,C,D).
And I don't want the ConcurrentModificationException of the Iterator.
I think that neither Hashtable or HashMap can guaranty this.
My impression is I have to go on writing my own map.
You are correct. The HashTable and HashMap use linkedList2 and as such never throw ArrayOutOfBoundsExceptions and you should be safe with a similar implementation to that by ignoring the concurrentModificationException.
And yes, rehash only happens on add and that is growing the array so again you cant get AOBE. You are probably safest by making your own implementation else they could change the implementation in 1.5 and break your program.
I still dont understand why you think its ok to receive corrupted data. With the right timing some people will only get 1 result. you are hoping thats not most of the people, but I dont think you can assure that. Especially since your concerned about time, that means this thing will run a lot. I wouldnt offer such an implementation.
Besides, if your iterators (underlying array) are local to your clients, then why is this even an issue?
> I will sharpen me desire.
> Lets say we have a map of some kind with the items: A
> ,B C and D
> Thread 1 start iterating over the map with an
> Iterator.
> Thread 2 removes B and adds E, F, G, H....
> in thread 1 i always want to see at least A, C and D.
> With HashMap or Hashtable if the table is rehashed
> during the add operations of thread 2 i may loos one
> of those entries (A,C,D).
> And I don't want the ConcurrentModificationException
> of the Iterator.
> I think that neither Hashtable or HashMap can guaranty
> this.
>
> My impression is I have to go on writing my own map.
Take a look at LinkedHashMap. Again, it throws ConcurrentM.E.,
but if you're writing your own map, it may be a better starting
point. The linked structure eliminates the ordering problem with rehashing.
--nax
> I still dont understand why you think its ok to
> receive corrupted data. With the right timing some
> people will only get 1 result. you are hoping thats
> not most of the people, but I dont think you can
> assure that. Especially since your concerned about
> time, that means this thing will run a lot. I wouldnt
> offer such an implementation.
I will explain.
The application is a large banking system,the data we
are talking about is actually data base tables that
are loaded into memory,clients don't fetch data from the DB
but from this server that has the data in memory.
All the table are stored in the same memory structure and every change
to the DB has to be replicated to the memory structure.
The data is tables that store info about customers,banks,commission rates, deals and more. and static data that is allmost the same for all banks and setup tables for the application.
There is a huge number of read operations,sometimes it's a simple
get of one entry and sometimes it's a complicated search on a
few tables.
There is no massive change to this data and the static and
setup data allmost never changes in the application life time.
but changes do occur all day long and when they do i don't
want them to disturb the read operations.
To your question.
If the customers table contains a list of 50000 customers
and client A requests the list of all customers in the US
at the same time client B adds a new customer from US.
we have no problem that client A will not see the new customer
at this time.
As i said this is an application level decision.
>
> Besides, if your iterators (underlying array) are
> local to your clients, then why is this even an
> issue?
Iterators are never local to clients, client care less about
the type of stracture that the data is stored in. all operations are doen in the server and clients receive results in vectors or strings.
> >
> > Besides, if your iterators (underlying array) are
> > local to your clients, then why is this even an
> > issue?
>
> Iterators are never local to clients, client care less
> about
> the type of stracture that the data is stored in. all
> operations are doen in the server and clients receive
> results in vectors or strings.
>
If your iterators are not local to the clients but marshalled across through RMI from the server, then the iteration operation will take place across the network with each element returning one at a time across the network. This is what we have said is going to be incredibly slow. You are better to marshall the whole array to your client so they have a local copy of all elements, then start your iteration. And if you do that, these other issues are moot because you will have copied the underlying array and concurrent modification is then impossible.
> If your iterators are not local to the clients but
> marshalled across through RMI from the server, then
> the iteration operation will take place across the
> network with each element returning one at a time
> across the network. This is what we have said is
> going to be incredibly slow. You are better to
> marshall the whole array to your client so they have a
> local copy of all elements, then start your
> iteration. And if you do that, these other issues are
> moot because you will have copied the underlying array
> and concurrent modification is then impossible.
I think you didn't understand the scenario.
I'm talking about the server and not the client.
My clients don't use any iterators or HashMaps
all this is doene in the server side.
think of it as a stand alone application with no clients.
Then how do we get to the place where we have multiple iterators going at the same time, plus modification?
> Then how do we get to the place where we have multiple
> iterators going at the same time, plus modification?
There is a server that has data in a structure og HashMaps
clients ask for data from the server and the server
search the data by iterating over the HashMaps
some client read the data and some change it.
> > Then how do we get to the place where we have
> multiple
> > iterators going at the same time, plus
> modification?
>
> There is a server that has data in a structure og
> HashMaps
> clients ask for data from the server and the server
> search the data by iterating over the HashMaps
> some client read the data and some change it.
OK, so the server creates 1 thread per client request and in that thread it searches over the iterator to find something? like all items with "X" or some such? and only the search results go to the client?
> OK, so the server creates 1 thread per client request> and in that thread it searches over the iterator to> find something? like all items with "X" or some such?> and only the search results go to the client?Yes ,that's correct.
Well if the HashMap ever gets rehashed then you risk throwing a NullPointerException since you could ask for an element that is now null.
I don't see anyway around creating your own Collection class to handle this special functionality. Once your in charge, you can get away with doing things the normal HashMap classes would not like. Such as concurrent modification.
I had this problem with a List, but I think my solution is also applicable to Maps. What I did was have 2 lists. One called "sprites" and the other called "spritesToAdd". New values are ALWAYS added to spritesToAdd. Then, in the area where I iterate over the sprites list, I do:
transferFromSpritesQueueToSprites();
// iterate over "sprites" list
// ...
transferFromSpritesQueueToSprites();
private void transferFromSpritesQueueToSprites() {
for(;;) {
if(spritesToAdd.isEmpty()) {
break;
} else {
// remove first item in spritesToAdd and move it to sprites.
Object obj = spritesToAdd.remove(0);
sprites.add(obj);
}
}
}
Of course, this only works for me because there's a single thread which I have that is constantly iterating over the sprites list. Nothing stays in the spritesToAdd list very long.
In your situation you might have to do something like:
isIterating = true;
// iterate over "sprites" list
// ...
isIterating = false;
transferFromSpritesQueueToSprites();
When isIterating==true then new objects would be added to the "sprites" list. When it's false then new objects are added to the "spritesToAdd" list which will then be transferred over when the iteration finishes.
Of course you'll need to do appropriate synchornization, probably on a separate lock Object in order to make sure that all happens without entries being left in the list until the next iteration.
