Thread-safe rehash

Here is a problem that at first sounds simple, but in fact is quite the opposite.

Let's say I want to keepget(key) of a hash map unsynchronized for maximum performance (other methods may be synchronized). What would be the best way to implement the rehashing?

- Marcus Sundman

[304 byte] By [msundmana] at [2007-9-29 14:00:15]
# 1

> Let's say I want to keep get(key) of a hash map

> unsynchronized for maximum performance (other methods

> may be synchronized). What would be the best way to

> implement the rehashing?

I think the solution lies in the granularity of the rehash operation. You simply can not tolerate a read operation during a rehash. The rehash is commonly done in a write or clear operation if it is necessary. But in most cases this rehash can be postponed for some time. I would try a cooperative algorithm: the rehash is not done with write or clear, but from time to time or when the system is idle. All read ops have to be cooperatively stopped and a doRehashIfNeeded() function is called.

This way you can do a sequence of read ops with maximum performance and the hash map is synchronized at a higher level.

Of course if you want an unsynchronized read not because of the speed gain, but to allow concurrent reads, you need another (but well documented) algorithm.

polytroposa at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 2

> You simply can not tolerate a read operation during

> a rehash.

No, that is precisely what I want to do. The main data structure should therefore always be more or less consistent.

One solution is to first rehash to a temporary table with one bucket and then rehash to the final table. It's easy to rehash the elements into a table with one bucket since you can just concatenate all buckets (which could be linked lists) without making the table inconsistent at any point. (Sure, some buckets will have lots of superfluous elements in them while rehashing but a read will still find the correct element.) The problem is to rehash from this table with one bucket to a table with more buckets.

To be honest, I haven't read the java memory model specification. It might even be impossible to guarantee any kind of order of variable updates without synchronizing. (E.g. I don't even know if b=0; a=7; a=42; b=a; might at some point have the state a==42 && b==7 if read from another thread.)

> Of course if you want an unsynchronized read not

> because of the speed gain, but to allow concurrent

> reads, you need another (but well documented)

> algorithm.

Yes, do you know of any such algorithm?

- Marcus Sundman

msundmana at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 3

What you are saying is:

1. At any time, many threads may be accessing the get(key) method.

2. At any time, only one thread is accessing the other methods (such as put(...) )

How many threads are going to be accessing the get method? Do you think the synchronization is going to be the slowest part of your code? Does your application require the effort? Here is some code for testing synchronized versus unsynchronized:

// some sample test runs on my machine

// t = 1000, n = 1000000, sync = 29202, notsync = 4116

// t = 100, n = 1000000, sync = 3205, notsync = 661

public class SyncTest {

private static long st = 0;

private static int t = 100; // number of threads;

private static int count = 0;

public static void main(String[] args) throws Throwable {

Do d = new NotSync();

//Do d = new Sync();

int n = 1000000; // number of iterations

st = System.currentTimeMillis();

for (int i = 0; i < t; i++)

(new AThread(d, n)).start();

Thread.sleep(100000); // put main thread to sleep

}

public static synchronized void done() {

count++;

if (count == t) {

System.out.println("total: " + (System.currentTimeMillis() - st));

System.exit(0);

}

}

}

class AThread extends Thread {

private Do d = null;

private final int n;

public AThread(Do d, int n) {

this.d = d;

this.n = n;

}

public void run() {

for (int i = 0; i < n; i++)

d.doSomething();

SyncTest.done();

}

}

abstract class Do {

public abstract void doSomething();

}

class NotSync extends Do {

volatile boolean b = false;

public void doSomething() {

int a = 0;

if (b)

a = 1;

else

a = 0;

}

}

class Sync extends Do {

volatile boolean b = false;

public synchronized void doSomething() {

int a = 0;

if (b)

a = 1;

else

a = 0;

}

}

rkippena at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 4

> 1. At any time, many threads may be accessing the

> get(key) method.

> 2. At any time, only one thread is accessing the other

> methods (such as put(...) )

Correct.

> How many threads are going to be accessing the get

> method?

About 2-2000.

> Do you think the synchronization is going to

> be the slowest part of your code?

Not likely, although it does use hash maps A LOT, and having synchronized read access limits the scalability.

> Does your application require the effort?

Not at all, but not trying to make it perfect wouldn't be nearly as fun. ;-)

Besides, I've been thinking about this problem for a while and I'd like to see if someone finds a really good solution.

- Marcus Sundman

msundmana at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 5

> > You simply can not tolerate a read operation during

> > a rehash.

>

> No, that is precisely what I want to do. The main data

> structure should therefore always be more or less

> consistent.

I think that is precisely what you can't do :-)

A simple (and somewhat dumb) algorithm would be to create a second hash map (or substructure) as identical copy, rehash it and then switching over. This is the time when at least one reference changes. This change may be atomic but your read operation surely isn't. The algorithm you described has many change operations and with heavy load this will get inconsistent sooner or later. And with a complex sophisticated rehash procedure you will lose all the speed gain of unsynchronized read operations anyway.

> (E.g. I don't even know if

> b=0; a=7; a=42; b=a; might at some point have

> the state a==42 && b==7 if read from another

> thread.)

I don't think so. Execution order has to be source code order if the operations depend on each other or the compiler can't decide if it's the case. E.g. boolean evaluation has to be in order else it would break many programs.

> > Of course if you want an unsynchronized read not

> > because of the speed gain, but to allow concurrent

> > reads, you need another (but well documented)

> > algorithm.

>

> Yes, do you know of any such algorithm?

The idea of the algorithm is outlined [url http://www.asingh.net/technical/rwlocks.html]here[/url].

polytroposa at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 6

> A simple (and somewhat dumb) algorithm would be to

> create a second hash map (or substructure) as

> identical copy, rehash it and then switching over.

> This is the time when at least one reference changes.

> This change may be atomic but your read operation

> surely isn't.

OK, so this translates to the following code:Bucket[] table;

Object get(Object key) {

Bucket b = table[key.hashCode()%table.length];

return b.get(key);

}

synchronized rehash() {

// (1) get a copy of the table

Bucket[] new_table = getCopy(table);

// (2) resize the copy

rehash(new_table, new_table.length*2);

// (3) replace the table

table=new_table;

}

However, if my solution doesn't work then this solution doesn't work either. (3) is atomic, but if the memory model doesn't dictate the perceived order of execution on unsynchronized code then get(key) might see a state where (3) is executed but (2) is still executing.

> And with a complex

> sophisticated rehash procedure you will lose all the

> speed gain of unsynchronized read operations anyway.

That depends on the situation. If a rehash is needed once a year and read operations are performed several hundred times a second then effectiveness of the rehash procedure doesn't really matter.

> > (E.g. I don't even know if

> > b=0; a=7; a=42; b=a; might at some point

> have

> > the state a==42 && b==7 if read from another

> > thread.)

>

> I don't think so. Execution order has to be source

> code order if the operations depend on each other or

> the compiler can't decide if it's the case. E.g.

> boolean evaluation has to be in order else it would

> break many programs.

Can you point to the section of the specs that says that? And if that really is the case, then why did you think my solution wouldn't work?

> > > Of course if you want an unsynchronized read not

> > > because of the speed gain, but to allow concurrent

> > > reads, you need another (but well documented)

> > > algorithm.

> >

> > Yes, do you know of any such algorithm?

>

> The idea of the algorithm is outlined

> [url http://www.asingh.net/technical/rwlocks.html]here[/url]

Oh, I didn't realize that you were talking about a read/write lock.

However, this still requires all reads to synchronize, and if the actual lookup is fast then this algorithm is quite slow since each access (both reads and writes) will have to synchronize twice (once to aquire the lock and once to release it).

Still, this is a possible solution so here you have some D$s. :-)

- Marcus Sundman

msundmana at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 7

> To be honest, I haven't read the java memory model specification. It might even

> be impossible to guarantee any kind of order of variable updates without

> synchronizing. (E.g. I don't even know if b=0; a=7; a=42; b=a; might at some

> point have the state a==42 && b==7 if read from another thread.)

Not from what you've said - the operations of the thread executing the code you give must be in order, so it would never assign 7 to b. However, another thread could see a state a == 7 && b == 42. ([url http://java.sun.com/docs/books/vmspec/2nd-edition/html/Threads.doc.html]Memory model specification[/url], BTW).

YATArchivista at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 8

> > A simple (and somewhat dumb) algorithm would be to

> > create a second hash map (or substructure) as

> > identical copy, rehash it and then switching over.

> > This is the time when at least one reference changes.

> > This change may be atomic but your read operation

> > surely isn't.

>

> OK, so this translates to the following code:

Look at this version:

Bucket[] table;

Object get(Object key) {

// get a reference of the table

(1) Bucket b = table[key.hashCode()%table.length];

// uses reference

(3) return b.get(key);

}

synchronized rehash() {

// get a copy of the table

Bucket[] new_table = getCopy(table);

// resize the copy

rehash(new_table, new_table.length*2);

// replace the table

table=new_table;

(2) // old table reference gets probably invalid at end of function

}

The hypothetical but possible execution order is

(1) Get a Bucket reference in get()

(2) rehash function ends, making table reference invalid

(3) get() references an overwritten or garbage collected memory area

The unsynchronized sequence get ref / use ref is not atomic and can't be. For a single operation this is not likely but this depends of course on your load and the frequency of rehash operations. It's like most thread-unsafe programs which are running fine most of the time but eventually they crash: one of these nasty errors no one is ever tracking down. The only solution would be to make a complete deep copy of the table at (1).

> That depends on the situation. If a rehash is needed

> once a year and read operations are performed several

> hundred times a second then effectiveness of the

> rehash procedure doesn't really matter.

But then the exact time of the rehash doesn't matter either. So you can cooperatively stop your reads once a year to make a rehash.

> > I don't think so. Execution order has to be source

> > code order if the operations depend on each other or

> > the compiler can't decide if it's the case. E.g.

> > boolean evaluation has to be in order else it would

> > break many programs.

>

> Can you point to the section of the specs that says

> that?

Java Language Spec (v2.0, ch. 14.2) states:

A block is executed by executing each of the local variable declaration statements and other statements in order from first to last (left to right).

> Oh, I didn't realize that you were talking about a read/write lock.

> However, this still requires all reads to synchronize,

> and if the actual lookup is fast then this algorithm

> is quite slow since each access (both reads and

> writes) will have to synchronize twice (once to

> aquire the lock and once to release it).

Change the code. If you don't want synchronized read access you don't have to acquire the mutex, just change the read count. But if you don't have a read count there is no point in synchronizing write. If you happen to invent an algorithm which is data consistent with unsynchronized read/write pairs, it will also be consistent with concurrent unsynchronized write/write operations (I guess :-).

polytroposa at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 9

How about

public Class Barrier {

private volatile boolean blockRequested = false;

private BarrierNotification bn = null;

private Object mutex = new Object();

private int numThreads = 0;

private volatile int threadCount = 0;

public Barrier(int numThreads, BarrierNotification bn) {

this.numThreads = numThreads;

this.bn = bn;

}

public void requestBarrier()

{

blockRequested = true;

}

public void tryBarrier(){

if(blockRequested) {

synchoronized(mutex) {

if(blockRequested) {

threadCount++;

if(threadCount>=numThreads){

threadCount = 0;

this.blockRequested = false;

bn.barrierNotification(); // call back

mutex.notifyAll(); // wake all waiting threads

} else {

mutex.wait();

}

}

}

}

}

public void registerThread() {

synchronized(mutex) {

numThreads++;

}

}

public void unregisterThread() {

synchronized(mutex) {

numThreads--;

if(numThreads<0) numThreads = 0;

}

}

// implemented by the object you wish to notify when the barrier is

// in place

public interface BarrierNotification {

public void barrierNotification();

}

The above code is in effect a double check system but it should not

suffer the same problems as normal double checked locking. It has

its own problems though :)

Your hash table should implement the Barrier notification interface and

register itself with the Barrier class. The number of threads in use

needs to be set. Then in your get/set methods call the tryBlock

method of the Barrier before doing any work. This method should return

rapidly unless a barrier has been requested. If so the threads should

eventually block in the method. When all threads have blocked the call

back is activated allowing your rehash / insert to proceed.

Its a bit ugly as the work cannot proceed until all threads have

entered the tryBarrier method (and maybe not even then). but if your

reads are occuring very often and you don't want to block very often then

this is a reasonable solution.

matfud

duftama at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 10

Oh yes, there should be a synchronized block around the

public boolean requestBarrier(){

synchronized(mutex){

boolean oldValue = blockRequested;

blockRequested = true;

return oldValue;

}

}

otherwise it becomes possible for you to request a barrier from a

thread that is never involved in reading an writing to the Hash and never get

it. Also returning a value is important as otherwise you may request

a barrier when one is already requested. This may cuase problems.

matfud

(PS none of the code has been tested. I haven't even tried to compile

it so there are probably a few typoes and other errors in there)

duftama at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 11
Someone just posted this link to another thread. It is appropriate herethough.Concurrent HashMaps. http://www-106.ibm.com/developerworks/java/library/j-jtp08223/duftam
duftama at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 12

> > That depends on the situation. If a rehash is needed

> > once a year and read operations are performed several

> > hundred times a second then effectiveness of the

> > rehash procedure doesn't really matter.

>

> But then the exact time of the rehash doesn't matter

> either. So you can cooperatively stop your reads once

> a year to make a rehash.

Yes, and how do you think I could do that withouth having to synchronize every single read?

> > > I don't think so. Execution order has to be source

> > > code order if the operations depend on each other or

> > > the compiler can't decide if it's the case. E.g.

> > > boolean evaluation has to be in order else it would

> > > break many programs.

> >

> > Can you point to the section of the specs that says

> > that?

>

> Java Language Spec (v2.0, ch. 14.2) states:

> A block is executed by executing each of the local

> variable declaration statements and other statements

> in order from first to last (left to right).

That is in the context of one thread, so it is irrelevant in this situation. The relevant parts are in chapter 17.

> > Oh, I didn't realize that you were talking about a read/write lock.

> > However, this still requires all reads to synchronize,

> > and if the actual lookup is fast then this algorithm

> > is quite slow since each access (both reads and

> > writes) will have to synchronize twice (once to

> > aquire the lock and once to release it).

>

> Change the code. If you don't want synchronized read

> access you don't have to acquire the mutex, just

> change the read count.

Are you really suggesting I modify a shared variable without synchronizing? (Using volatile wouldn't even be enough for an operation such as count++.)

- Marcus Sundman

msundmana at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 13

> [...]

> The above code is in effect a double check system but it should

> not suffer the same problems as normal double checked locking.

Why not? Because you used a volatile variable?

By the way, using volatile is essentially the same as synchronizing when considering multithreaded efficiency.

> Then in your get/set methods call the tryBlock

> method of the Barrier before doing any work. This

> method should return

> rapidly unless a barrier has been requested.

Why would it be faster than simply sync'*** read access? The only difference seems to be that a smaller part is sync'ed, but the thing that is not sync'ed with this scheme (i.e. the bucket and element lookup) is very fast anyway (it is the thread sync'*** that is slow) so the net efficiency is probably negative.

> Its a bit ugly as the work cannot proceed until all

> threads have entered the tryBarrier method

It is indeed ugly, but it is still a bit interesting so I'll give you some D$s anyway for bringing it up. :-)

- Marcus Sundman

msundmana at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 14

Since the discussion seems to be taking another direction I'd better try to steer it back onto the desired tracks.

Let's forget about the java memory model for a while (the new JMM might be much better than the old one, but which JVMs really follow these specs anyway?). Let's just look at it from a theoretical point of view.

Let's say I have a hash map using linked lists as buckets:Entry[] map = new Entry[CAPACITY];

andclass Entry {

int key;

Object value;

Entry next;

}

Now, what is the best algorithm to rehash the table so that it maintains consistency at all times? That is, the following algorithm should always find the correct element if it exists:Object get(int key) {

Entry[] localmap = map;

for (Entry e = localmap[key%localmap.length]; e != null; e = e.next) {

if (e.key==key) return e.value;

}

return null;

}

- Marcus Sundman

msundmana at 2007-7-15 4:31:53 > top of Java-index,Other Topics,Algorithms...
# 15

> > But then the exact time of the rehash doesn't matter

> > either. So you can cooperatively stop your reads once

> > a year to make a rehash.

>

> Yes, and how do you think I could do that withouth

> having to synchronize every single read?

You are mixing things. You can synchronize to ensure that

1. only one read can occur at a time or

2. no read can be done during a write/rehash or vice versa

Case 1 must certainly be avoided, case 2 involves an atomic check for a single mutex and waiting only during a write/rehash. You don't want to tell me that only checking a mutex in almost all cases is a serious application bottleneck?

> > > > I don't think so. Execution order has to be source

> > > > code order if the operations depend on each other or

> > > > the compiler can't decide if it's the case. E.g.

> > > > boolean evaluation has to be in order else it would

> > > > break many programs.

> > >

> > > Can you point to the section of the specs that says

> > > that?

> >

> > Java Language Spec (v2.0, ch. 14.2) states:

> > A block is executed by executing each of the local

> > variable declaration statements and other statements

> > in order from first to last (left to right).

>

> That is in the context of one thread, so it is

> irrelevant in this situation. The relevant parts are

> in chapter 17.

Please reconsider your original question. If all source code statements are executed in order in every thread, your worst case that you outlined can never happen. Based on your statements up to now I thought this conclusion would be obvious to you. And reading chapter 17 you come to the same conclusion. What a strange coincidence.

> > > Oh, I didn't realize that you were talking about a read/write lock.

> > > However, this still requires all reads to synchronize,

> > > and if the actual lookup is fast then this algorithm

> > > is quite slow since each access (both reads and

> > > writes) will have to synchronize twice (once to

> > > aquire the lock and once to release it).

> >

> > Change the code. If you don't want synchronized read

> > access you don't have to acquire the mutex, just

> > change the read count.

>

> Are you really suggesting I modify a shared variable

> without synchronizing? (Using volatile wouldn't even

> be enough for an operation such as count++.)

Why not? Any access to a variable (except float and double) is atomic. I don't know where you get your wisdom from (e.g. count++ not being thread-safe), but please consider the slight possibility that you could be wrong.

polytroposa at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 16

> > > But then the exact time of the rehash doesn't matter

> > > either. So you can cooperatively stop your reads once

> > > a year to make a rehash.

> >

> > Yes, and how do you think I could do that withouth

> > having to synchronize every single read?

>

> You are mixing things.

> [...]

Perhaps I wasn't clear enough. I didn't mean that you have to synchronize the whole read operation when I said "synchronize every single read", but I meant that you need to synchronize something each time you execute the read operation.

> You don't want to

> tell me that only checking a mutex in almost

> all cases is a serious application bottleneck?

To check a mutex you need to synchronized on it, and that is slow, especially on multi-CPU boxes. The actual read operation is very fast.

> > > > > I don't think so. Execution order has to be source

> > > > > code order if the operations depend on each other or

> > > > > the compiler can't decide if it's the case. E.g.

> > > > > boolean evaluation has to be in order else it would

> > > > > break many programs.

> > > >

> > > > Can you point to the section of the specs that says

> > > > that?

> > >

> > > Java Language Spec (v2.0, ch. 14.2) states:

> > > A block is executed by executing each of the local

> > > variable declaration statements and other statements

> > > in order from first to last (left to right).

> >

> > That is in the context of one thread, so it is

> > irrelevant in this situation. The relevant parts are

> > in chapter 17.

>

> Please reconsider your original question. If all

> source code statements are executed in order in every

> thread, your worst case that you outlined can

> never happen. Based on your statements up to

> now I thought this conclusion would be obvious to you.

> And reading chapter 17 you come to the same

> conclusion.

That depends on the memory model. You do know that threads have local copies of shared variables, don't you? What the specs you quoted say is how it should seem to be executed, but internally out-of-order execution happens all the time. The JLS doesn't dictate much about what should happen when you don't synchronize critical sections. See e.g. JLS 17.11, especially: "it is at the option of the implementation whether or not to store the assigned values back to main memory".

I don't think my "worst case" scenario would happen on any current JVM, but, as I said, I don't know that. (I think the new JMM addresses some ambiguities related to this.)

> > > > Oh, I didn't realize that you were talking about a read/write lock.

> > > > However, this still requires all reads to synchronize,

> > > > and if the actual lookup is fast then this algorithm

> > > > is quite slow since each access (both reads and

> > > > writes) will have to synchronize twice (once to

> > > > aquire the lock and once to release it).

> > >

> > > Change the code. If you don't want synchronized read

> > > access you don't have to acquire the mutex, just

> > > change the read count.

> >

> > Are you really suggesting I modify a shared variable

> > without synchronizing? (Using volatile wouldn't even

> > be enough for an operation such as count++.)

>

> Why not? Any access to a variable (except float and

> double) is atomic. I don't know where you get your

> wisdom from (e.g. count++ not being thread-safe),

But count++ is not only one access to a variable. Actually count++ is almost equivalent to count=count+1. Both result in the following bytecodes:aload_0

getfield

iconst_1

iadd

putfield

> but please consider the slight possibility that you could

> be wrong.

I always consider that possibility. I'd be a fool if I didn't.

Sokrates said: "If I am wiser than other people it is because I know that I do not know anything, whereas others think they do." ;-)

- Marcus Sundman

msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 17

> > [...]

> > The above code is in effect a double check system

> but it should

> > not suffer the same problems as normal double

> checked locking.

>

> Why not? Because you used a volatile variable?

> By the way, using volatile is essentially the

> same as synchronizing when considering multithreaded

> efficiency.

No, The reason the code I posted will work is that there are no other

concurrent threads when the update to the HashTable is made. If you

notice the code in the synchronization block within the try method

explicitly waits until all registered threads "check-in". Once this

occurs there are no problems with out of order execution (the reason

normal double check locking does not work) as no other threads can

get access to the structures until the synch block is left and they

are reawakended. Exiting the synch block forces the new structure to

be visible to all threads.

The down side of the code is that all threads that could use the

data structure must call tryBarrier at least once before the barrier

comes into effect. It does not matter that some threads may not see

the new value of blockRequested (although it being marked volatile means

they will fairly promptly).

>

> > Then in your get/set methods call the tryBlock

> > method of the Barrier before doing any work. This

> > method should return

> > rapidly unless a barrier has been requested.

>

> Why would it be faster than simply sync'*** read

> access? The only difference seems to be that a smaller

> part is sync'ed, but the thing that is not sync'ed

> with this scheme (i.e. the bucket and element lookup)

> is very fast anyway (it is the thread sync'*** that is

> slow) so the net efficiency is probably negative.

The code is written on the assumption that you will be performing

many more reads than writes. The tryBarrier method allows most

reads to proceed with little more then forced read from main memory

(the volatile boolean blockRequested) therefore there is no

synchronization (which optimally is a O(logN) operation for N threads)

However when a synchronized operation is required you must wait until

all registered threads enter the tryBarrier method, after the

blockRequested variable is set to true. In this circumstance all

registered are counted into the synch block and put to sleep (call wait).

The final thread to enter the block (no other active threads now) then

is used to perform the modification to your structure via the call back.

it returns an releases the mutex allowing other threads to proceed.

> > Its a bit ugly as the work cannot proceed until all

> > threads have entered the tryBarrier method

>

> It is indeed ugly, but it is still a bit interesting

> so I'll give you some D$s anyway for bringing it up.

> :-)

>

>

> - Marcus Sundman

The other link I posted (thanks to someone on another thread) shows how

to access the hash without synchronisation (unless it is required) by

ensuring that the readers can detect an inconsistant state. It however

does require you to obtain 32 locks to rehash (for that implementation).

It requires you to obtain 1 lock to write and, in the majority of

circumstances, if requires no locks for reads.

matfud

duftama at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 18
> By the way, using volatile is essentially the> same as synchronizing when considering multithreaded> efficiency.This is, of course, utter BS. I have no idea what was going through my head when I wrote that. :-)- Marcus Sundman
msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 19

> > > [...]

> > > The above code is in effect a double check system but it should

> > > not suffer the same problems as normal double

> > > checked locking.

> >

> > Why not? Because you used a volatile variable?

>

> No, The reason the code I posted will work is that there are no other

> concurrent threads when the update to the HashTable is made. If you

> notice the code in the synchronization block within the try method

> explicitly waits until all registered threads "check-in". Once this

> occurs there are no problems with out of order execution (the reason

> normal double check locking does not work) as no other threads can

> get access to the structures until the synch block is left and they

> are reawakended. Exiting the synch block forces the new structure to

> be visible to all threads.

> The down side of the code is that all threads that could use the

> data structure must call tryBarrier at least once before the barrier

> comes into effect. It does not matter that some threads may not see

> the new value of blockRequested (although it being marked volatile

> means they will fairly promptly).

Ah, I didn't see that at first, but now after reading your explanation it is obvious. Very nice. Definitely worth a few more D$s..

> However when a synchronized operation is required you must wait until

> all registered threads enter the tryBarrier method, after the

> blockRequested variable is set to true. In this circumstance all

> registered are counted into the synch block and put to sleep

This might not be such a big problem in some cases. E.g. one might use a thread pool where threads are waiting to do short operations issued by a command queue. In those cases the thread that wants to do the rehashing can tell the the thread pool to make all threads call tryBarrier() (the idle ones immediately, the rest when they check in).

> The other link I posted (thanks to someone on another thread) shows

> how to access the hash without synchronisation (unless it is required)

Yes, I skimmed through it. It was indeed an interesting read. Thanks!

- Marcus Sundman

msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 20

Another thing I should have said is that the code is an implementation

of the cooperative locking approach suggested by polytropos in the first

reply.

It is not required that the threads call the tryBarrier method on each

read access (although there is little overhead for doing so). It is

only required that they periodically call it so they can all be forced

to block before the modification occurs. This means you can have very

slow writes/modifications but no, or little, overhead for reads.

As for this:

> Since the discussion seems to be taking another

> direction I'd better try to steer it back onto the

> desired tracks.

>

> Let's forget about the java memory model for a while

> (the new JMM might be much better than the old one,

> but which JVMs really follow these specs anyway?).

> Let's just look at it from a theoretical point of

> view.

>

> Let's say I have a hash map using linked lists as

> buckets:Entry[] map = new

> Entry[CAPACITY];

andclass Entry {

>int key;

>Object value;

>Entry next;

> }

Now, what is the best algorithm to rehash

> sh the table so that it maintains consistency at all

> times? That is, the following algorithm should always

> find the correct element if it exists:Object

> get(int key) {

>Entry[] localmap = map;

> for (Entry e = localmap[key%localmap.length]; e

> h]; e != null; e = e.next) {

> if (e.key==key) return e.value;

>}

>return null;

>}

>

> - Marcus Sundman

This can work as long as you perform your rehash in parallel.

// marked volatile so when it is updated other thread see it.

// holds the class local array reference

private volatile Entry[] map;

// syncronized so that multiple rehashes cannot occur at the

// same time (the same would be true of writes)

protected synchrnonized rehash(int newSize) {

// make new local map.

Entry[] newMap = new Entry[newSize];

// run though entries in current map and add them to the

// new map. This is only reading from the current map

// so other concurrent reads can proceed OK

Iterator it = this.iterator();

while(it.hasNext()){

//mythical method that inserts into the specified array

insertIntoMap(newMap,it.next());

}

// swap the maps to make the new one current

this.map = newMap

}

This does require that your read methods (the get method as you've

shown) make a method local copy of the array reference before trying to

read from it.

They will contiue to see the old map and it will continue to exist until

no more reads are occuring on it.

With this approach, even if one thread is in the middle of reading when

the rehash occurs, the read will succeed on the old version of the map.

Now the question is does this work with Java's memory model? Maybe, but

you would have to take care that out of order execution does not make

your new map visible to other threads before you've finished creating

it.

polytropos,

>Case 1 must certainly be avoided, case 2 involves an atomic check for

>a single mutex and waiting only during a write/rehash. You don't want

>to tell me that only checking a mutex in almost all cases is a serious

>application bottleneck?

The problem is if a thread just checks the synch block it can still

see an inconsitant state.

Thread 1 trys the mutex. Mutex is not taken so thread proceeds into read.

Thread 1 is intrrupted.

Thread 2 obtains lock and modified strucutre.

Thread 1 is woken and now half way through reading is reading a

different structure.

There are of course ways around this as suggested earlier.

for myself,

>synchronization (which optimally is a O(logN) operation for N threads)

Probably not true. I was thinking of memory barriers synchronisation not java synchronisation. Synchronisation (java sense not general) may not be

much more expensive then volatile however I do not know for sure. A

simple experiment should figure this out.

matfud

matfud

duftama at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 21

> Another thing I should have said is that the code is an implementation

> of the cooperative locking approach suggested by polytropos in the

> first reply.

So it seems. I should have paid it a bit more attention.

> > Since the discussion seems to be taking another

> > direction I'd better try to steer it back onto the

> > desired tracks.

> >

> > Let's forget about the java memory model for a while

> > (the new JMM might be much better than the old one,

> > but which JVMs really follow these specs anyway?).

> > Let's just look at it from a theoretical point of

> > view.

> >

> > Let's say I have a hash map using linked lists as

> > buckets:> >Entry[] map = new Entry[CAPACITY];

> > and> >class Entry {

> >int key;

> >Object value;

> >Entry next;

> >}

> > Now, what is the best algorithm to rehash

> > the table so that it maintains consistency at all

> > times? That is, the following algorithm should always

> > find the correct element if it exists:> >Object get(int key) {

> >Entry[] localmap = map;

> >for (Entry e = localmap[key%localmap.length]; e != null; e = e.next) {

> > if (e.key==key) return e.value;

> >}

> >return null;

> >}

> >

> > - Marcus Sundman

>

> This can work as long as you perform your rehash in

> parallel.>// marked volatile so when it is updated other thread see it.

>// holds the class local array reference

>private volatile Entry[] map;

>

>// syncronized so that multiple rehashes cannot occur at the

>// same time (the same would be true of writes)

>protected synchrnonized rehash(int newSize) {

>// make new local map.

>Entry[] newMap = new Entry[newSize];

>

>// run though entries in current map and add them to the

>// new map. This is only reading from the current map

>// so other concurrent reads can proceed OK

>Iterator it = this.iterator();

>while (it.hasNext()){

>//mythical method that inserts into the specified array

>insertIntoMap(newMap,it.next());

>}

>// swap the maps to make the new one current

>this.map = newMap

>}

The interesting bit here is insertIntoMap. The obvious thing to do is to create new instances of Entry for each element. This would obviously work. However, if you double/halve the capacity then you could probably reuse some of the existing Entrys. However, you cannot reorder (or remove elements from) the tail of a bucket before all reads starting from the old table array finishes. I.e., you can only insert elements into the linked lists, not remove them. This means that you either:

A) never change Entry.next, or

B) you have to have some way of knowing if there still is something using the old table array before you can do the needed modifications (i.e. remove the temporary Entrys you used to make the map valid both in the old table array and in the resized table array at once).

> This does require that your read methods (the get method as you've

> shown) make a method local copy of the array reference

> before trying to read from it.

The only reason for needing a local reference to it is that you have to access it twice to get the bucket: map[hash%map.length]

> Now the question is does this work with Java's memory model?

> Maybe, but you would have to take care that out of order execution

> does not make your new map visible to other threads before you've

> finished creating it.

Exactly, and this is the point where it might fall apart in practice. However, this isn't a problem in theory so let's ignore it. :-)

> polytropos,

> > Case 1 must certainly be avoided, case 2 involves an

> > atomic check for

> > a single mutex and waiting only during a

> > write/rehash. You don't want

> > to tell me that only checking a mutex in almost all

> > cases is a serious application bottleneck?

>

> The problem is if a thread just checks the synch block it can

> still see an inconsitant state. Thread 1 trys the mutex. Mutex

> is not taken so thread proceeds into read.

> Thread 1 is intrrupted.

> Thread 2 obtains lock and modified strucutre.

> Thread 1 is woken and now half way through reading is

> reading a different structure.

> There are of course ways around this as suggested earlier.

Yes, you can use a counter but then you'd have to sync both before and after reading.

> for myself,

> > synchronization (which optimally is a O(logN) operation

> > for N threads)

>

> Probably not true. I was thinking of memory barriers

> synchronisation not java synchronisation.

Umm.. is there a difference between "memory barrier sync'***" and "java sync'***"?

> Synchronisation (java sense not general) may not be

> much more expensive then volatile however I do not

> know for sure.

Sure it is more expensive. However, some processors use quite smart caching so sync'*** can be surprisingly fast.

Sylvia, are you out there? You're an expert in these matters, so perhaps you could shed some light on it?

- Marcus Sundman

msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 22

> polytropos,

> >Case 1 must certainly be avoided, case 2 involves an atomic check for

> >a single mutex and waiting only during a write/rehash. You don't want

> >to tell me that only checking a mutex in almost all cases is a serious

> >application bottleneck?

> The problem is if a thread just checks the synch block

> it can still

> see an inconsitant state.

> Thread 1 trys the mutex. Mutex is not taken so thread

> proceeds into read.

> Thread 1 is intrrupted.

> Thread 2 obtains lock and modified strucutre.

> Thread 1 is woken and now half way through reading is

> reading a

> different structure.

> There are of course ways around this as suggested

> earlier.

As you say, there are ways around it. You have shown a sophisticated way to do it. But even a dumb solution, like acquiring the mutex and waiting the maximum time a read operation can take before doing the rehash would work. To further refine, you could even do the rehash on a local copy during this wait.

polytroposa at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 23

> like acquiring the mutex and waiting the maximum time

> a read operation can take before doing the rehash

> would work.

No, no, no!! Don't even play with the idea unless you want to be dragged out to a field and shot. ;-) Unless your OS and its JVM support real-time restrictions you can never, ever rely on timing like that. There are hundreds of things that can go wrong; hibernating/suspending, heavy swapping, long stop-the-world GC, ...

> To further refine, you could even do the

> rehash on a local copy during this wait.

If you can do that, then why would you have to wait to acquire the lock before replacing the old map with the new map?

- Marcus Sundman

msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 24

>Umm.. is there a difference between "memory barrier sync'***" and "java sync'***"?

Im not even sure I ment memory barrier.

I was struggling to remember some details from a CS course in parallel

computation.

By a barrier I ment a point of control that all threads must reach before

any can continue (rather like the code I gave earlier). If I remember

correcly this is optimally an O(logN) operation.

>> Now the question is does this work with Java's memory model?

>> Maybe, but you would have to take care that out of order execution

>> does not make your new map visible to other threads before you've

>> finished creating it.

>

>Exactly, and this is the point where it might fall apart in practice. >However, this isn't a problem in theory so let's ignore it. :-)

Just to follow this up a bit, I came across this thread

http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1095.html

which discusses the effects of volatile on an array reference.

It suggests, if I read it correctly, that performing the rehash in

parallel then swapping references may not be sufficient for other

threads to see new array elements. You may have to synchronise the

other threads to force the new array elements to become visible (be

reloaded) even though the new array reference will be visible due

to its volatile marking.

matfud

duftama at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 25

> > like acquiring the mutex and waiting the maximum time

> > a read operation can take before doing the rehash

> > would work.

>

> No, no, no!! Don't even play with the idea

> unless you want to be dragged out to a field and shot.

> ;-) Unless your OS and its JVM support real-time

> restrictions you can never, ever rely on timing

> like that. There are hundreds of things that can go

> wrong; hibernating/suspending, heavy swapping, long

> stop-the-world GC, ...

... and earthquakes and other natural disasters. It all depends on how you measure the time. May I say that you have a very, ehm, formal view? And now I will go and face death like a man.

> > To further refine, you could even do the

> > rehash on a local copy during this wait.

>

> If you can do that, then why would you have to wait to

> acquire the lock before replacing the old map with the

> new map?

To distribute the load more equally?

polytroposa at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 26

> ... and earthquakes and other natural disasters. It all depends on

>how you measure the time. May I say that you have a very, ehm, formal

>view? And now I will go and face death like a man.

Formality is the point. Double checked locking works the vast majority

of the time (if not always). However there is no formal proof that it

will work. In fact there is a proof that it can fail using the constraints

provided by the JLS. If marcus was not worried about the potential for

failure we would not be having this conversation.

On a hard realtime system you could guarentee that after a specified time

no other process could be in the read method. Using Java you do not have the same assurance. I've seen JVM's lock up for the best part of a min (due to being allocated too much memory on a machine with too

little. The GC thrashes disk (swap space)).

Actually that was the reason I came up with the approach I did. It does

guarentee that all methods have exited the read method. They all have to

enter it again for the update to occur. It also provides the advantage

that there are no concurrent threads that can expose an uninitialised

object due to instruction reordering (I hope). It may be possible for

threads to see an inconsistent map after the rehash though (the threads

may need to synchronize before leaving the tryBarrier method (after

blocking) to force new values to be visible.

matfud

duftama at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 27

> Just to follow this up a bit, I came across this thread

> http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1095.html

> which discusses the effects of volatile on an array reference.

Very interesting discussion. Thanks!

> You may have to synchronise the other threads to force the new

> array elements to become visible (be reloaded) even though the

> new array reference will be visible due to its volatile marking.

Perhaps not. Someone in the mailing list thread suggested to have a dummy volatile variable and write to that variable after writing to the array, and read from that variable before reading from the array. If you're going to have the get() method always read from a volatile variable anyway you'd just have to write to that variable after writing to the array. Voil, no sync'*** needed!

- Marcus Sundman

msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 28

> I've seen JVM's lock up for the best part of a min

I've seen my IDE lock up for over five minutes when GC'***!

GC and virtual memory is a really bad combination. Besides, I can't understand where all the memory goes. I have "only" 256 MiB of RAM on my laptop, but I still shouldn't have to use virtual memory. I mean, I used to do 3D modeling and ray-tracing on my Amiga 1200 with 2 MiB of RAM (inkluding gfx-memory)! Go figure.

- Marcus Sundman

msundmana at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 29

I've been trying to understand the JMM (the current version; "volatile" changes in the new JMM change some things). Is this how it goes:

A perfectly valid multi-CPU system for Java would be one that has a "maximally lazy" memory system:

Each thread has its own CPU (at least conceptually).

If one thread does an unsynchronized write ("object.field = 123;"), that CPU never "flushes" that write to main memory. Actually it might flush random fields every now and then in a random order, but that isn't required.

Only writes that are executed within "synchronized(object) { object.field = 123; }" are flushed. Even then not to other CPU's, only to a "main memory".

If a thread reads a field ("local_variable = object.field;") outside synchronized, the assignment uses a value that is cached locally in the CPU. The CPU is never required to check the "main memory" for an up-to-date version. It uses the value cached in the CPU forever.

Only if a field is read within "synchronized(object)", the CPU is required to refresh its cache by reading the fields of that one object from the main memory. Even then, only from main memory; if another thread has written to that object outside synchronized, those writes are not seen (though randomly some writes may be seen).

Is that a valid implementation of the JMM? If so, what does it say about the possibility of writing an unsynchronized hash.get()...?

sjasjaa at 2007-7-19 6:44:45 > top of Java-index,Other Topics,Algorithms...
# 30

> Only writes that are executed within

> "synchronized(object) { object.field = 123; }" are

> flushed. Even then not to other CPU's, only to a

> "main memory".

Are you sure about that? I was under the impression that it is only required that the local copies of variables accessed in two blocks synchronized on the same monitor by two threads will have to be in sync on monitorenter and monitorexit. I thought the JMM dictated nothing about "main memory" when synchronizing.

> If a thread reads a field ("local_variable =

> object.field;") outside synchronized, the assignment

> uses a value that is cached locally in the CPU. The

> CPU is never required to check the "main memory" for

> an up-to-date version. It uses the value cached in

> the CPU forever.

Yes, I believe this is true.

> Only if a field is read within "synchronized(object)",

> the CPU is required to refresh its cache by reading

> the fields of that one object from the main memory.

I think this is one way of implementing it, but as far as I know the JMM specs doesn't require you to implement synchronization by flushing to/from main memory. (E.g., on clusters it can be much faster to just sync the local copies of the threads' variables.)

> Is that a valid implementation of the JMM? If so,

> what does it say about the possibility of writing an

> unsynchronized hash.get()...?

You didn't address volatile variables at all. You can get away with not synchronizing by making some variables volatile instead.

- Marcus Sundman

msundmana at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 31

> However, you cannot reorder (or remove elements from) the tail of a

> bucket before all reads starting from the old table array finishes.

> I.e., you can only insert elements into the linked lists, not remove

> them. This means that you either:

> A) never change Entry.next, or

> B) you have to have some way of knowing if there still is something

> using the old table array before you can do the needed modifications

> (i.e. remove the temporary Entrys you used to make the map valid both

> in the old table array and in the resized table array at once).

You may be able to insert into and remove from the list without problems

Reordering the list would basically require creating a new list and

creating new objects in it that correspond to the objects in the

old list.

Another method would, perhaps, be to use open address hashing which

does not require a list at all. This works by having one entry per

bucket. If a collision occurs for a hashCode you rehash using a well

defined formula to find the next possible bin. You therefore only need

to find a method of handling the array without synch'*** rather then

handling an array and a linked list without synch'***.

matfud

duftama at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 32

> You may be able to insert into and remove from the list without

> problems. Reordering the list would basically require creating

> a new list and creating new objects in it that correspond to

> the objects in the old list.

Exactly. And the question is how a good algorithm for such an operation would be like.

> Another method would, perhaps, be to use open address hashing

> which does not require a list at all. This works by having

> one entry per bucket. If a collision occurs for a hashCode you

> rehash using a well defined formula to find the next possible

> bin.

Good idea, but this will work only if you can ensure that no two hash codes are equal. In most cases this cannot be ensured.

- Marcus Sundman

msundmana at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 33

>> You may be able to insert into and remove from the list without

>> problems. Reordering the list would basically require creating

>> a new list and creating new objects in it that correspond to

>> the objects in the old list.

>

>Exactly. And the question is how a good algorithm for such an

>operation would be like.

The link about concurrent hashes shows how that author manipulates the

linked lists (appends and deletes) without synching. As long as you

use single link lists it should not be too hard (double linked lists

would be very difficult, if not impossible, to do this with).

for a list element

class ListElement{

protected volatile Object data = null;

protected volatile ListElement next = null; // not sure if volatile is need here

protected ListElement() {};

}

we have a list:

A-B-C-D

to perform the insert operation (insert E between B and C)

private static final ListElement listEnd = new ListElement();

//insert an element into the list after the "afterThis" element.

// For the example ListElement B is passed in as "afterThis parameter

protected void insertIntoList(ListElement afterThis, Object o){

ListElement e = new ListElement();

e.data = 0;

e.next = afterThis.next;

// at this point we have two entries in parallel that both have

// Element C as their next element. all current reads are using the

// old chain

synchronized(mutex) {

afterThis.next = e;

}

// now element E is after element B

}

The question is whether instruction reordering will screw up the

assignment order and break the code. The Synchonization may not

be necessary . Whatever the

case is, your reading methods must check the ListElement they are

currently reading to ensure that its next entry and data entry are not null. If the entries members are

null the read method must retry (with a synchronization). The last

element in the list has its "next" member reference the static

listElement object. The list does not allow nulls. This ensures that

any null (default initialisation) denotes an uninitialised or partially

initialised ListElement.

> Good idea, but this will work only if you can ensure that no two hash

> codes are equal. In most cases this cannot be ensured.

Just to clarify. Open address hashing works when multiple entries

have the same hashcode. If a collision is detected (two entries with

the same hashCode) the second of the entries is rehashed (reprobed) and inserted

into a different bucket (they aren't actually buckets of course as

they only ever have one entry). The method of probing has certain constraints.

It must be deterministic so that entries can be found again by following

the same procedure. It must (rather should) include all buckets in the

hashtable with no repeats. A simple reprobe method is to

take the original hashCode and add one to it. Use this (modulo array length)

to find a new bucket. If the new slot is occupied then try probing again. Rinse, wash and repeat. Ehen you have tried array.length probes without

finding an empty slot the hash table is full. (its easier to determine this

by keeping a count of the inserted elements). This not, in general, an

optimal reprobe policy but does work. (better algorithms are available)

matfud

duftama at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 34

> > Exactly. And the question is how a good algorithm for such an

> > operation would be like.

>

> The link about concurrent hashes shows how that author

> manipulates the

> linked lists (appends and deletes) without synching.

> [cut]

I'm sorry, I wasn't very clear. I meant an algorithm for doing a rehash operation with the specified constraints, i.e. without reordering any elements and without removing any element.

You probably will have to create some new elements, but it should be easy to reuse many, if not most, elements. E.g. if no bucket contain more than one element then it is obvious that the new hash map can reuse all elements. When buckets contain many elements it get harder, though.

> > Good idea, but this will work only if you can ensure that

> > no two hash codes are equal. In most cases this cannot be

> > ensured.

>

> Just to clarify. Open address hashing works when multiple entries

> have the same hashcode. If a collision is detected (two entries with

> the same hashCode) the second of the entries is rehashed (reprobed)

> and inserted into a different bucket

I'm sorry, I misunderstood you. I didn't know that was called "open address hashing" (none of the courses in data structures that I have attended has been in english, so I'm not familiar with all english terms).

- Marcus Sundman

msundmana at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 35

> You probably will have to create some new elements, but it should be

> easy to reuse many, if not most, elements. E.g. if no bucket contain

> more than one element then it is obvious that the new hash map can

> reuse all elements. When buckets contain many elements it get harder,

> though.

Im not really keen on the ieda of reusing entries. It is possible

when the list has only one entry in it to just make a new refrerence

from the new Hash to the old list.

For buckets that have more then one entry I think its significantly

easier to just create new entries in the new hash. Trying to

resue the old list elements means that at some stage you will have to

change its next-> reference to lastElement or null or the next element in

the new list. At this point any read operation occuring on the old hash table is given access to your

new partially constructed hashMap (well the tail end of the list for

one of the buckets) via this reference.

matfud

duftama at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 36

> At this point any read operation occuring on the old hash table

> is given access to your new partially constructed hashMap

Yes, but that is not a problem. The get() method will still find the correct element although it sometimes has to traverse a slightly longer list if the correct element isn't in the map at all.

E.g. if the old map was {∅, A→∅, C→∅, B→∅, ∅, D→∅} and the new map should be {A→D→∅, B→C→∅} then the rehash could transform the maps into {∅, A→D→∅, C→∅, B→C→∅, ∅, D→∅} and {A→D→∅, B→C→∅}, respectively. As you can see the restrictions are maintained and no new list elements are needed.

- Marcus Sundman

msundmana at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 37

I was thinking of the opposite situation where a dense hash is rehased

to a sparse one (the hash table is made bigger).

where

null, A->B->C->D, null, E, F->G, etc.

needs to be rehased to

null,null, A->C, null, B->D , E, null, F, G etc

if entry a is just copied to the new map where does its next refrerence

refer too? is it entry B or entry C. In one map it is B in the new

map it is C. If you change its value to point to C (which you have

to do at some point to get your new map to work properly) then any

read on the old map that happens concurrently could miss element B

altogether. B would effectivly disappear from the map until the new

map was swapped for the old.

matfud

duftama at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 38

> I was thinking of the opposite situation where a dense hash

> is rehased to a sparse one (the hash table is made bigger).

> where

> null, A->B->C->D, null, E, F->G, etc.

> needs to be rehased to

> null, null, A->C, null, B->D , E, null, F, G etc

>

> if entry a is just copied to the new map where does its next

> refrerence refer too? is it entry B or entry C. In one map

> it is B in the new map it is C.

It'd have to be B until nothing uses the old map anymore. The new map would temporarily have to look like:

null, null, A->B->C->D, null, B->C->D, E, null, F->G, G etc

Once nothing uses the old map you can safely change F->G to F and A->B->C to A->C, and after that you can safely change B->C->D to B->D.

- Marcus Sundman

msundmana at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 39

But then you are back to the problem of determining when some other

thread has finished using the old map. You cant do this easily without

sychnronization to ensure that the read has finished. By creating new

list elements you do not have this problem as any read operations that

reference the old list will always see the old list (the GC will clean it

sometime after the last read reference is dropped. Any new read operations

will see the new list.

matfud

duftama at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...
# 40
> But then you are back to the problem of determining when some> other thread has finished using the old map.Yes, indeed. :-(- Marcus Sundman
msundmana at 2007-7-19 6:44:50 > top of Java-index,Other Topics,Algorithms...