JDK6 locks use a LOT more memory than JDK5
I'm a happy user of the Java 5 concurrency utilities, especially read/write locks. We have a system with hundreds of thousands of objects (each protected by a read/write lock) and hundreds of threads. I tried to upgrade the system to JDK6 today and, to my surprise, most of the memory reported by jmap -histo was used by thread locals and the locks' internal objects...
As it turns out, in Java 5 every lock had just a counter of readers and writers. In Java 6, it seems that every lock has its own thread local, which means that 2 objects are allocated per lock for each thread that ever touches it... In our case, memory usage has gone up by 600MB just because of that.
I have attached a small test program below. Running it under JDK5 gives the following results (used heap in KB):
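To make the growth pattern concrete, here is a minimal model of what I think is happening. It is only a sketch, not the JDK's actual code: PerThreadHolds and countAllocations are hypothetical names, and the int[] stands in for whatever per-thread counter object the lock keeps. Each "lock" owns one ThreadLocal, so the number of counter objects grows as locks x threads instead of staying constant per lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HoldCounterSketch {
    // Hypothetical stand-in for a lock's per-thread read-hold bookkeeping.
    static final class PerThreadHolds extends ThreadLocal<int[]> {
        private final AtomicInteger allocations;

        PerThreadHolds(AtomicInteger allocations) {
            this.allocations = allocations;
        }

        @Override
        protected int[] initialValue() {
            // Runs once per (lock, thread) pair, on the first get() from that thread.
            allocations.incrementAndGet();
            return new int[1];
        }
    }

    // Lets every thread touch every "lock" once and returns how many
    // per-thread counter objects were created in total.
    static int countAllocations(int lockCount, int threadCount) {
        final AtomicInteger allocations = new AtomicInteger();
        final PerThreadHolds[] holds = new PerThreadHolds[lockCount];
        for (int i = 0; i < lockCount; i++) {
            holds[i] = new PerThreadHolds(allocations);
        }
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    for (PerThreadHolds h : holds) {
                        int[] c = h.get();
                        c[0]++; // model of readLock().lock()
                        c[0]--; // model of readLock().unlock()
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return allocations.get();
    }

    public static void main(String[] args) {
        // 1000 locks touched by 4 threads
        System.out.println("counters allocated: " + countAllocations(1000, 4));
    }
}
```

In the model the counters become garbage once the threads exit, but with a long-lived thread pool like ours they would stay reachable for as long as the pool threads live.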
Memory at startup 114
After init 4214
One thread 4214
Ten threads 4216
With JDK6 the results are:
Memory at startup 124
After init 5398
One thread 8638
Ten threads 39450
This problem alone makes JDK6 completely unusable for us. What I'm considering is taking the ReentrantReadWriteLock implementation from JDK5 and using it with the rest of JDK6. There are two basic choices: either renaming it and changing our code to allocate the other class (cleanest from a deployment point of view), or putting a different version in the bootclasspath. Will renaming the class (and moving it to a different package) work correctly with jstack and the deadlock detection tools, or do they expect only the JDK implementation of Lock? Is there any code in the new JDK that depends on a particular implementation of RRWL?
Why was this change made, by the way? The only reason I can see is to prevent threads from releasing a read lock taken by another thread. This is a nice feature, but is it worth wasting a gigabyte of heap? How would this scale to a really big number of threads?
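For the renaming route, one way to keep the change small is to allocate all locks in a single place. The sketch below is only an illustration: EntityLocks, Factory, setFactory and newLock are hypothetical names, and the default factory simply returns the stock ReentrantReadWriteLock where a renamed JDK5 copy would be plugged in.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class EntityLocks {
    /** Hypothetical seam: the only place that names a concrete lock class. */
    public interface Factory {
        ReadWriteLock create();
    }

    // Default: the stock JDK implementation. A renamed JDK5 copy would be
    // returned from a different Factory installed at startup.
    private static volatile Factory factory = new Factory() {
        public ReadWriteLock create() {
            return new ReentrantReadWriteLock();
        }
    };

    public static void setFactory(Factory f) {
        factory = f;
    }

    public static ReadWriteLock newLock() {
        return factory.create();
    }

    public static void main(String[] args) {
        ReadWriteLock lock = EntityLocks.newLock();
        lock.readLock().lock();
        try {
            System.out.println("using " + lock.getClass().getName());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Business objects would then call EntityLocks.newLock() instead of naming a lock class directly, so switching implementations touches one file.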
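The ownership check I mean can be observed with a small experiment. This is a sketch under my own assumptions (ForeignUnlockDemo and foreignUnlockRejected are names I made up): one thread takes the read lock and a second thread, which never acquired it, tries to release it. As far as I can tell, on Java 6 and later the second thread gets an IllegalMonitorStateException; with a single shared reader count there is nothing to tell the two threads apart.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ForeignUnlockDemo {
    /** Returns true if an unlock from a thread that holds no read lock is rejected. */
    static boolean foreignUnlockRejected() {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final boolean[] rejected = new boolean[1];

        lock.readLock().lock(); // read lock held by the calling thread

        Thread other = new Thread(new Runnable() {
            public void run() {
                try {
                    lock.readLock().unlock(); // this thread never locked it
                } catch (IllegalMonitorStateException e) {
                    rejected[0] = true;
                }
            }
        });
        other.start();
        try {
            other.join(); // join() makes the write to rejected[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        if (rejected[0]) {
            lock.readLock().unlock(); // still ours, so release it properly
        }
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println("foreign unlock rejected: " + foreignUnlockRejected());
    }
}
```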
Test program
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.*;

public class LockTest {
    static AtomicInteger counter = new AtomicInteger(0);
    static Object foreverLock = new Object();

    public static void main(String[] args) throws Exception {
        dumpMemory("Memory at startup ");
        final ReadWriteLock[] locks = new ReadWriteLock[50000];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
        dumpMemory("After init ");
        Runnable run = new Runnable() {
            public void run() {
                // Touch every lock once from this thread.
                for (int i = 0; i < locks.length; i++) {
                    locks[i].readLock().lock();
                    locks[i].readLock().unlock();
                }
                counter.incrementAndGet();
                // Wait forever so the thread and its thread locals stay alive.
                synchronized (foreverLock) {
                    try {
                        foreverLock.wait();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        };
        new Thread(run).start();
        while (counter.get() != 1) {
            Thread.sleep(1000);
        }
        dumpMemory("One thread ");
        for (int i = 0; i < 9; i++) {
            new Thread(run).start();
        }
        while (counter.get() != 10) {
            Thread.sleep(1000);
        }
        dumpMemory("Ten threads ");
        System.exit(0);
    }

    // Prints the used heap in KB after requesting a few GC runs.
    private static void dumpMemory(String txt) {
        System.gc();
        System.gc();
        System.gc();
        System.out.println(txt + (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1024);
    }
}
[5329 byte] By [abiesa] at [2007-11-26 12:21:27]

# 2
We have a read/write lock for every business object in the system. We use the functionality quite heavily: some objects can be accessed by tens of threads concurrently and need to preserve their full state for the duration of the access, while updates to such objects are infrequent but still possible (the read:write ratio can be 1000:1 or more). More or less, you can think of our system as a distributed real-time database system, and we need to implement a full locking scheme so that transactions see a consistent view of all objects for a short time.
At the same time, we have thread pools of a few hundred threads to process concurrent transactions, with most of these transactions reading tens or hundreds of objects and updating one or two of them.
In the production environment we handle up to 250,000 such entities at the same time. We have an eviction system based on weak references plus reference counting for distributed access, so we track only the objects that are currently in use (the total number of entities in the system is virtually unlimited, as we also have entities representing answers to free-form formulas), and the figure of a quarter million covers only concurrently referenced objects. On the other hand, we have a fixed-size thread pool, which is underutilized on average, but we need it to handle peak situations in which every thread may be performing blocking operations.
We will probably go with the renaming route, because we also have parts of the same code running on Web Start GUI clients, where playing bootclasspath games could be a major pain.
# 5
> Controlling access/update to data is what DBMS are
> all about.
And our framework is more or less a DBMS.
Imagine that you need an SQL database with the following extensions:
If any row you have ever requested is modified, you should get a new version transparently plus get notified about the change (what fields have changed)
If any query you have ever run would return different rows than before, the result collection should be modified and you should be notified about the change (the delta against the previous contents).
It is a distributed-cache-meets-DBMS framework.
Some of the entities are backed by an actual database for persistence, but others are not (they are in transient memory only, or are views of data managed by completely different systems).
We could stay with R/W locks for the lists and plain locks for the objects, but even the number of lists in the system (5-10k) could already have some effect when multiplied by the number of threads. Originally the cost of having an R/W lock per object was relatively small, and it seemed cleaner and more scalable.
Just off the top of my head, I can give one example. I used to search a list of objects for the insertion index of a new object while holding the write lock. I switched to searching the list under the read lock, then changing to the write lock and searching only the area around the previously found position (the list could be modified at the moment the lock is upgraded, but in most cases I have to check only 1-2 indices around it). This change had an _incredible_ perceived performance impact, because the rendering code for a JTable was using a model based on the same list under a read lock. For single-object locking it is not so obvious, but there are still objects which can be locked for reading by many threads concurrently.
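The list example can be sketched roughly like this. It is a simplified illustration and not our production code: SortedListSketch, insert, lowerBound and snapshot are made-up names, and the list holds plain integers. Since ReentrantReadWriteLock does not allow upgrading a read lock to a write lock, the read lock is released first and the position found under it is treated only as a hint that gets re-checked under the write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SortedListSketch {
    private final List<Integer> items = new ArrayList<Integer>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void insert(int value) {
        // Phase 1: full search under the read lock, so readers are not blocked.
        int hint;
        lock.readLock().lock();
        try {
            hint = lowerBound(value);
        } finally {
            lock.readLock().unlock();
        }

        // Phase 2: short write lock. The list may have changed in between,
        // so walk from the hint to the correct index (usually 0-2 steps).
        lock.writeLock().lock();
        try {
            int i = Math.min(hint, items.size());
            while (i > 0 && items.get(i - 1) >= value) {
                i--;
            }
            while (i < items.size() && items.get(i) < value) {
                i++;
            }
            items.add(i, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Binary search for the first index whose element is >= value.
    // The caller must hold the read or the write lock.
    private int lowerBound(int value) {
        int lo = 0;
        int hi = items.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (items.get(mid) < value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public List<Integer> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<Integer>(items);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SortedListSketch list = new SortedListSketch();
        for (int v : new int[] {5, 1, 3, 2, 4}) {
            list.insert(v);
        }
        System.out.println(list.snapshot()); // prints [1, 2, 3, 4, 5]
    }
}
```

The re-check in phase 2 is a linear walk, so it stays correct even if the list changed a lot while no lock was held; it is just slower in that rare case.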
# 6
> I wonder how you can get good performance out of such
> a system, not talking about scalability on
> multi-processor systems.
All this R/W lock mess exists precisely because of multi-processor systems. Locking has some cost, but in Java it is very fast and allows us to scale up with the number of CPUs.
Of course, the solution to real scalability is horizontal scaling, but full synchronization over the network for everything would already be a killer cost. So currently we are fully scalable vertically, while scaling horizontally along certain predefined business boundaries.
As far as pooling locks is concerned, it is not a solution: we have a reasonably constant number of objects that stay alive during the day. Each of them has its lock and we are prepared to pay this cost; we are just surprised to have to pay an extra cost for each thread accessing it. The situation is a bit like a new JDK suddenly creating and storing a new entry in a HashMap every time you call get(), and you realizing that the memory taken by the map grows with the number of accesses, not with the number of elements stored...