2G array size limit. Any future workarounds?
Hi,
If you are one of the people working with large 3D image volumes, you know this problem. Currently, some of the Analyze image volumes we deal with have dimensions of 10,000 x 10,000 x 14 or 1024 x 1024 x 1024, and the technology keeps improving to give us even higher resolution scans.
Currently, most people load the 3D image data into memory as a single array, so that viewing images in any of the three planes merely involves choosing a startOffset, pixelStride and rowStride. However, this approach will eventually overflow Java's array size limit: arrays are indexed by int, so the maximum length is 2^31 - 1 (about 2 billion) elements.
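As a rough illustration of the single-array approach, here is a minimal sketch assuming an x-fastest layout (index = x + y*nx + z*nx*ny) and int voxels; the class and method names are hypothetical. Each orthogonal plane is just a different (startOffset, pixelStride, rowStride) triple over the same flat array, and the constructor shows where the int-sized array limit bites.

```java
/** Sketch: one flat int[] volume, with each orthogonal plane viewed through offset + strides. */
class StridedVolume {
    final int nx, ny, nz;
    final int[] data;

    StridedVolume(int nx, int ny, int nz) {
        // Java arrays are int-indexed: at most Integer.MAX_VALUE elements
        // (some VMs cap it a few elements lower).
        long n = (long) nx * ny * nz;
        if (n > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many voxels for one Java array: " + n);
        }
        this.nx = nx;
        this.ny = ny;
        this.nz = nz;
        this.data = new int[(int) n];
    }

    /** Reads pixel (col, row) of a plane described by its start offset and strides. */
    int pixel(int startOffset, int pixelStride, int rowStride, int col, int row) {
        return data[startOffset + col * pixelStride + row * rowStride];
    }

    // Axial (XY) plane at z = k: columns run over x, rows over y.
    int axial(int k, int x, int y)    { return pixel(k * nx * ny, 1, nx, x, y); }
    // Coronal (XZ) plane at y = j: columns run over x, rows over z.
    int coronal(int j, int x, int z)  { return pixel(j * nx, 1, nx * ny, x, z); }
    // Sagittal (YZ) plane at x = i: columns run over y, rows over z.
    int sagittal(int i, int y, int z) { return pixel(i, nx, nx * ny, y, z); }

    public static void main(String[] args) {
        StridedVolume v = new StridedVolume(4, 3, 2);
        for (int i = 0; i < v.data.length; i++) v.data[i] = i; // voxel value = its flat index
        // Voxel (x=1, y=2, z=1) has flat index 1 + 2*4 + 1*12 = 21 in every view.
        System.out.println(v.axial(1, 1, 2) + " " + v.coronal(2, 1, 1) + " " + v.sagittal(1, 2, 1));
        // prints "21 21 21"
    }
}
```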
Even with a 64-bit JVM, where we can load more than 4GB of data (e.g. 2G ints, about 8GB), the limit is becoming a problem. Certainly we can change how we load the data, but such a change isn't trivial, as many algorithms already use this approach.
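To put numbers on the limit, here is a sketch of the arithmetic using the volume sizes above plus one hypothetical larger scan (2048 x 2048 x 1024). The voxel count is computed in long so the check itself cannot overflow. Both existing volumes fit within the element limit, but 1024 x 1024 x 1024 ints is already 4 GiB (so it needs a 64-bit JVM), and doubling the in-plane resolution no longer fits in any single Java array.

```java
/** Sketch: does a volume fit in one Java array, and how many bytes would it take? */
class ArrayLimitCheck {
    // Arrays are int-indexed, so no array can exceed Integer.MAX_VALUE elements.
    static boolean fitsInOneArray(long nx, long ny, long nz) {
        return nx * ny * nz <= Integer.MAX_VALUE;
    }

    static long bytes(long nx, long ny, long nz, int bytesPerVoxel) {
        return nx * ny * nz * bytesPerVoxel;
    }

    public static void main(String[] args) {
        long[][] dims = { {10_000, 10_000, 14}, {1024, 1024, 1024}, {2048, 2048, 1024} };
        for (long[] d : dims) {
            System.out.printf("%d x %d x %d: %d voxels, %d bytes as int, fits=%b%n",
                    d[0], d[1], d[2], d[0] * d[1] * d[2],
                    bytes(d[0], d[1], d[2], 4), fitsInOneArray(d[0], d[1], d[2]));
        }
    }
}
```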
The question is, how is Java going to address this problem? Thanks.
# 1
Try using a multi-dimensional array. For example, instead of an array of 1,000,000 entries, you can break it up into two levels: an array of 1000 subarrays, each holding 1000 numbers.
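A sketch of the two-level idea, generalized to a long index. This is an illustration under assumptions, not an existing library: the class name is hypothetical, and it uses power-of-two subarrays (2^20 ints each) so the index split is a shift and a mask rather than the division and remainder the 1000 x 1000 example would need.

```java
/** Sketch: one logical int array of long length, stored as fixed-size subarrays (chunks). */
class ChunkedIntArray {
    private static final int CHUNK_BITS = 20;               // hypothetical choice: 2^20 ints (4 MB) per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final long CHUNK_MASK = CHUNK_SIZE - 1;

    private final int[][] chunks;
    private final long length;

    ChunkedIntArray(long length) {
        if (length < 0) throw new IllegalArgumentException("negative length: " + length);
        this.length = length;
        long chunkCount = (length + CHUNK_SIZE - 1) >>> CHUNK_BITS;
        if (chunkCount > Integer.MAX_VALUE) throw new IllegalArgumentException("too long: " + length);
        chunks = new int[(int) chunkCount][];
        for (int c = 0; c < chunkCount; c++) {
            // The last chunk only gets the elements that remain.
            long remaining = length - ((long) c << CHUNK_BITS);
            chunks[c] = new int[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    long length() { return length; }

    int get(long i) {
        checkIndex(i);
        return chunks[(int) (i >>> CHUNK_BITS)][(int) (i & CHUNK_MASK)];
    }

    void set(long i, int value) {
        checkIndex(i);
        chunks[(int) (i >>> CHUNK_BITS)][(int) (i & CHUNK_MASK)] = value;
    }

    private void checkIndex(long i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", length " + length);
        }
    }
}
```

Allocating, say, `new ChunkedIntArray(3_000_000_000L)` would work in principle, but needs roughly 12 GB of heap, i.e. a 64-bit JVM started with a large `-Xmx`.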
# 2
> The question is, how Java is going to address this
> problem? Thanks.
That would require a substantial change to the Java language spec as well as to the Java virtual machine spec.
And at least for the second, the people who decide on changes have been very reluctant to touch it at all, and have instead gone with less clean solutions in the Java language itself to avoid that.
Moreover, what you are doing is a niche market.
So if it were me, I wouldn't expect a change even in the mid-term (as opposed to the short and long term).
You can, however, suggest it as an RFE via the bug database.
Or join the JCP as an active member.
Or create your own standards committee to produce a standard that supports it, which, I believe, is how the Java real-time spec came to be.
The last is the most likely to lead to a solution, and it is also likely to cost substantially more (by orders of magnitude) than the others, unless someone else is actively interested and assumes the cost.
# 3
> Try using multi-dimensional array.
>
> For example, instead of an array of 1000000 entries,
> let's say,
> you can break it up into 2 levels: an array of 1000
> subarrays, each has 1000 numbers.
As I mentioned, this would require changes to a lot of algorithms. Currently, startOffset + pixelStride + rowStride lets us treat 3D image volumes (and potentially higher-dimensional ones) just like 2D images. If we have to use the addressing mode mentioned here, we are going to have to create many different versions of each algorithm for the different dimensions of data, on top of the 3 commonly used data types for images (byte, short and int). The result is a far more complicated solution (anyone interested in writing 9 versions of the same algorithm? Java doesn't have templates to solve this issue). However, it looks like we don't have much choice but to go in this direction.
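One possible way to avoid the 9 versions, sketched under assumptions: if every algorithm reads voxels through a single long-indexed accessor, the startOffset/pixelStride/rowStride arithmetic stays exactly as it is (just done in long), while the element type and the storage (flat or chunked) live behind the interface. `Voxels`, `sumRow` and the unsigned treatment of byte/short data are hypothetical choices for illustration.

```java
import java.util.function.LongToIntFunction;

/** Hypothetical long-indexed voxel accessor; values are widened to int. */
interface Voxels {
    long length();
    int get(long index);

    static Voxels wrap(long n, LongToIntFunction reader) {
        return new Voxels() {
            public long length() { return n; }
            public int get(long index) { return reader.applyAsInt(index); }
        };
    }

    // byte and short voxels are assumed to be unsigned here.
    static Voxels of(byte[] a)  { return wrap(a.length, i -> a[(int) i] & 0xFF); }
    static Voxels of(short[] a) { return wrap(a.length, i -> a[(int) i] & 0xFFFF); }
    static Voxels of(int[] a)   { return wrap(a.length, i -> a[(int) i]); }
    // A chunked store could implement Voxels directly, with indices beyond the int range.
}

class StrideAlgorithms {
    /** Written once: sums one row of any plane view described by offset and strides. */
    static long sumRow(Voxels v, long startOffset, long pixelStride, long rowStride, long row, int cols) {
        long sum = 0;
        long base = startOffset + row * rowStride;
        for (int c = 0; c < cols; c++) sum += v.get(base + c * pixelStride);
        return sum;
    }

    public static void main(String[] args) {
        byte[] b = {1, 2, 3, (byte) 200};
        short[] s = {1, 2, 3, (short) 40000};
        int[] n = {1, 2, 3, 4};
        // Row 1 of a 2x2 image: startOffset 0, pixelStride 1, rowStride 2.
        for (Voxels v : new Voxels[] { Voxels.of(b), Voxels.of(s), Voxels.of(n) }) {
            System.out.println(sumRow(v, 0, 1, 2, 1, 2)); // prints 203, 40003, 7
        }
    }
}
```

The trade-off is an interface call per voxel; whether the JIT optimizes it away depends on the VM and on how many implementations a call site sees, so it would need measuring on real volumes.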
However, I think this 2G restriction is basically limiting Java's potential. What is the point of having a 64-bit JVM when you can't easily address >4GB of memory? It's like going back to the old 16-bit DOS world, where we got around the 64K limit of 16-bit pointers with 20-bit addressing through long pointers, but still suffered from a 64K byte array size limit.
# 4
Well, consider who is using a 64-bit JVM. Probably only people dealing with large arrays of data. I don't really think it is advantageous to run a 64-bit JVM, and suffer the loss of performance, unless one has specific memory demands.
I certainly don't think it is impossible to add 64-bit addressing. For example, int[].length is currently a 32-bit integer. Is it possible to add a 64-bit longLength? In the actual JVM, the calls that set/retrieve the value/pointer of an array element could be changed too, to call two different functions depending on the index type.
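For context on why a longLength can't simply be bolted on from the source side: the JLS requires array dimension and index expressions to have type int after promotion (a long is a compile-time error), so today a 64-bit size has to be narrowed first, and narrowing wraps silently. A small illustration (the class name is hypothetical):

```java
/** Illustration: array sizes must be int, so a long size has to be narrowed first. */
class IntLengthDemo {
    public static void main(String[] args) {
        long wanted = 3_000_000_000L;   // more elements than an int can count
        // int[] a = new int[wanted];   // does not compile: the dimension must be an int
        int narrowed = (int) wanted;    // silently wraps to -1294967296
        System.out.println("narrowed length: " + narrowed);
        try {
            int[] a = new int[narrowed];
            System.out.println("allocated " + a.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
        // Math.toIntExact (Java 8+) at least fails loudly instead of wrapping.
        try {
            Math.toIntExact(wanted);
        } catch (ArithmeticException e) {
            System.out.println("toIntExact threw ArithmeticException");
        }
    }
}
```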
As for joining the JCP, I think that is a bit of a stretch for me... As you mentioned, the problem is the cost. That's why I am just posting it here, hoping to raise awareness of the issue.
# 5
> Well, consider who are using 64-bit JVM? Probably
> only people dealing with large arrays of data.
I seriously doubt it. I have seen many comments here about 64-bit VMs, and yet yours is the first question that has addressed large arrays in the context of media processing.
Probably the biggest users are J2EE users.
> I
> don't really think that is advantageous to run 64-bit
> JVM, and even suffer from the loss of performance,
> unless one has specific demands in memory.
>
Large memory requirements do not translate to large arrays.
> I certainly don't think that is all that impossible
> to add the 64-bit addressing feature. For example,
> currently, int[].length is a 32-bit integer. Is it
> possible to generate a 64-bit longLength information?
> In the actual JVM call to set/retrieve the
> value/pointer of the can be changed too to call two
> different functions depending on the index type.
>
I didn't say it was impossible. What I said was that even for changes that required far less work than your suggestion, those who control the VM spec have decided against them.
If it isn't that much work, then creating your own mini-standard would be that much cheaper.
> As for joining JCP, I think that is a bit stretch for
> me... As you mentioned, the problem is the cost.
> That's why I am just posting it here, and hope to
> raise the awareness of the issue.
>
However, as I already pointed out, I believe you are talking about a niche market.
# 6
> However, I think that this 2G restriction is
> basically limiting Java's potential. What is the
> point of having 64-bit JVM when you can't easily
> address >4GB memory? It's like going back to the old
> 16-bit DOS world where we get around the 64K limit of
> 16-bit pointer addressing problem with a 20-bit
> addressing with long pointer, but still suffers from
> 64K byte array size limit.
The analogy is incorrect.
You are talking about a single array. The limitation of 16-bit DOS was in the entire address space of the application.
It is the address space of the application that everyone (that I have seen posting here) is concerned about.
They are not concerned with creating one large array, but rather with being able to create 100,000 large objects instead of only 10,000.
# 7
coconut99_99: I am just writing up my report for my final year university computer science project and I make reference to your post here: http://forum.java.sun.com/thread.jspa?threadID=752830 Would it be possible for you to e-mail me ( xander@raterock.com ) and let me know your real name (surname only would be fine if you'd prefer) so that I can correctly cite you in my bibliography? Thanks, Xander.
# 8
I have 24GB of RAM and need to be able to allocate arrays bigger than 2G.
I do not think it would be very complicated to support 64-bit indices.
All you need is something like int[[]] instead of int[] to denote 64-bit indexable arrays, and similarly for all primitive types and for arrays of objects: Any_Object[[]] would stand for an array with 64-bit indices. Additional libraries can then be built on top of that.
Java is a pretty decent language for developing big applications. It has a very good JIT that makes it run on par with C/C++ in a good deal of applications, and it is much nicer for memory management and coding. RMI is also very competitive with TCP/IP-based RPC. Plus, Java runs on many platforms and is significantly less of a hassle than C/C++. However, C# is also a decent language, and if it supports this feature, many people who have to write big server applications may have to switch.