How to determine the operand types in stack by pc
Dear everyone,
I'm very curious about the garbage collection, how can it know all the operand types which is on the stack just by the pc register.
Is there any article talking about this?
Could any jvm expert answer this question? At a specified code position(a specified location of the jvm instructures), is the stack size and operand types on the stack always the same? Or could be different.
Thanks a lot if anyone could give me a clear answer
# 1
For the interpreter, the bytecodes are required to be abstractly interpretable so that at any bci in the method it's obvious what whether a local contains an object or not. The expression stack depth is also required to be exactly computable at any bci. So you can't have a loop with keep pushing values on the stack without popping them off again. Offhand I can't think of a paper referencing this though I think the JVM spec discusses it a little, but the code that hotspot uses is in the file src/share/vm/oops/generateOopMap.cpp. GenerateOopMap is the generic abstract interpretation framework we use and src/share/vm/interpreter/oopMapCache.cpp captures the result of the interpretation for the use of the GC. The abstract interpretation is complicated because it also deals with jsr/ret and can compute whether locks properly pair or not.
Compiled code doesn't need any of that since it does it's own liveness analysis. Well that's not completely true prior to 1.6 the client compiler reused the abstract interpretation for simplicity but that's all been replaced in 1.6.
tom
# 2
For the abstract interpretation that Tom refers to, I would look at the papers on basic Java bytecode verification.
Note that the HotSpot sources are all available on the web, so Tom's reference to, .e.g., src/share/vm/oops/generateOopMap.cpp could have been a link to http://12.101.252.19/hotspot/xref/src/share/vm/oops/generateOopMap.cpp.
# 3
Thanks very much!
I saw the code, it seems jvm calcute the operand stack at each basic block, and store that information in memory. When garbage collection, jvm calculate the current stack by iterating the byte code from the beginning of the basic block. Is that right?
Question 1:
But I still have a stupid question. What is a basic block? Is that means the byte codes between all jump instructions and jump targets. I found a lot of articles talking about basic block. But none of them said what a basic block is.
Question 2:
It seems at each basic block beginning, jvm memorize the local variables, operand stacks, and monitor locks. But jvm also put the monitor locks in the stack frame. If the locks can be calculated by the bci, why put them in the stack? Is there a explanation?
# 4
For a definition of basic block try google. wikipedia has a good explanation.
The purpose of computing the monitor locks is to prove that they pair properly which the compilers want to know. We won't compile method that can't be proven to pair properly This isn't a problem for any real Java code but the bytecodes allow you to write it so we have to handle it.
The stack frame records the values but the abstract interpretation only worries about pairing so they are tracking different things.
tom
# 5
Thanks very much!
But if there are many basic blocks and many variables in a method, it will use huge memory to cache the stack informations. And if the basic blocks is too long, it will take a long time for a garbage collector to caculate the stack. There seems isn't a best solution. Is there any paper concerning about this? Or maybe it isn't a problem at all in practical?
By the way, I really think jvm put too much information in the stack. In fact, the stack can be as simple as this:
local var 0
local var 1
...
local var n
return bci<- frame pointer
last frame pointer
operand stack
As I seen, current frame pointer point at local var 0, but if we reserse all local id by (maxVar-id), we can change the frame pointer to point at the context data. And in that way, we need to change all "ret" to "ret n", that means we do not record the frame start at all, but caculate it when returns. And for each frame in the stack, we can calculate the stack, exception table, monitors and line informations just by the bci.
And we can also put the last frame informations between the parameters and local variables, just like C compilers do.
In fact we only need the exception table while throwing an exception, the line information while fill a stack trace, monitors while termating a thread, stack types while garbage collection. These things seldom happens, we don't need to put any of these informations in stack. But only caculate it by the bci. So only the return bci and last frame pointer is really needed. The stack can be much smaller and jvm might run a little fast. What do you think?
null
