Memory allocation problem in Solaris 5.8
I'm porting a C++ application with a lot of modules to Solaris; it runs fine on Linux. However, the program crashes at a particular point with a segmentation fault. I've tried a bit to debug it using dbx (I'm a newbie), and I suspect the problem has something to do with a memory overrun. The last few instructions that ran, which I got using the 'dis' command in dbx, are given below.
Segmentation fault message:
-
FEMDataSetFilterImp: Final SqlStr: ((((a.DATASET_CODE =11002)and(a.CAL_PERIOD_ID =24530050000000000000121000100140)) )) and ( ledger_id =1610)
signal SEGV (no mapping at the fault address) in FC_DStaticSpreadCalculator::SetInstrument at 0xe9e44
0x000e9e44: SetInstrument+0x00fc: st %f2, [%l0]
(dbx)
Registers
(dbx) regs
current frame: [1]
g0-g3    0x00000000 0xfd99a2e4 0x00000000 0x00000000
g4-g7    0x00000000 0x00000000 0x00000000 0x00000000
o0-o3    0x00000000 0x000000cc 0x0004d620 0xfdb6fa2c
o4-o7    0xffffff38 0x00000000 0xffbea990 0x000e9e28
l0-l3    0x00000000 0x0000000c 0x00000000 0x00000000
l4-l7    0x00000000 0x00000000 0x00000000 0x00000000
i0-i3    0x00729ec0 0x00386000 0xffffffe7 0x00000000
i4-i7    0x0072d210 0x003572c0 0xffbeaa10 0x00091b28
y        0x00000000
ccr      0x00000008
pc       0x000e9e44:SetInstrument+0xfc st %f2, [%l0]
npc      0x000e9e48:SetInstrument+0x100 st %f3, [%l0 + 4]
(dbx)
-
Last few instructions that were executed
(dbx) dis 0x000e9e00,0x000e9e44
0x000e9e00: SetInstrument+0x00b8: ld [%i0 + 44], %o0
0x000e9e04: SetInstrument+0x00bc: sll %i2, 3, %l0
0x000e9e08: SetInstrument+0x00c0: call operator delete [PLT] ! 0x345fac
0x000e9e0c: SetInstrument+0x00c4: ld [%i0 + 48], %o0
0x000e9e10: SetInstrument+0x00c8: call operator new [PLT] ! 0x345fa0
0x000e9e14: SetInstrument+0x00cc: mov %l0, %o0
0x000e9e18: SetInstrument+0x00d0: st %o0, [%i0 + 40]
0x000e9e1c: SetInstrument+0x00d4: call operator new [PLT] ! 0x345fa0
0x000e9e20: SetInstrument+0x00d8: mov %l0, %o0
0x000e9e24: SetInstrument+0x00dc: st %o0, [%i0 + 44]
0x000e9e28: SetInstrument+0x00e0: call operator new [PLT] ! 0x345fa0
0x000e9e2c: SetInstrument+0x00e4: mov %l0, %o0
0x000e9e30: SetInstrument+0x00e8: st %o0, [%i0 + 48]
0x000e9e34: SetInstrument+0x00ec: cmp %i2, 0
0x000e9e38: SetInstrument+0x00f0: ldd [%i3 + 320], %f2
0x000e9e3c: SetInstrument+0x00f4: clr %i3
0x000e9e40: SetInstrument+0x00f8: ld [%i0 + 48], %l0
>>>>> 0x000e9e44: SetInstrument+0x00fc: st %f2, [%l0]
It segfaults at the last instruction.
The three calls to operator new before this all show the same address, 0x345fa0,
as in the instruction "call operator new [PLT] ! 0x345fa0". What does the value 0x345fa0 actually indicate? Does its repetition mean that heap memory has been exhausted?
ulimit -d, ulimit -v and ulimit all return 'unlimited'. Could anyone shed a bit more light on what the actual trouble could be? Or is there any way to confirm whether it is indeed a resource/memory overrun?
Thanks,
Ajith
# 1
The address 0x345fa0 in disassembled code like this
0x000e9e1c: SetInstrument+0x00d4: call operator new [PLT] ! 0x345fa0
indicates the address of operator new, not the value returned by operator new.
The value returned by operator new is in register %o0.
If the code crashes on one system but not on another, the most likely cause is invalid code that by accident does not cause a problem on one system but does on the other.
For example, suppose you store a value through an uninitialized pointer, or one that points to an object that no longer exists. The store modifies some location in memory. By chance, that location might still be in the program's address space, but not used. No harm done. But on another system, the location might not be in the address space, or might be in use as a variable, as part of the stack frame, or as bookkeeping information for the heap. At some point probably far removed from this error, the program crashes.
Run your program under dbx with all Run-Time Checking enabled:
% dbx myprog
...
(dbx) check -all
(dbx) run
RTC will find many common types of programming errors. You can read more about it in the dbx manual.
# 3
I ran with run-time checking: I enabled check -access and ran the code.
Just before crashing, it reports out-of-memory errors, caused by calls to malloc with a block size of about 4 GB, which I don't request anywhere in my code. See the output from dbx below.
Out of memory (oom):
Attempting to allocate a block of size 4294967096 bytes
stopped in operator new at 0xecd103d8
0xecd103d8: operator new+0x0024: call malloc [PLT] ! 0xecd5dc68
(dbx) cont
Out of memory (oom):
Attempting to allocate a block of size 4294967096 bytes
stopped in operator new at 0xecd1040c
0xecd1040c: operator new+0x0058: call malloc [PLT] ! 0xecd5dc68
(dbx) cont
Write to unallocated (wua):
Attempting to write 4 bytes through NULL pointer
stopped in FC_DStaticSpreadCalculator::SetInstrument at 0x000e9e44
0x000e9e44: SetInstrument+0x00fc: ba,a 0x00571e50 ! 0x571e50
The regs command shows the same output as in my initial post. Now I assume the trouble is either
1) memory running out, or
2) erroneous values creeping into some registers, which makes the call to malloc request 4 GB of memory.
Could anyone tell me which register(s) hold the amount of memory requested by the call to malloc / new?
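On the register question: under the SPARC calling convention the first argument is passed in %o0. In the disassembly from the first post, `mov %l0, %o0` sits in the delay slot of each call to operator new, and %l0 was computed by `sll %i2, 3, %l0`, that is, %i2 * 8. The register dump shows %i2 = 0xffffffe7. The following is only a sketch of that arithmetic, assuming %i2 holds a signed 32-bit element count and each element is 8 bytes; the helper names are hypothetical and not from the original code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mimics `sll %i2, 3, %l0` on a 32-bit target,
// i.e. count * 8 evaluated in 32-bit unsigned arithmetic (wraps on overflow).
std::uint32_t requested_bytes(std::int32_t count) {
    return static_cast<std::uint32_t>(count) << 3;
}

// Hypothetical helper: reads a raw 32-bit register value as a signed integer.
std::int32_t as_signed(std::uint32_t raw) {
    std::int64_t wide = static_cast<std::int64_t>(raw);
    if (wide > INT32_MAX) {
        wide -= 4294967296LL;  // subtract 2^32 to get the two's-complement value
    }
    return static_cast<std::int32_t>(wide);
}

// Checks the arithmetic against values copied from this thread:
// %i2 from the register dump and the block size from the RTC report.
void check_against_thread_values() {
    assert(as_signed(0xffffffe7u) == -25);
    assert(requested_bytes(-25) == 4294967096u);
}
```

If this reading is right, the 4294967096-byte request is a count of -25 multiplied by 8 and wrapped to 32 bits, so the count argument reaching SetInstrument may be worth checking before suspecting a heap limit.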
In the C++ code at this point, I'm calling the function 'SetInstrument', which is a member of a class that has two double values and one pointer to a structure of 5 doubles. The array in the callee program has 16000 elements,
and it is initialized by a call to new before being passed to the SetInstrument procedure. So the total memory used for this array is 16000 * 8 * 5 = 640 KB, which is nowhere near the 4 GB requested.
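Since the requested size is so far from the expected 640 KB, one option is to validate the element count before allocating. This is a minimal sketch, assuming a hypothetical helper `allocate_doubles` and a caller-chosen upper bound; neither appears in the original code. It uses the nothrow form of new (and C++11 `nullptr`), so a failed allocation shows up as a null pointer that can be tested rather than as a later store through address 0.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: returns a buffer of `count` doubles, or nullptr when
// `count` is not a plausible element count or the allocation itself fails.
double* allocate_doubles(long count, long max_count) {
    if (count <= 0 || count > max_count) {
        return nullptr;  // reject negative or absurd counts before they reach new
    }
    return new (std::nothrow) double[static_cast<std::size_t>(count)];
}

// Hypothetical helper: releases a buffer obtained from allocate_doubles.
void free_doubles(double* buffer) {
    delete[] buffer;  // deleting a null pointer is a no-op
}
```

A caller would test the result before storing through it, for example `if (double* p = allocate_doubles(n, 1000000)) { p[0] = 0.0; }`, and report the bad count otherwise.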
In Solaris (SunOS 5.8) C++, is there any limit on the heap memory that a process can own?
showmemleaks also reports memory leaks:
Possible leaks report (possible leaks:3029 total size:2556685 bytes)
Total Size   Num of Blocks   Leaked Block Address   Allocation call stack
========== ====== =========== =======================================
2043200600-operator new < 0xd57e4
1840007-operator new <
8747691-operator new < _vector_new_
43296984-operator new < 0x14abb4
Error limit reached. Disabling RTC until next run.
Is there anything wrong, on Solaris, with passing a base pointer (initialized by a call to 'new') into another class so that the address can be stored in a variable there?
And what does 'PLT' mean in that call to malloc?
Ajith
Message was edited by:
ajith_prasad