Core Dump
Hi,
I am working on a Solaris 10 (5.10 Generic_118833-24) sun4u sparc SUNW,Sun-Fire-V890 machine. Our server is multithreaded, and it dumps core every few minutes. To debug this, I used libumem. Below is the output of ::umem_status:
::umem_status
Status: ready and active
Concurrency: 32
Logs: transaction=1m (inactive)
Message buffer:
umem allocator: buffer modified after being freed
modification occurred at offset 0x8 (0xdeadbeefdeadbeef replaced by 0xdeadbeeedeadbeef)
buffer=651e18 bufctl=65d608 cache: umem_alloc_56
previous transaction on buffer 651e18:
thread=4 time=T-0.007687000 slab=20e2d0 cache: umem_alloc_56
libumem.so.1'umem_cache_free+0x50
libumem.so.1'? (0xff35b3c4)
libCrun.so.1'__1c2k6Fpv_v_+0x4
SGwAPG40SFTPFileCollector'__1cJRWCStringHreplace6MIIpkcI_r0_+0x18c
SGwAPG40SFTPFileCollector'__1cOSFTPConnectionDget6MrknJRWCString_3nQFileTransferType_rkl_I_+0x48
SGwAPG40SFTPFileCollector'__1cNgetFileBySftp6FrnSRWTPtrSortedVector4nNFileCollnItemrknJRWCString_44ki_i_+0xab8
SGwAPG40SFTPFileCollector'__1cVSGwAPG40SFTPCollector6Fpv_0_+0x308
libc.so.1'? (0xfe0400b0)
umem: heap corruption detected
stack trace:
libumem.so.1'? (0xff35c5a8)
libumem.so.1'? (0xff35d6bc)
libumem.so.1'umem_cache_alloc+0x210
libumem.so.1'umem_alloc+0x60
libumem.so.1'malloc+0x28
libCrun.so.1'__1c2n6FI_pv_+0x2c
SGwAPG40SFTPFileCollector'__1cMRWCStringRefGgetRep6FIIpv_p0_+0x2c
SGwAPG40SFTPFileCollector'__1cJRWCString2t5B6MpkcI2I_v_+0x2c
SGwAPG40SFTPFileCollector'__1cNgetFileBySftp6FrnSRWTPtrSortedVector4nNFileCollnItemrknJRWCString_44ki_i_+0xc60
SGwAPG40SFTPFileCollector'__1cVSGwAPG40SFTPCollector6Fpv_0_+0x308
libc.so.1'? (0xfe0400b0)
When I examine the memory, it shows the following:
651e18/08X
0x651e18: deadbeef deadbeef deadbeee deadbeef deadbeef deadbeef deadbeef deadbeef
Compiler and Studio versions used:
Sun Studio 11
Sun Studio 11 C Compiler
Sun Studio 11 C++ Compiler
Sun Studio 11 Tools.h++ 7.1
Can anyone help me?
Regards
Vikas
Posted by nagarajaa at 2007-11-27 5:29:58 (message edited by nagaraja)

# 1
Apparently some part of the code is writing to memory after it has been deleted. The heap corruption later causes a crash.
Try running the program under dbx with Run-TIme Checking enabled:
% dbx myprog
(dbx) check -all
(dbx) run
RTC can find many problems associated with invalid memory usage.
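For anyone reading the libumem report above: in its debugging mode libumem fills each freed buffer with the 0xdeadbeef pattern and checks that the pattern is still intact when the buffer is handed out again, which is why the error surfaced inside malloc rather than at the faulty write. The sketch below is a simplified, hypothetical illustration of that check, not libumem's code; poison and first_modified_offset are made-up names, and it compares 32-bit words whereas the report shows 64-bit ones. In the dump, the only changed word is at offset 0x8, and it went from deadbeef to deadbeee, i.e. it was decremented by one. That is consistent with (though not proof of) something like a counter being decremented inside a buffer that had already been freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Free pattern written by libumem's debugging mode (32-bit half of the
// 0xdeadbeefdeadbeef value seen in the report).
const std::uint32_t kFreePattern = 0xdeadbeefu;

// Fill a "freed" buffer with the pattern, as a debugging allocator does at free time.
void poison(std::vector<std::uint32_t>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = kFreePattern;
}

// Return the byte offset of the first 32-bit word that no longer holds the
// free pattern, or -1 if the buffer is intact (what the allocator would
// verify before handing the buffer out again).
long first_modified_offset(const std::vector<std::uint32_t>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] != kFreePattern)
            return static_cast<long>(i * sizeof(std::uint32_t));
    }
    return -1;
}
```

Poisoning a 56-byte buffer (the umem_alloc_56 cache size, i.e. 14 words) and then decrementing the word at index 2 makes first_modified_offset return 8, matching the reported offset 0x8.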
# 2
Thank you very much for the reply.
Our code does not call new, delete, malloc, or free anywhere. So memory allocated and freed by RWCString is never touched by our code. Correct me if I am wrong.
So how can I find out exactly where the problem is?
I am asking because the problem occurs on a runtime machine where dbx is not installed, so I cannot run dbx there.
A few more details:
1) The same code with the same number of threads does not dump core on a Solaris 9 machine. The process runs with the same load.
The hardware configuration (memory and CPU) is exactly the same on the Solaris 10 and Solaris 9 machines. The only difference is the Forte version used on Solaris 9:
Forte University Edition 6 update 2
Sun WorkShop 6 update 2 Compilers C
Sun WorkShop 6 update 2 Compilers C++
Sun WorkShop 6 update 2 Tools.h++ 7.1
2) On the Solaris 10 machine, if I reduce the number of threads in my process to 1 (the default is 5), it works perfectly. That is the big problem: finding heap corruption in an MT process.
3) Also, according to the information provided by libumem, none of the buffers are corrupted.
# 3
I think you are running into a problem with the std::string class that shows up when *all* of the following are true:
- the compiler is older than Sun Studio 8 (C++ 5.5) and lacks recent patches
- the default or explicit compilation target architecture is SPARC v7 or v8
- the code is run on an UltraSPARC (v8plus[ab])
- the machine where the program is run has very old versions of the C++ runtime libraries
- the program is multi-threaded
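A rough illustration of why the threading condition matters: a reference-counted string lets copies share one buffer, and every copy or destruction updates a shared count. If two threads update that count without an atomic operation, an update can be lost and the buffer may be freed while another copy still uses it. My understanding (an assumption, not a statement about libCstd internals) is that pre-UltraSPARC targets lack instructions such as compare-and-swap that make such updates cheap. The sketch uses C++11 std::atomic, which postdates the compilers discussed here; SharedBuf, acquire, and release are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical reference-counted buffer shared by copies of a string.
struct SharedBuf {
    std::atomic<int> refs;  // atomic: concurrent copies/destructions cannot lose updates
    char* data;
    explicit SharedBuf(const char* s)
        : refs(1), data(new char[std::strlen(s) + 1]) {
        std::strcpy(data, s);
    }
    ~SharedBuf() { delete[] data; }
};

// Take another reference (what a string copy constructor would do).
SharedBuf* acquire(SharedBuf* b) {
    b->refs.fetch_add(1, std::memory_order_relaxed);
    return b;
}

// Drop a reference; the last owner frees the buffer. Returns true if it was freed.
bool release(SharedBuf* b) {
    if (b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete b;
        return true;
    }
    return false;
}
```

With a plain int instead of std::atomic<int>, two concurrent acquire or release calls could both read the same old value and lose one update, which is exactly the kind of race that only appears with more than one thread.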
First, what is the patch level of the C++ compiler that was used to build the program? Run the command
CC -V
and report the output.
Next, what is the C++ runtime library on the system where the problem occurs? On that machine, run the command
showrev -p | grep SUNWlibC
and report the output.
In the best case, that machine will need to be updated with a new C++ runtime library patch (SUNWlibC).
In the worst case, you will need to patch your old compiler, or replace it with a current version, and rebuild the program. The target system might also need to get the current SUNWlibC patch.
About default architectures:
Until recently, Sun compilers by default generated code for the old SPARC chips (v7 and v8) that preceded the UltraSPARC line, introduced in (I think) 1994. The reason was that Solaris still supported those old chips. Current versions of Solaris support only UltraSPARC and later (v8plus, v8plusa, v8plusb, VIS, T1, T2, etc.). The default for current compilers is v8plus.
If you need to recompile old code and do not need to support ancient hardware, specify v8plus as the target architecture (for example, -xarch=v8plus). You will not only avoid the problem with the string class but also get better run-time performance.
# 4
Thanks once again.
Below is the information you requested.
1) clarence{eciknvi} % ../bin/CC -V
CC: Sun C++ 5.8 2005/10/13
The above command was run on the development machine where we compile.
2) showrev -p | grep SUNWlibC
Patch: 119963-05 Obsoletes: Requires: Incompatibles: Packages: SUNWlibC
Patch: 119963-08 Obsoletes: Requires: Incompatibles: Packages: SUNWlibC
The above command was run on the target machine where the binary runs. How can I find out whether this is the correct patch?
3) If I run the file command on my executable, this is what I get:
file SGwAPG40SFTPFileCollector
SGwAPG40SFTPFileCollector: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, dynamically linked, not stripped
It says "V8+ Required". Does that mean it has already been compiled for v8plus? The CC man page says that with this compiler the default is:
v8plus  This is the default and it means the compiler uses the instruction set for the V8plus version of the SPARC-V9 ISA.
I have one more piece of information. I ran the file command on librwtool.so.2, and this is what I get:
file librwtool.so.2
librwtool.so.2: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped, no debugging information available
This library is not in SPARC32PLUS format. Will this cause a problem?
When I run the file command on a library we built with the compiler version above, we get the following, which is in SPARC32PLUS format:
file libSGwcdrapnBcpFormatter.so
libSGwcdrapnBcpFormatter.so: ELF 32-bit MSB dynamic lib SPARC32PLUS Version 1, V8+ Required, dynamically linked, not stripped
I also got dbx installed on the target machine and ran it as you suggested. dbx did not report any access to freed memory, even after the program dumped core.
# 5
In your original email you listed WorkShop 6 update 2 (C++ 5.3) as the compiler, which is why I was concerned about old compilers and old libraries. If you are actually using Sun Studio 11 (C++ 5.8) and the target system is Solaris 10, that problem does not apply. These products were released years after the std::string problem was fixed.
As you noted, the default architecture for C++ 5.8 is v8plus, and the file command shows that the program was indeed built that way.
librwtool is not an issue. It does not use the std::string class.
I think your problem must be memory corruption, and not a problem in the compiler or support libraries.
What happens when you run the program under dbx with RTC enabled?
# 6
I was able to solve the problem.
There was a global RWCString object that was accessed by all the threads. The threads assigned values to it without any synchronization between them. Since operator= deletes the old char* buffer every time it is called, one thread could end up accessing memory that another thread had just deleted. After synchronizing access to it, everything works fine.
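The fix described above amounts to serializing every access to the shared string. A minimal sketch, assuming C++11 std::mutex and std::thread in place of the Solaris or POSIX threads calls the original program would have used (Sun Studio 11 predates C++11), and std::string as a stand-in for RWCString; g_name, set_name, get_name, and run_writers are hypothetical names. Readers take a copy under the lock rather than holding a reference into the shared object, because the next assignment may free the buffer that reference points into.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the global RWCString: std::string is used so
// the sketch compiles without Tools.h++.
std::string g_name;
std::mutex g_name_mutex;  // guards every read and write of g_name

// Assign under the lock, so no thread can free the old buffer while
// another thread is still reading from or writing to it.
void set_name(const std::string& value) {
    std::lock_guard<std::mutex> lock(g_name_mutex);
    g_name = value;
}

// Return a private copy taken under the lock; callers never touch g_name directly.
std::string get_name() {
    std::lock_guard<std::mutex> lock(g_name_mutex);
    return g_name;
}

// Start n threads that each repeatedly assign their own value and read it back.
void run_writers(int n, int iterations) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n; ++t) {
        workers.push_back(std::thread([t, iterations]() {
            const std::string mine = "thread-" + std::to_string(t);
            for (int i = 0; i < iterations; ++i) {
                set_name(mine);
                std::string seen = get_name();  // a copy, not a reference into g_name
                (void)seen;
            }
        }));
    }
    for (std::size_t i = 0; i < workers.size(); ++i)
        workers[i].join();
}
```

Calling run_writers(5, 1000) (five threads, the default mentioned earlier) leaves g_name holding one of "thread-0" through "thread-4". Where the design allows it, a simpler alternative is to give each thread its own string so that nothing is shared at all.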
Thank you very much for all the help.
But I am not able to answer the question of how the same code was working fine on the Solaris 9 machine.
# 7
> But I am not able to answer the question of how the same code was working fine on the Solaris 9 machine.
I believe it worked by accident. Heap misuse alone produces many possible outcomes, and multithreading multiplies them.