BUS ERROR PROBLEM

Hi everyone,

We have a problem with our software: it dumps core from time to time, especially during peaks of high activity or after it has been running for some time.

We recently recompiled part of this software with Sun Studio 8 (CC5.5), with no changes to the source code.

We were previously working with Sun Workshop 6 (CC5.3).

I was wondering whether there is any incompatibility between binaries built with CC5.3 and CC5.5, because under CC5.3 everything was fine: the processes never dumped core.

My guesses are:

1) the use of two different STL implementations (Rogue Wave for the CC5.3 binaries, libCstd.so for the CC5.5 binaries)

2) the new patches applied to the runtime libraries may require recompiling the CC5.3 binaries (?)

3) incompatibilities between CC5.3 and CC5.5 that the Migration Guide doesn't mention

Here is some information about my configuration:

Patch: 108434-12 Obsoletes: Requires: 109147-07 Incompatibles: Packages: SUNWlibC

Patch: 108434-14 Obsoletes: Requires: 109147-07 Incompatibles: Packages: SUNWlibC

Patch: 108434-15 Obsoletes: Requires: 109147-07 Incompatibles: Packages: SUNWlibC

Patch: 108434-18 Obsoletes: Requires: 109147-07 Incompatibles: Packages: SUNWlibC

Patch: 108435-12 Obsoletes: Requires: 108434-12 Incompatibles: Packages: SUNWlibCx

Patch: 108435-14 Obsoletes: Requires: 108434-14 Incompatibles: Packages: SUNWlibCx

Patch: 108435-15 Obsoletes: Requires: 108434-15 Incompatibles: Packages: SUNWlibCx

Patch: 108435-18 Obsoletes: Requires: 108434-17 Incompatibles: Packages: SUNWlibCx

uname -a : SunOS clssund207 5.8 Generic_117350-23 sun4u sparc SUNW,Sun-Fire-V440

Ans, Fwk layers :

CC5.5 with libCstd.so, shared libraries

TAO layer:

CC5.3 with roguewave STL, no libCstd.so, shared libraries

t@54 (l@56) terminated by signal BUS (invalid address alignment)

0xfc9c27f0: t_delete+0x0068:clr[%o1 + 8]

Current function is Fwk_VectorValue::getVector

31 toInsert=new Fwk_Vector<long>(Fwk_TypeVectorLongValue);

(dbx) where

current thread: t@54

[1] t_delete(0x5f988, 0xfca3c008, 0xfca427cc, 0xfca4284c, 0xfca42848, 0x0), at 0xfc9c27f0

[2] _malloc_unlocked(0x2c, 0x5234a8, 0xfca3c008, 0x30, 0x5f988, 0x0), at 0xfc9c1e80

[3] malloc(0x2c, 0x51ed8c, 0xf6202098, 0x3c2cb0, 0xf6201dc8, 0xf62020c0), at 0xfc9c1cd8

[4] operator new(0x2c, 0x51ed8c, 0x13740, 0x51ed88, 0xfdeba8d4, 0x2c), at 0xfdea71b8

=>[5] Fwk_VectorValue::getVector(type = Fwk_TypeLongValue), line 31 in "Fwk_VectorValue.cc"

[6] Fwk_TableValue::addColumn(this = 0xf6202260, colName = CLASS, type = Fwk_TypeLongValue), line 55 in "Fwk_TableValue.cc"

[7] Ans_Storage_Impl::buildIndirectionTable(this = 0x137f88, table = CLASS), line 1188 in "Ans_Storage_Impl.cc"

[8] Ans_Storage_Impl::persistSubscription(this = 0x137f88, subscription = 0x576020), line 1053 in "Ans_Storage_Impl.cc"

[9] Ans_Storage::persistSubscription(this = 0xffbed40c, subscrip = 0x576020), line 144 in "Ans_Storage.cc"

[10] Ans_RequestManager::subscribe(this = 0xffbed3a4, cid = STRUCT, rid = 409856U, subscription = 0x339010, handler = 0x5d6fe8), line 147 in "Ans_RequestManager.cc"

[11] Ans_CorbaSubscription::subscribe(this = 0x218e80, cid = STRUCT, rid = 409856, subscription = STRUCT, handler = 0x4d0e10), line 60 in "Ans_CorbaSubscription.cc"

[12] Ans_CorbaSubscription_Srv::subscribe(this = 0x226258, cid = STRUCT, rid = 409856, subscription = STRUCT, handler = 0x4d0e10), line 50 in "Ans_CorbaSubscription_Srv.cc"

[13] POA_Client_ANS::ISubscription::subscribe_skel(_tao_server_request = CLASS, _tao_object_reference = 0x226268, _tao_servant_upcall = 0xf62028e4, _ACE_CORBA_Environment_variable = CLASS), line 1618 in "clientans_skel.cc"

[14] TAO_ServantBase::synchronous_upcall_dispatch(this = 0x226260, req = CLASS, servant_upcall = 0xf62028e4, derived_this = 0x226268, _ACE_CORBA_Environment_variable = CLASS), line 225 in "Servant_Base.cpp"

[15] POA_Client_ANS::ISubscription::_dispatch(this = 0x226268, req = CLASS, servant_upcall = 0xf62028e4, _ACE_CORBA_Environment_variable = CLASS), line 1876 in "clientans_skel.cc"

[16] TAO_Default_Servant_Dispatcher::dispatch(this = 0xe4e18, servant_upcall = CLASS, req = CLASS, _ACE_CORBA_Environment_variable = CLASS), line 18 in "Default_Servant_Dispatcher.cpp"

[17] TAO_Object_Adapter::dispatch_servant(this = 0x1149a8, key = CLASS, req = CLASS, forward_to = CLASS, _ACE_CORBA_Environment_variable = CLASS), line 323 in "Object_Adapter.cpp"

[18] TAO_Object_Adapter::dispatch(this = 0x1149a8, key = CLASS, request = CLASS, forward_to = CLASS, _ACE_CORBA_Environment_variable = CLASS), line 737 in "Object_Adapter.cpp"

[19] TAO_Adapter_Registry::dispatch(this = 0xf0214, key = CLASS, request = CLASS, forward_to = CLASS, _ACE_CORBA_Environment_variable = CLASS), line 114 in "Adapter.cpp"

[20] TAO_GIOP_Message_Base::process_request(this = 0x2637c8, transport = 0x263708, cdr = CLASS, output = CLASS), line 774 in "GIOP_Message_Base.cpp"

[21] TAO_GIOP_Message_Base::process_request_message(this = 0x2637c8, transport = 0x263708, qd = 0xf62032c0), line 595 in "GIOP_Message_Base.cpp"

[22] TAO_Transport::process_parsed_messages(this = 0x263708, qd = 0xf62032c0, rh = CLASS), line 1355 in "Transport.cpp"

[23] TAO_Transport::handle_input_i(this = 0x263708, rh = CLASS, max_wait_time = (nil), _ARG4 = 0), line 887 in "Transport.cpp"

[24] TAO_IIOP_Connection_Handler::handle_input(this = 0x2705e0, _ARG2 = 28), line 349 in "IIOP_Connection_Handler.cpp"

[25] ACE_TP_Reactor::dispatch_socket_event(this = 0x10e460, dispatch_info = CLASS), line 573 in "TP_Reactor.cpp"

[26] ACE_TP_Reactor::handle_socket_events(this = 0x10e460, event_count = 0, guard = CLASS), line 375 in "TP_Reactor.cpp"

[27] ACE_TP_Reactor::dispatch_i(this = 0x10e460, max_wait_time = (nil), guard = CLASS), line 200 in "TP_Reactor.cpp"

[28] ACE_TP_Reactor::handle_events(this = 0x10e460, max_wait_time = (nil)), line 133 in "TP_Reactor.cpp"

[29] ACE_Reactor::handle_events(this = 0x106620, max_wait_time = (nil)), line 157 in "Reactor.i"

[30] TAO_ORB_Core::run(this = 0xf00d8, tv = (nil), perform_work = 0, _ARG4 = CLASS), line 1807 in "ORB_Core.cpp"

[31] CORBA_ORB::run(this = 0x10fc08, tv = (nil), _ACE_CORBA_Environment_variable = CLASS), line 249 in "ORB.cpp"

[32] CORBA_ORB::run(this = 0x10fc08, _ACE_CORBA_Environment_variable = CLASS), line 233 in "ORB.cpp"

[33] Fwk_TAOThreadPool::Worker::run(this = 0x218274), line 71 in "Fwk_TAOThreadPool.cc"

[34] Pft_ThreadImpl::run(this = 0x21bc00), line 185 in "Pft_ThreadImpl.cc"

[35] hack(args = 0x21bc00), line 15 in "Pft_ThreadImpl.cc"

Thanks for any help

Loic

[6886 byte] By [LDERa] at [2007-11-27 10:31:24]
# 1

The C++ runtime library (libCstd) is the same for all versions of Sun C++ from 5.0 through 5.9, except for bug fixes. The versions are binary compatible (explained below).

When you say you were using the "Rogue Wave STL" with C++ 5.3, do you mean that you purchased a library directly from Rogue Wave? We can make no promises about compatibility with libraries acquired from 3rd parties. The libCstd used with Sun C++ is based on source code from Rogue Wave, but is not a Rogue Wave product.

Sun C++ also includes Rogue Wave Tools.h++, but I don't think that's what you mean.

Binary compatibility: An object file created by an earlier version of Sun C++ can be linked into a program or library built by a later version of Sun C++. The old binaries also have to have been built on the same Solaris version or an earlier version.

The reverse is not supported: You cannot create a binary with a newer compiler and use it in a program built with an earlier compiler. You can't create a binary on a newer release of Solaris and use it on an older release.

By default, C++ 5.3 links libCstd statically, although we have always recommended linking it dynamically. If you create a library L with C++ 5.3 and link it with the static libCstd.a, L will contain pieces of the version of libCstd.a included with the compiler that built library L.

Compilers released after C++ 5.3 link libCstd.so (dynamically) by default. At program run time you get the version that is installed on the computer running the program. That is, libCstd.so is shipped as part of Solaris, not as part of the compiler installation. I see that you have a relatively recent update of the library in patches 108434-18 and 108435-18. (The latest version is -22.)

If you have pieces of the old libCstd.a included in libraries created by C++ 5.3, you need to re-link the libraries using the dynamic libCstd.so instead. Otherwise you will have version skew in libCstd, with pieces of different library versions causing unpredictable behavior.

If you do not have any of these problems, a likely reason for the crash is an error in the program. Different compiler versions (or patch levels) generate code that is different in detail for the same source code, although equivalent in effect.

Common programming errors like using an uninitialized pointer, or a pointer to an object that has been deleted, double deletion of the same object, or buffer overrun, lead to unpredictable behavior. By accident, a program built with one compiler might wind up trashing an unimportant area of memory. When rebuilt with another compiler, the program error can wind up trashing something that matters, since code and data will have moved around, and a bad pointer or corrupted heap can be bad or corrupted in a different way.
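To illustrate why such an error can go unnoticed until much later, here is a minimal sketch of a buffer overrun whose damage is only discovered by a later check. GuardedBuffer, kGuardBytes and kPattern are hypothetical names, not taken from the program discussed here, and the guard zone lives inside the same allocation so that the sketch itself stays well defined.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical constants for this sketch only.
const std::size_t kGuardBytes = 8;
const unsigned char kPattern = 0xA5;

// A byte buffer followed by a guard zone, both inside one allocation.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t usable)
        : usable_(usable), bytes_(usable + kGuardBytes, kPattern) {}

    // Deliberately not checked against usable_, like a raw memset or strcpy.
    // Clamped to the whole allocation so the sketch never leaves its own memory.
    void fill(std::size_t count, unsigned char value) {
        if (count > bytes_.size()) count = bytes_.size();
        if (count > 0) std::memset(&bytes_[0], value, count);
    }

    // True while every guard byte still holds the pattern.
    bool guard_intact() const {
        for (std::size_t i = usable_; i < bytes_.size(); ++i) {
            if (bytes_[i] != kPattern) return false;
        }
        return true;
    }

private:
    std::size_t usable_;
    std::vector<unsigned char> bytes_;
};
```

Calling fill(17, 0) on a 16-byte GuardedBuffer returns normally; only a later guard_intact() call reports the damage. A real overrun on the heap can behave in a similar way: the write itself appears to succeed, and the failure shows up in some unrelated later call, possibly inside malloc as in the stack trace above.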

I suggest running the program under dbx with Run-Time Checking enabled. (Enable all checks.)

% dbx myprog

(dbx) check -all

(dbx) run

RTC will tell you about reading and writing unallocated data, use of invalid pointers, double deletion and memory leaks, and other problems.
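As a rough idea of the kind of bookkeeping that makes double deletion and leaks detectable, here is a small sketch. AllocationLedger is a hypothetical class written for this illustration; it is not how dbx RTC is implemented, and it only tracks addresses that the caller reports to it.

```cpp
#include <cstddef>
#include <set>

// Hypothetical ledger of live blocks; an illustration only, not dbx RTC.
class AllocationLedger {
public:
    enum FreeResult { FreeOk, FreeOfUnknownOrFreedBlock };

    // Remember that the block at p is now live.
    void record_alloc(const void* p) { live_.insert(p); }

    // erase() returns the number of elements removed: 0 means the block
    // was never recorded or has already been freed (a double deletion).
    FreeResult record_free(const void* p) {
        return live_.erase(p) == 1 ? FreeOk : FreeOfUnknownOrFreedBlock;
    }

    // Blocks that were recorded as allocated and never freed.
    std::size_t leaked_blocks() const { return live_.size(); }

private:
    std::set<const void*> live_;
};
```

Recording the same address as freed twice yields FreeOfUnknownOrFreedBlock on the second call, and anything still in the ledger at exit counts as a leak. A tool such as RTC does this kind of checking for you, without source changes, which is why it is the first thing to try.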

It is possible that you have run into a compiler bug, of course, but that's impossible to say without a lot more analysis.

Finally, it seems odd to upgrade to C++ 5.5, which is End Of Life. Why not upgrade to Sun Studio 11 (C++ 5.8)? Studio 11 is free, is supported, and works on Solaris 8. The compilers and tools have advanced considerably in features and performance since Studio 8.

clamage45a at 2007-7-28 18:09:22 > top of Java-index,Development Tools,Solaris and Linux Development Tools...
# 2

Thanks, clamage45, for your answer.

First I'll try to deal with the two different versions of the STL.

Then I'll see what dbx 'check -all' can tell me.

I'll keep you informed.

LDERa at 2007-7-28 18:09:22
# 3

Hi,

I recompiled the ACE/TAO layer with CC 5.3, using the same options, and now it works better. I'm still not sure why it works now; it might be that one of the applied patches made those libraries unstable when linked with CC5.5...

Anyway, we are now upgrading all our third-party libraries so that the whole thing compiles under CC5.5.

Thanks again for the help

Loic

LDERa at 2007-7-28 18:09:22