Studio 11/x86: dbx crashes with internal error

A beta release of FLASH 3 fully configures and compiles under the Studio 11/x86 compiler suite. The code is parallel and uses MPI for interprocess communication, for which we use Sun's CT7 (ClusterTools 7).

The code runs fine on one and two CPUs, but fails with an error on more CPUs. This is a bug in the code, so we would like to debug it. As a parallel debugger we use DDT, which in turn uses the dbx from Studio 11. The session crashes because of a dbx internal error. Running plain dbx on a single-CPU run, we get the following output:

oberon:FLASH3.0_beta/object> dbx flash3
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your
.dbxrc
Reading flash3
Reading ld.so.1
Reading libdrfftw_mpi.so.2.0.7
Reading libdfftw_mpi.so.2.0.7
Reading libdrfftw.so.2.0.7
Reading libdfftw.so.2.0.7
Reading libhdfwrapper.so
Reading libhdf5.so.0.0.0
Reading libz.so.1
Reading libmpi.so.0.0.0
Reading libmpi_f90.so.0.0.0
Reading libmpi_f77.so.0.0.0
Reading libopen-rte.so.0.0.0
Reading libopen-pal.so.0.0.0
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libdl.so.1
Reading libfui.so.2
Reading libfai.so.1
Reading libfsu.so.1
Reading libsunmath.so.1
Reading libmtsk.so.1
Reading libc.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libm.so.1
Reading libpthread.so.1
(dbx) l
   36   call Driver_initFlash()
   37
   38   call Driver_evolveFlash( )
   39
   40   call Driver_finalizeFlash ( )
   41
   42
   43   end program Flash
(dbx) stop in Driver_initFlash
dbx: warning: Can't find module symbol for 'gravity_interface' :
/data/rw12/tt/tmp/FLASH3.0_beta/object/flash3:Driver_initFlash.F90 stab #32
gravity_interface_:only;gravity_init,gravity_potentiallistofblocks
dbx: warning: Can't find module symbol for 'particles_interface' :
/data/rw12/tt/tmp/FLASH3.0_beta/object/flash3:Driver_initFlash.F90 stab #33
particles_interface_:only;particles_init
dbx: internal error: signal SIGSEGV (no mapping at the fault address)
dbx's coredump will appear in /tmp
Abort

I have a copy of the core dump and can provide that.

By lydia.hecka at 2007-11-27 4:31:36
# 1
Most likely this is known bug 6477975. There is a dbx patch available: 121616-04 (or -05 or later). For other forum readers, there is also a SPARC version of the patch, 121023-04 or later. If the patch doesn't correct the problem, I'll ask for that core file.
David_Forda at 2007-7-12 9:41:04 > top of Java-index,Development Tools,Solaris and Linux Development Tools...
# 2
Patch 121616-04 is already installed, and the crash still happens with it. I searched the available patches and there is no higher revision of that patch. The problem is not present in the EA release of Studio 12.
lydia.hecka at 2007-7-12 9:41:04
# 3

Could you please file a bug at bugs.sun.com (or in bugtraq if you're a Sun employee)? It will help to have the executable so we can try to reproduce the problem. Since this program uses the stabs debugging format, we might also need some object files, or we might need the executable rebuilt with -xs so that all the stabs get put into the executable. The stack trace from the core file is not always enough to go on.
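
A minimal sketch of what a rebuild with -xs might look like for a single source file. The compiler driver name (f95), the helper function, and the choice of file are assumptions for illustration; the actual FLASH makefile variables are not shown in this thread. The snippet only prints the compile command and does not run it.

```shell
# Sketch only: FC and debug_compile_cmd are illustrative names,
# not taken from the FLASH build system.
FC=f95                  # assumed Sun Studio Fortran driver
DEBUG_FLAGS="-g -xs"    # -g: emit debug info; -xs: copy stabs into the executable

# Print the compile line for one source file (does not run the compiler).
debug_compile_cmd() {
    printf '%s %s -c %s\n' "$FC" "$DEBUG_FLAGS" "$1"
}

# Driver_initFlash.F90 is the file named in the dbx warnings above.
debug_compile_cmd Driver_initFlash.F90
```

In a real build the same flags would presumably be added to the Fortran compile and link flags in the makefile, and the whole program relinked so that the executable carries all the stabs.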

--chris

ChrisQuenellea at 2007-7-12 9:41:04
# 4
Bug 6477975 can be seen only on Solaris 11 (OpenSolaris, etc.). Which version of Solaris are you running?
MaximKartasheva at 2007-7-12 9:41:04