Intel Solaris 8 to 10 Migration - Static Variable C Problem
We have a complex C application (tens of millions of lines of code) last compiled under Solaris 8 using Forte. It runs on the Intel platform.
Loaded Solaris 10 and started the app without recompiling. Initial results were encouraging. Much of the app seemed to be working fine, but not all.
There seems to be a strange problem with static variables. In a startup module we have something like
static int executeCnt;
int VDstartupGetExecuteCnt() { return executeCnt; }
executeCnt gets incremented at various points within the startup file. But when a function calls VDstartupGetExecuteCnt(), a value of 0 is always returned.
Anybody have any ideas on where the problem might be? Trying to avoid recompiling the entire app. I've done a little bit of poking around with appcert, but the app relies heavily on dload to dynamically load modules.
Thanks
[877 byte] By [cerad] at [2007-11-26 7:07:15]

# 1
Run dbx on your app and use the 'when access' command to inspect
all the places where the variable is read/written.
At the point where VDstartupGetExecuteCnt() returns zero, check
the value of the variable by printing it from dbx. Use the 'examine'
(or 'x') command to look at the raw memory at the address that
is returned by:
(dbx) print & executeCnt
Those are some ideas of how to approach the problem.
# 3
> What is the code to increment that static variable?
> I assume there is a function in the same translation
> unit that performs this increment.
> You can trace it in dbx (or with dtrace) to check if
> it is ever called.
Yes, the static variable does get incremented in the same file, and the function doing the increment gets called. Remember this all works fine under Solaris 8.
I'd love to use dbx, but when we tried before we found that we needed to build one huge .so file to get dbx running usefully. Dloading files under dbx didn't seem to work. Remember that this is a huge app and we have modules from all over the place, each with a different build strategy.
Something seems to be clobbering my static variables. I'm thinking that maybe a structure size has changed since Solaris 8 and that's overwriting stuff. Or maybe we had a memory issue that Solaris 8 ignored. I remember we had plenty of those when upgrading from Solaris 4/6.
But it looks like I may need to roll up my sleeves and try a full recompile. Deep sigh.
Any more help would be appreciated.
cerad at 2007-7-6 15:55:15 >

# 4
If your app uses dlopen() to load libraries, then you won't be able to
set breakpoints in those libraries immediately after loading the
program. Dbx has no way of knowing what libraries will be dlopened.
You can run the program once within dbx until the program finishes,
or until all the libraries get loaded. That will cause dbx to load the
symbol tables for all your libraries. Or you can use 'loadobject -load'
command to explicitly load the symbols for shared libraries
that aren't directly linked with your program.
I would recommend using 'when access' (see "help event specification")
to track down the exact code/stack locations that are writing to that
variable.
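If dbx turns out to be impractical for this app, a low-tech version of the same check is to rebuild only the startup module with a trace that prints the address and value of the variable at each increment and each read. This is only a rough sketch: traceExecuteCnt() and VDstartupBumpExecuteCnt() are made-up names standing in for whatever your startup file really does, and the assert is just an assumed sanity check.

```c
#include <assert.h>
#include <stdio.h>

/* Same shape as the startup module in the original post. */
static int executeCnt;

/* Hypothetical trace helper (not in the original code): logs the address
   and current value of this translation unit's executeCnt to stderr. */
static void traceExecuteCnt(const char *where)
{
    fprintf(stderr, "%s: &executeCnt=%p value=%d\n",
            where, (void *)&executeCnt, executeCnt);
}

/* Hypothetical stand-in for the places that increment the counter. */
void VDstartupBumpExecuteCnt(void)
{
    executeCnt++;
    traceExecuteCnt("increment");
}

int VDstartupGetExecuteCnt(void)
{
    traceExecuteCnt("read");
    /* Assumed sanity check: the counter only ever goes up from zero,
       so a negative value would point to memory being overwritten. */
    assert(executeCnt >= 0);
    return executeCnt;
}
```

If the "increment" and "read" lines show different addresses, the process probably has more than one copy of the startup module loaded, each with its own static. If the address is the same but the value drops back to 0 between lines, something else is writing over that memory, which is where 'when access' would help most.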