Fortran: "cannot access address" or "reference through nil pointer"

Hi,

I'm experiencing a strange problem with my Fortran code. When I compile with "-g -openmp=noopt" I get strange results in a particular subroutine. Before presenting the details, I should mention the problem goes away if I remove the "-openmp=noopt" compiler option. I'm using Sun Fortran 95 8.2 Patch 121020-02 2006/04/06, on a Solaris 10/x86 machine.

My code wasn't behaving as expected, so I went into the debugger to find out what and where the problem was. I found the location of the problem and what was happening, but don't know what's causing it or how to fix it.

Specifically, I call a subroutine with a number of arguments (integers, single precision scalars, single precision arrays, character arrays) and upon entering the subroutine, I cannot access the values of the majority of the passed arguments anymore and I get error messages "cannot access address 0x3fa00000" or "reference through nil pointer". Meanwhile, the variables passed that I can still access no longer have the same values they did before the call to the subroutine.

Also, using the where command after I have called the subroutine gives me weird results. For example, most of the arguments as stated in the where command take the form "ibeg = <bad address 0x3fa00000>".

I've checked the variable assignments before the call to the subroutine and they're all as they should be. Also, there are no parallel regions around the call to this subroutine, and I have confirmed that this part of the code is, in fact, serial. In addition, none of the arguments I am passing to the subroutine are calculated inside a parallel region. If it makes any difference, there is a file open when the subroutine is called.

Any help or guidance I could get in resolving this problem would be much appreciated.

Thanks in advance,

Jon

[1915 byte] By [PerryBothron] at [2007-11-26 10:29:21]
# 1
It might be that you've hit a compiler bug. Can you post a reproducible test case here? You can also file a bug through bugs.sun.com.
MaximKartashev at 2007-7-7 2:35:02
# 2

The basic difficulty is that your code has to go through the optimizer when you use -xopenmp=noopt. We tell the optimizer not to actually do any optimizations but still implement the OpenMP directives, but some things still get messed up due to bugs. Because dbx expects parameters to be at certain stack offsets, and because the optimizer doesn't really want to put them there, we have a bug.

I think this is bugid 6184310.

It affects parameters only. Local variables should still work as far as I know.

It only affects x86, not sparc code.

If you use "stopi at &foo" instead of "stop in foo", then dbx should stop on the very first instruction of the function (before the prolog). This should get you the same behavior as you see with fully optimized code. You should be able to see the parameters as long as you don't start to step into the function itself.

A few more details: Normally dbx will implement "stop in foo" by stopping *after* the prolog code at the top of a function. Dbx is expecting this prolog code to dump all the register parameters onto the stack at the same offsets that the Fortran front-end assigned to them. But the prolog code is putting them at different offsets, because the code is optimized. Hence the problem.
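
As a rough sketch only, the workaround might look like this in a dbx session. Here "foo" stands in for the name of the affected subroutine, and "ibeg" is borrowed from the where output in the first post; both are placeholders for whatever your code actually uses.

```
(dbx) stopi at &foo
(dbx) run
(dbx) where
(dbx) print ibeg
```

Inspect the parameters at this point, before stepping any further into the function.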

ChrisQuenelle at 2007-7-7 2:35:02
# 3

I've fixed the problem. The post by ChrisQuenelle cleared up the problem with the strange behaviour with dbx, but didn't answer the question about why my code was having problems. After sticking print* statements all over the place, I managed to resolve the issue, sans dbx.

Without the -openmp=noopt switch, all variables and arrays were automatically initialised to zero, even if they were never used beyond the definition. However, with the -openmp=noopt switch on, the unused variables and arrays were NOT being initialised to zero automatically.

One of the variables that I was not initialising manually for this particular run of the code was being passed to the subroutine with non-zero numbers and causing problems (especially since the numbers were VERY large integers instead of assumed zeros).

So my question is then: Should arrays and variables be automatically initialised to zero with the -openmp=noopt switch as they are with just the -g switch? Or was this a case of sloppy programming on my part and I should assume that I need to initialise all arrays and variables (especially if they're not used, but passed to a subroutine later on)?

Thanks in advance,

Jon

PerryBothron at 2007-7-7 2:35:02
# 4

You need to initialize all your variables before using them. Although there are circumstances under which the compiler will in practice always generate code that renders the initialization unnecessary, the Fortran standard doesn't require compilers to do that. That means that code that has been working just fine could break if you start using a different compiler (possibly including just upgrading to a newer version of the same compiler).

In the case of OpenMP, initialization matters because variables are allocated on the stack rather than in the static data area. Static data by default gets filled with zeroes when a program starts. Stack space tends to have arbitrary junk in it.

You can use the -xcheck flag to help find cases of uninitialized variables. (I don't recall the exact spelling; check the man page for details.)
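
A minimal sketch of what explicit initialization looks like, assuming a made-up program: the names init_demo, process, work and n are hypothetical, and ibeg is borrowed from the dbx output earlier in the thread. It is not the original poster's code.

```fortran
program init_demo
  implicit none
  integer, parameter :: n = 4   ! hypothetical array size
  integer :: ibeg               ! name borrowed from the dbx output in the thread
  real    :: work(n)            ! hypothetical single precision array

  ! Assign every variable before it is read or passed on.
  ! Do not rely on the storage happening to be zero-filled.
  ibeg = 0
  work = 0.0

  call process(ibeg, work, n)

contains

  subroutine process(ibeg, work, n)
    integer, intent(in) :: ibeg, n
    real,    intent(in) :: work(n)
    ! The values seen here are the ones assigned by the caller,
    ! whether the caller's variables live on the stack or in static data.
    print *, 'ibeg =', ibeg, ' sum(work) =', sum(work)
  end subroutine process

end program init_demo
```

With the two assignments in place, the subroutine should see the same values with or without -xopenmp=noopt. Without them, the values depend on whatever happened to be in memory.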

igb at 2007-7-7 2:35:02