data race detection confusion

I've got the following code in fraggen.cc. The first section is some global variables, which ought to be initialized before main() begins.

193 // Access to this queue must always be protected by the associated mutex.
194 // The condition variable is used to wake up any process which is waiting
195 // for the queue to become non-empty.
196 queue< JobUpdateRequest *, list<JobUpdateRequest *> > job_update_queue;
197 pthread_mutex_t job_update_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
198 pthread_cond_t job_update_outstanding = PTHREAD_COND_INITIALIZER;

1042 // wait on a queue of job read/refresh requests, and process them as they come in
1043 while (! halt_signal_was_received ())
1044 {
1045     pthread_mutex_lock (&job_update_queue_mutex);
1046     while (job_update_queue.empty ())
1047     {
1048         pthread_cond_wait (&job_update_outstanding, &job_update_queue_mutex);
1049     }
1050     JobUpdateRequest *job_update_request_ptr = job_update_queue.front ();
1051     job_update_queue.pop ();
1052     pthread_mutex_unlock (&job_update_queue_mutex);
1053     process_job_update_request (job_update_request_ptr);
1054     delete job_update_request_ptr;
1055 }
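The loop quoted here is the consumer half of a mutex/condition-variable queue; the thread does not show the producer half. A minimal sketch of what a matching producer might look like follows. The `JobUpdateRequest` layout and the helper names `enqueue_job_update` and `dequeue_job_update` are assumptions made for illustration and are not taken from fraggen.cc.

```cpp
#include <pthread.h>
#include <list>
#include <queue>

// Hypothetical stand-in for the real request type in fraggen.cc.
struct JobUpdateRequest { int job_id; };

std::queue<JobUpdateRequest *, std::list<JobUpdateRequest *> > job_update_queue;
pthread_mutex_t job_update_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t job_update_outstanding = PTHREAD_COND_INITIALIZER;

// Hypothetical producer: push while holding the mutex, then wake one waiter.
void enqueue_job_update(JobUpdateRequest *request)
{
    pthread_mutex_lock(&job_update_queue_mutex);
    job_update_queue.push(request);
    pthread_cond_signal(&job_update_outstanding);
    pthread_mutex_unlock(&job_update_queue_mutex);
}

// Consumer step, mirroring lines 1045-1052 of the quoted loop.
JobUpdateRequest *dequeue_job_update()
{
    pthread_mutex_lock(&job_update_queue_mutex);
    // Re-check the predicate in a loop: pthread_cond_wait may wake spuriously.
    while (job_update_queue.empty())
        pthread_cond_wait(&job_update_outstanding, &job_update_queue_mutex);
    JobUpdateRequest *request = job_update_queue.front();
    job_update_queue.pop();
    pthread_mutex_unlock(&job_update_queue_mutex);
    return request;  // caller owns the request and must delete it
}
```

With both halves taking the same mutex, every access to the queue made after threads exist is ordered, which is the property the comment at line 193 asks for.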

Now I run this under the Data Race Detection Tool, and it reports:

Race #2, Vaddr: 0x1f7320
  Access 1: Read, job_status_reader_thread_main + 0x000001B8,
            line 1046 in "fraggen.cc"
  Access 2: Write, std::_List_base<JobUpdateRequest*,std::allocator<JobUpdateRequest*> >::clear() + 0x0000022C,
            line 76 in alternate source context "_list.c"
  Total Traces: 1
  Trace 1
    Access 1: Read
      job_status_reader_thread_main + 0x000001B8, line 1046 in "fraggen.cc"
    Access 2: Write
      std::_List_base<JobUpdateRequest*,std::allocator<JobUpdateRequest*> >::clear() + 0x0000022C, line 76 in alternate source context "_list.c"
      __SLIP.FINAL__B + 0x0000003C, line 196 in "fraggen.cc"
      _exithandle + 0x0000003C
      exit + 0x00000004
      _start + 0x00000110

There are a few issues with this:

(1) _list.c appears to be from the stlport library, which I'm using. Somewhat to my amazement, the Sun Studio documentation is very clear about which version of this library is used by Sun Studio 11: version 4.5.3. Is that the same version still in use by Studio Express?

(2) How is it that I get a race condition between a piece of code (the queue constructor) that ought to be executed before main() begins, and a piece of code (a call to the queue's empty() function) that is only executed after main() begins? If this is an example of a false positive because the constructor is not protected by a mutex, then I would suggest that DRDT be extended to special-case a recognition that such pre-main() initialization of global variables cannot cause race conditions (there being only one thread living at that point).

(3) Why is it that when I run "collect" to gather race-detection data for this program, it eventually dumps core, whereas if I run exactly the same copy of my program but not under "collect", no such core file appears?

herteg at 2007-11-26 10:11:56
# 1

Studio Express comes with the same version of STLport that comes with Studio 11.

This forum is about programming in C++ in general, and about Sun C++ in particular. Questions about other Sun Studio tools (like DRDT) are best asked in the Sun Studio Tools forum:

http://forum.sun.com/jive/forum.jspa?forumID=309

I have asked one of the DRDT engineers to look at this question and your other question about the tutorial.

clamage45 at 2007-7-7 1:59:22 > top of Java-index,Development Tools,Solaris and Linux Development Tools...
# 2

Herteg, thank you for reporting the problems you found.

It is difficult to tell what was happening with respect to issues (2) and (3).

With STLport, we do find some false positives because STLport recycles memory (which is similar to the false positive case in 6.1.2 in http://developers.sun.com/prodtech/cc/downloads/drdt/using.html). I am not sure whether it is the case here though.

For issue (3), running under DRDT requires a significant amount of memory. The application may run out of memory and dump core if it does not check for such cases. Or it could be a problem in DRDT itself. How long does your code run before it dumps core? And how much memory does it use?
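To illustrate what "checking such cases" can mean in C++: a failed `new` throws `std::bad_alloc`, and if nothing catches it the program ends in `std::terminate`, which by default calls `abort()` and usually leaves a core file. A minimal sketch of one way to check, assuming hypothetical helper names `try_allocate_buffer` and `release_buffer` that do not come from the poster's program:

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: returns NULL instead of letting std::bad_alloc
// propagate when the allocation cannot be satisfied.
char *try_allocate_buffer(std::size_t bytes)
{
    try {
        return static_cast<char *>(::operator new(bytes));
    } catch (const std::bad_alloc &) {
        return NULL;  // caller decides how to degrade or report the failure
    }
}

// Hypothetical counterpart that frees a buffer obtained above.
void release_buffer(char *buffer)
{
    ::operator delete(buffer);
}
```

Whether the core dump in this thread was actually caused by memory exhaustion is not established; the sketch only shows the kind of check being referred to.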

Is it possible that we can have your code for further study?

Thanks.

-- Yuan

yuan at 2007-7-7 1:59:22
# 3

With issue (2), the first section of code is a set of external variables, presumably initialized before main() begins executing. The second section of code is executed in a separate thread, spawned sometime after main() begins. I'm guessing that a small test program which includes the lines given would replicate the problem, though I don't have time to put it together right now.
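A possible starting point for such a test program is sketched below. It rests on one reading of the race report that nobody in this thread confirms: the stack for Access 2 runs through exit, _exithandle and __SLIP.FINAL__B into clear(), which looks like the global queue being destroyed at process exit rather than constructed before main(). If that reading is right, the reader thread was still calling empty() while the main thread tore the queue down, and joining the thread before main() returns would remove the overlap. The `JobUpdateRequest` layout, the `shutdown_requested` flag, and the function names are assumptions for illustration only.

```cpp
#include <pthread.h>
#include <cstddef>
#include <list>
#include <queue>

struct JobUpdateRequest { int job_id; };  // hypothetical stand-in type

std::queue<JobUpdateRequest *, std::list<JobUpdateRequest *> > job_update_queue;
pthread_mutex_t job_update_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t job_update_outstanding = PTHREAD_COND_INITIALIZER;
bool shutdown_requested = false;  // hypothetical flag, guarded by the mutex
int processed_count = 0;          // guarded by the mutex

void *reader_thread_main(void *)
{
    pthread_mutex_lock(&job_update_queue_mutex);
    for (;;) {
        while (job_update_queue.empty() && !shutdown_requested)
            pthread_cond_wait(&job_update_outstanding, &job_update_queue_mutex);
        if (job_update_queue.empty())
            break;  // shutdown was requested and the queue is drained
        JobUpdateRequest *request = job_update_queue.front();
        job_update_queue.pop();
        pthread_mutex_unlock(&job_update_queue_mutex);
        delete request;  // stands in for process_job_update_request()
        pthread_mutex_lock(&job_update_queue_mutex);
        ++processed_count;
    }
    pthread_mutex_unlock(&job_update_queue_mutex);
    return NULL;
}

// Queues request_count requests, asks the reader to stop, and joins it.
// Returns the number of requests processed, or -1 if the thread failed to start.
int run_reader_then_shut_down(int request_count)
{
    shutdown_requested = false;  // no other thread exists yet
    processed_count = 0;
    pthread_t reader;
    if (pthread_create(&reader, NULL, reader_thread_main, NULL) != 0)
        return -1;
    for (int i = 0; i < request_count; ++i) {
        JobUpdateRequest *request = new JobUpdateRequest;
        request->job_id = i;
        pthread_mutex_lock(&job_update_queue_mutex);
        job_update_queue.push(request);
        pthread_cond_signal(&job_update_outstanding);
        pthread_mutex_unlock(&job_update_queue_mutex);
    }
    pthread_mutex_lock(&job_update_queue_mutex);
    shutdown_requested = true;
    pthread_cond_signal(&job_update_outstanding);
    pthread_mutex_unlock(&job_update_queue_mutex);
    // After the join, no other thread can touch the queue, so destroying
    // the global at exit no longer overlaps with empty() in the reader.
    pthread_join(reader, NULL);
    return processed_count;
}
```

Calling `run_reader_then_shut_down` from main() and returning afterwards gives a program whose only exit-time access to the queue happens once the reader is gone. Dropping the `pthread_join` call should bring back the overlap the report appears to describe, which would make the two variants a useful pair to run under DRDT.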

The machine I tested on has a little over 1 GB of memory. It did produce a huge core file (at least 100 MB, if I recall correctly), but I don't know whether the process approached the size of memory. In the runs that did not dump core, the memory it used should have been nominal, though I haven't measured it so far.

This code is currently under active development, and I'm only occasionally testing with DRDT. It's possible that the code was in a state where DRDT altered the timing enough to eventually trigger a SIGSEGV, whereas outside DRDT a programmatic termination after a few seconds cut the run short before that could happen. The program would run perhaps 5 seconds outside DRDT, but much longer (maybe 30 to 60 seconds?) under DRDT before it dumped core. I can't recover the version I had when I ran the test that dumped core that way. If it happens again in the future, I'll post again.

herteg at 2007-7-7 1:59:22
# 4
Thanks. Let us know once you have more information. -- Yuan
yuan at 2007-7-7 1:59:22