pthread_cancel() can't cancel a thread that blocks signals on Solaris 10/x86

I noticed that pthread_cancel() behaves differently depending on whether the application is compiled on Solaris/x86 or Solaris/SPARC.

I isolated the problem using the following code. The main program creates two threads:

1: hello

2: world

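The 'world' thread tries to cancel the 'hello' thread after five seconds. The 'hello' thread runs in one of three modes, selected by the PTS environment variable:

1. It does not block signals (PTS unset).

2. It blocks all signals with pthread_sigmask() (PTS=thread, shown as "/thread" in the output).

3. It blocks all signals with sigprocmask() (PTS=proc, shown as "/process" in the output).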

On Solaris 10/x86 the 'world' thread can't cancel the 'hello' thread in situations #2 and #3.

On Solaris 10/sparc or on Linux/gcc the 'world' thread cancels the 'hello' thread without problems in all three cases.

[code]#include <stdio.h>

#include <stdlib.h>

#include <pthread.h>

#include <string.h>

#include <unistd.h>

#include <signal.h>

#include <stdarg.h>

void * world_thread(void *arg);

void * hello_thread(void *arg);

static int thprint(const char *format, ...);

static pthread_t tarray[3];

int main(int argc, char *argv[])

{

int n;

tarray[0] = pthread_self();

if ( (n = pthread_create( &tarray[1], NULL, hello_thread, NULL )) )

{

fprintf( stderr, "pthread_create 1: %s\n", strerror( n ) );

exit( 1 );

}

if ( (n = pthread_create( &tarray[2], NULL, world_thread, NULL )) ) {

fprintf( stderr, "pthread_create 2: %s\n", strerror( n ) );

exit( 1 );

}

thprint("main waiting for world to end");

if (n = pthread_join(tarray[2], NULL ) )

{

fprintf( stderr, "pthread_join 2: %s\n", strerror( n ) );

exit( 1 );

}

thprint("main End.");

return( 0 );

}

void * hello_thread( void *arg )

{

int i=1;

sigset_t lsigmask;

char *ptsflag;

if (sigfillset(&lsigmask) == -1)

{

thprint("Could not set sigmask");

exit (1);

}

ptsflag = getenv("PTS");

if (ptsflag)

{

if (!strcmp(ptsflag, "proc"))

{

thprint("BLOCK /process signals");

if (sigprocmask(SIG_BLOCK, &lsigmask, NULL) == -1)

{

thprint("Could not block /process signals");

exit (1);

}

}

else if (!strcmp(ptsflag, "thread"))

{

thprint("BLOCK /thread signals");

if (pthread_sigmask(SIG_BLOCK, &lsigmask, NULL) != 0) /* returns an error number, not -1 */

{

thprint("Could not block /thread signals");

exit (1);

}

}

}

while(i++)

{

thprint( "hello count %d", i);

sleep(2);

}

return(0);

}

void * world_thread( void *arg )

{

int n;

thprint("world is going to kill hello in 5 seconds");

sleep(5);

if (n=pthread_cancel(tarray[1]))

{

thprint("World thread %s", strerror(n));

}

thprint("world cancels hello");

if (n=pthread_cancel(tarray[1]))

{

thprint("World thread %s", strerror(n));

}

thprint("world waits for hello");

if (n = pthread_join(tarray[1], NULL ) ) {

thprint("pthread_join: %s\n", strerror(n) );

exit( 1 );

}

thprint("world ends now");

return( 0 );

}

static int thprint(const char *format, ...)

{

va_list ap;

pthread_t tid;

int tindex;

tid = pthread_self();

for (tindex=0; tindex<3; tindex++)

{

if (pthread_equal(tarray[tindex], tid)) break;

}

printf("%d:", tindex);

va_start(ap, format);

vprintf(format, ap);

va_end(ap);

printf("\n");

return 0;

}[/code]

The output of this program on x86 when signals are blocked (PTS set to either proc or thread) is:

[code]$ ./ptt_x86

0:main waiting for world to end

1:BLOCK /process signals

1:hello count 2

2:world is going to kill hello in 5 seconds

1:hello count 3

1:hello count 4

2:world cancels hello

2:world waits for hello

1:hello count 5

1:hello count 6

1:hello count 7

1:hello count 8[/code]

On Solaris 10/SPARC or on Linux with gcc I can cancel the thread in any of these situations:

[code]$ ./ptt_sparc

0:main waiting for world to end

1:hello count 2

2:world is going to kill hello in 5 seconds

1:hello count 3

1:hello count 4

2:world cancels hello

2:world waits for hello

2:world ends now

0:main End.[/code]

I guess this has something to do with the pthread library implementation; however, I think the behaviour should be the same on both platforms. I have not checked what the specification says.

[4724 byte] By [pghoratiu] at [2007-11-26 9:17:12]
# 1

This was a bug in Solaris 10 FCS, both sparc and x86:

6234594 blocking SIGCANCEL prevents pthread_cancel from working

It was fixed in Solaris Nevada and the fix was back-ported to Solaris 10 Update 1. Upgrade the system to at least Update 1.

Roger Faulkner

Sun Microsystems

rogerfaulkner at 2007-7-6 23:44:40 > top of Java-index,Development Tools,Solaris and Linux Development Tools...
# 2
How can I find out the Solaris bugfix release version number? What command or configuration file stores this information? I get this from [B]uname -a[/B] on the system in question: [code]SunOS solaris 5.10 Generic i86pc i386 i86pc[/code]
pghoratiu at 2007-7-6 23:44:40
# 3

There's

[code]$ cat /etc/release

Solaris 10 3/05 s10_74 X86

Copyright 2004 Sun Microsystems, Inc. All Rights Reserved.

Use is subject to license terms.

Assembled 14 December 2004[/code]

Then there's

[code]$ /usr/bin/showrev -p[/code]

that lists patch revision information.

MaximKartashev at 2007-7-6 23:44:40
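If the check has to happen from inside a program rather than at the shell, the first line of /etc/release can be read directly. This is only a sketch: first_line is a hypothetical helper, and it assumes the release string sits on the first line of the file, as in the output above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the first line of `path` into `buf`, with leading blanks and the
 * trailing newline removed. Returns 0 on success, -1 if the file cannot
 * be opened or is empty.
 */
static int first_line(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");
    char *start;

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';          /* drop the newline */
    start = buf + strspn(buf, " \t");        /* skip leading blanks */
    memmove(buf, start, strlen(start) + 1);  /* shift text to the front */
    return 0;
}
```

A caller would pass "/etc/release" as the path and compare the resulting string against the release it needs.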
# 4

I have this:

[code]$ cat /etc/release

Solaris 10 3/05 s10_74L2a X86

Copyright 2005 Sun Microsystems, Inc. All Rights Reserved.

Use is subject to license terms.

Assembled 22 January 2005[/code]

So I guess that I need at least the Solaris 2006/06/01 release to get the fix.

pghoratiu at 2007-7-6 23:44:40
# 5
Yes pghoratiu, you got that right. Solaris 10 Update 1 is called "Update 01/06" by Sun marketing. Here is a table for future reference: http://www.sunstudiofaq.com/mw/index.php?title=Solaris_Updates
ChrisQuenelle at 2007-7-6 23:44:40