studio12/dbx: access checking broken for setlocale() call in "de" locale
It seems that access checking crashes the dbx target
when it is running in the "de" locale and a call to setlocale()
is used.
% dbx -V
Sun Dbx Debugger 7.6 SunOS_i386 2007/05/03
(running on opensolaris x86 / build 68)
% locale
LANG=de_DE.ISO8859-1
LC_CTYPE=de_DE.ISO8859-1
LC_NUMERIC=de_DE.ISO8859-1
LC_TIME=de_DE.ISO8859-1
LC_COLLATE=de_DE.ISO8859-1
LC_MONETARY=de_DE.ISO8859-1
LC_MESSAGES=de_DE.ISO8859-1
LC_ALL=de_DE.ISO8859-1
% cat hello.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
void
func(void)
{
int i;
printf("hello, world!\n%d\n", i);
}
int
main(int argc, char **argv)
{
char x[32768];
long *p = malloc(32768);
int i;
long s;
setlocale(LC_ALL, "");
for (i = 0; i < 32768 / sizeof(long); i++)
s += p[i];
printf("%ld\n", s);
func();
exit(1);
}
% cc -g -o hello hello.c
% bcheck -access hello
Reading hello
Reading ld.so.1
Reading rtcapihook.so
Reading libc.so.1
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libm.so.2
Reading rtcboot.so
Reading librtc.so
access checking - ON
Running: hello
(process id 1662)
RTC: Enabling Error Checking...
RTC: Running program...
Reading disasm.so
Reading de_DE.ISO8859-1.so.3
terminating signal 11 SIGSEGV
% dbx -C hello
Reading hello
Reading ld.so.1
Reading rtcapihook.so
Reading libc.so.1
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libm.so.2
Reading rtcboot.so
Reading librtc.so
(dbx) check -all
access checking - ON
memuse checking - ON
(dbx) run
Running: hello
(process id 1668)
RTC: Enabling Error Checking...
RTC: Running program...
Reading disasm.so
Reading de_DE.ISO8859-1.so.3
terminating signal 11 SIGSEGV
(dbx) where
dbx: program is not active
There's a new core dump in the current
directory:
% pstack /tmp/core
core '/tmp/core' of 1952:/tmp/hello
ee188106 _clear_internal_mbstate (803ed8c, 803ec28, ee1c0837, 803ed8c, ee1f5000, f) + 21
ee1c067f __charmap_init (803ed8c, ee1f5000, f, 10, 803ef38, ee1bca1f) + 27
ee1c0837 __locale_init (803ed8c) + 27
ee1bca1f setlocale (6, 8050b74, 0, 0, 8060d48, 0) + 9ff
08050a9a main(1, 8046f8c, 8046f94, ee1fa540, 8046f80, 805095f) + 2a
080509bd _start(1, 8047134, 0, 804713f, 804715a, 80471b8) + 7d
% pflags /tmp/core
core '/tmp/core' of 1952:/tmp/hello
data model = _ILP32 flags = RLC|BPTADJ|MSACCT|MSFORK
flttrace = 0x00000004
sigtrace = 0xfffffeff 0xffffffff
HUP|INT|QUIT|ILL|TRAP|ABRT|EMT|FPE|BUS|SEGV|SYS|PIPE|ALRM|TERM|USR1|USR2|CLD|PWR|WINCH|URG|POLL|STOP|TSTP|CONT|TTIN|TTOU|VTALRM|PROF|XCPU|XFSZ|WAITING|LWP|FREEZE|THAW|CANCEL|LOST|XRES|JVM1|JVM2|RTMIN|RTMIN+1|RTMIN+2|RTMIN+3|RTMAX-3|RTMAX-2|RTMAX-1|RTMAX
entryset = 0x00000403 0x04000000 0x00000000 0x00400000
0x80004000 0x00000000 0x00000000 0x00000000
exitset = 0x00000002 0x00000000 0x00000000 0x00400000
0x40004000 0x00000000 0x00000000 0x00000000
/1:flags = 0
sigmask = 0xfffffeff,0x0000ffff cursig = SIGSEGV
Workaround: run access checking in the "C" locale.
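For anyone who wants to script this workaround, here is a minimal sketch of a launcher that forces the "C" locale before starting a command. The function name run_in_c_locale is made up for illustration. It relies only on the standard rule that LC_ALL overrides LANG and the other LC_* variables; whether that is enough to avoid the crash in every case is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with LC_ALL=C and return its exit
 * status, or -1 on failure or if the child was killed by a signal.
 */
int
run_in_c_locale(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return (-1);
	if (pid == 0) {
		/* Child: LC_ALL overrides LANG and all other LC_* settings. */
		if (setenv("LC_ALL", "C", 1) != 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

It could be called with an argument vector such as { "bcheck", "-access", "hello", NULL }. From a shell, setting LC_ALL=C for that one command does the same thing.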
# 1
I can reproduce this using SS12 FCS, but the latest build doesn't have this problem, so the fix should be in the first SS12 patch for dbx. It should be available in July, if I'm not mistaken.
# 2
> I can reproduce this using SS12 FCS, but latest build
> doesn't have this problem so the fix should be in
> first SS12 patch for dbx. It should be available in
> July, if I'm not mistaken.
Fine.
Here's another access check crash - maybe with the same root cause:
% cat langinfo.c
#include <langinfo.h>
int
main(int argc, char **argv)
{
char *lang = nl_langinfo(CODESET);
}
% cc -g -o langinfo langinfo.c
% bcheck -access langinfo
Reading langinfo
Reading ld.so.1
Reading rtcapihook.so
Reading libc.so.1
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libm.so.2
Reading rtcboot.so
Reading librtc.so
access checking - ON
Running: langinfo
(process id 2575)
RTC: Enabling Error Checking...
RTC: Running program...
Reading disasm.so
terminating signal 11 SIGSEGV
% pstack /tmp/core
core '/tmp/core' of 2575:/tmp/langinfo
ee1abfef pthread_key_create_once_np (ee1f78c8, ee16bae0) + 2f
ee16bb7e tsdalloc (3, 80, 0, 8046f50, ee1f5000, feffcd68) + 37
ee1c0e0a __nl_langinfo_std (ee1face0, 31, feffa7d0, 8046f68, 80509cf, 31) + 22
ee1cb513 nl_langinfo (31, 0, 8046f50, 80470d8, 8046f88, 805092d) + 27
080509cf main(1, 8046f94, 8046f9c, 80508cf, 8050a08, fefd3a40) + f
0805092d _start(1, 8047140, 0, 804714e, 804715b, 80471b9) + 7d
% pflags /tmp/core
core '/tmp/core' of 2575:/tmp/langinfo
data model = _ILP32 flags = RLC|BPTADJ|MSACCT|MSFORK
flttrace = 0x00000004
sigtrace = 0xfffffeff 0xffffffff
HUP|INT|QUIT|ILL|TRAP|ABRT|EMT|FPE|BUS|SEGV|SYS|PIPE|ALRM|TERM|USR1|USR2|CLD|PWR|WINCH|URG|POLL|STOP|TSTP|CONT|TTIN|TTOU|VTALRM|PROF|XCPU|XFSZ|WAITING|LWP|FREEZE|THAW|CANCEL|LOST|XRES|JVM1|JVM2|RTMIN|RTMIN+1|RTMIN+2|RTMIN+3|RTMAX-3|RTMAX-2|RTMAX-1|RTMAX
entryset = 0x00000403 0x04000000 0x00000000 0x00400000
0x80004000 0x00000000 0x00000000 0x00000000
exitset = 0x00000002 0x00000000 0x00000000 0x00400000
0x40004000 0x00000000 0x00000000 0x00000000
/1:flags = 0
sigmask = 0xfffffeff,0x0000ffff cursig = SIGSEGV
Unfortunately, this time I've not yet found a workaround.
# 3
> Here's another access check crash - maybe with the
> same root cause:
And another one, with getexecname(3C):
% cat execname.c
#include <stdlib.h>
int
main(int argc, char **argv)
{
const char *exe_nm = getexecname();
}
% cc -g -o execname execname.c
% bcheck -access ./execname
Reading execname
Reading ld.so.1
Reading rtcapihook.so
Reading libc.so.1
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libm.so.2
Reading rtcboot.so
Reading librtc.so
access checking - ON
Running: execname
(process id 2665)
RTC: Enabling Error Checking...
RTC: Running program...
Reading disasm.so
terminating signal 11 SIGSEGV
% pstack /tmp/core
core '/tmp/core' of 2665:/tmp/./execname
ee14a674 _getaux (7de, 8046f48, ee14aecd, 7de, feffa7d0, 8046f58) + 34
ee14a78e getauxptr (7de, feffa7d0, 8046f58, feffcd68, 8046f58, 80509cb) + e
ee14aecd getexecname (8046f4c, 80470d4, 8046f84, 805092d, 1, 8046f90) + 1d
080509cb main(1, 8046f90, 8046f98, 8060a44, ee1fa540, 8046f84) + b
0805092d _start(1, 804713c, 0, 804714c, 8047159, 80471b7) + 7d
% pflags /tmp/core
core '/tmp/core' of 2665:/tmp/./execname
data model = _ILP32 flags = RLC|BPTADJ|MSACCT|MSFORK
flttrace = 0x00000004
sigtrace = 0xfffffeff 0xffffffff
HUP|INT|QUIT|ILL|TRAP|ABRT|EMT|FPE|BUS|SEGV|SYS|PIPE|ALRM|TERM|USR1|USR2|CLD|PWR|WINCH|URG|POLL|STOP|TSTP|CONT|TTIN|TTOU|VTALRM|PROF|XCPU|XFSZ|WAITING|LWP|FREEZE|THAW|CANCEL|LOST|XRES|JVM1|JVM2|RTMIN|RTMIN+1|RTMIN+2|RTMIN+3|RTMAX-3|RTMAX-2|RTMAX-1|RTMAX
entryset = 0x00000403 0x04000000 0x00000000 0x00400000
0x80004000 0x00000000 0x00000000 0x00000000
exitset = 0x00000002 0x00000000 0x00000000 0x00400000
0x40004000 0x00000000 0x00000000 0x00000000
/1:flags = 0
sigmask = 0xfffffeff,0x0000ffff cursig = SIGSEGV
# 4
nl_langinfo() works fine with the latest dbx (please wait for the first patch to be available to get the fix), but getexecname() is still broken. I'll file a bug and post the bug ID here.
# 5
The bug ID is 6573845; it should be visible on bugs.sun.com within 24 hours.
# 6
> nl_langinfo() works fine with latest dbx (please wait
> for the first patch to be available to get the fix),
> but getexecname() is still broken. I'll file a bug
> and post bug ID here.
Oh, so all these rtc crashes do not have the same
root cause?
That is, if I find more rtc crashes like the ones in setlocale(),
nl_langinfo(), and getexecname(), would it still be interesting
to know which functions are affected?
# 7
The cause might be the same - nested signals. Dbx replaces memory access instructions with an instruction that generates SIGSEGV; when that happens, the signal handler installed by dbx is invoked and performs all the necessary checks. If the application tries to install its own handler for SIGSEGV, it might fail, thus the "Terminating signal 11" message.
The only possible fix for this is to skip instrumentation of functions that can cause nested signals. There's a built-in dbx command for that, rtc skippatch (type 'help rtc skippatch' in dbx for more info), and there's an internal list of such functions, which is frequently updated. However, it is not always easy to determine which functions should not be instrumented.
So if you find more errors like this, please report them here.
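To illustrate why a nested SIGSEGV is fatal, here is a minimal sketch that has nothing to do with dbx internals. A child process installs a SIGSEGV handler that itself faults. While the handler runs, SIGSEGV is blocked, and a second synchronous fault in that state is formally undefined by POSIX; common systems simply terminate the process with SIGSEGV. The names bad_ptr, faulting_handler and nested_segv_outcome are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Volatile pointer, so the compiler cannot optimize the fault away. */
static int *volatile bad_ptr = NULL;

static void
faulting_handler(int sig)
{
	(void)sig;
	/* SIGSEGV is blocked while this handler runs; fault again anyway. */
	*bad_ptr = 1;
	_exit(0);	/* not reached if the nested fault kills the process */
}

/*
 * Fork a child whose SIGSEGV handler faults. Returns the number of the
 * signal that terminated the child, 0 if it exited normally, -1 on error.
 */
int
nested_segv_outcome(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return (-1);
	if (pid == 0) {
		struct sigaction sa;

		memset(&sa, 0, sizeof (sa));
		sa.sa_handler = faulting_handler;
		sigemptyset(&sa.sa_mask);
		if (sigaction(SIGSEGV, &sa, NULL) != 0)
			_exit(2);
		*bad_ptr = 1;	/* first fault: enters faulting_handler */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return (-1);
	return (WIFSIGNALED(status) ? WTERMSIG(status) : 0);
}
```

On systems that behave this way, nested_segv_outcome() reports SIGSEGV even though a handler was installed, which matches the "terminating signal 11 SIGSEGV" symptom.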
# 8
> The cause might be the same - nested signals. Dbx
> replaces memory access instructions with an
> instruction that generates SIGSEGV; when it happens,
> signal handler installed by dbx is invoked and it
> performs all necessary checks. If the application
> tries to install its own handler for SIGSEGV, it
> might fail, thus the "Terminating signal 11"
> message.
I'm not sure I understand this. My sample programs do
not install SIGSEGV handlers.
But I can imagine that the issue is that dbx rtc is trying
to use setlocale(), nl_langinfo(), getexecname() while
running in the SIGSEGV handler, checking the memory
access (maybe because rtc has found some problem
and is trying to report it), and now trips over the patched
memory access instructions in these functions.
And indeed, using the dbx commands
(dbx) check -all
(dbx) rtc skippatch libc.so.1 -f _setlocale _nl_langinfo _getexecname thr_keycreate_once _thr_keycreate_once pthread_key_create_once_np _pthread_key_create_once_np
seems to work around these issues.
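To check from inside a test program whether anything has installed a SIGSEGV handler, the current disposition can be queried with sigaction(). This is only a sketch: the function name sigsegv_handler_installed is made up, and whether the handler that rtc uses would be visible this way is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if a handler function is installed for
 * SIGSEGV, 0 if the disposition is SIG_DFL or SIG_IGN, -1 on error.
 */
int
sigsegv_handler_installed(void)
{
	struct sigaction cur;

	/* A NULL "act" argument queries the disposition without changing it. */
	if (sigaction(SIGSEGV, NULL, &cur) != 0)
		return (-1);
	/* SA_SIGINFO means a three-argument handler is in use. */
	if (cur.sa_flags & SA_SIGINFO)
		return (1);
	return (cur.sa_handler != SIG_DFL && cur.sa_handler != SIG_IGN);
}
```

Calling it at the start of main() in one of the sample programs and printing the result would show whether a handler is already in place before the program's own code runs.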
> The only possible fix for this is to skip
> instrumentation of functions that can cause nested
> signals; there's built-in command for that in dbx,
> rtc skippatch (type 'help rtc skippatch' in dbx to
> see more info) and there's internal list of such
> functions, which is frequently updated. However, it
> is not always easy to determine which function should
> not be instrumented.
>
> So if you find more errors like this, please report
> them here.
readdir64_r() is the next one:
% cat readdir.c
#define _POSIX_PTHREAD_SEMANTICS 1
#include <sys/types.h>
#include <dirent.h>
int
main(int argc, char **argv)
{
struct dirent dent, *dent_result;
DIR *dir = opendir("/");
readdir_r(dir, &dent, &dent_result);
}
% cc -g -o readdir readdir.c `getconf LFS_CFLAGS`
% bcheck -access readdir
Reading readdir
Reading ld.so.1
Reading rtcapihook.so
Reading libc.so.1
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading libm.so.2
Reading rtcboot.so
Reading librtc.so
access checking - ON
Running: readdir
(process id 4269)
RTC: Enabling Error Checking...
RTC: Running program...
Reading disasm.so
terminating signal 11 SIGSEGV
% pstack core
core 'core' of 4269:/tmp/readdir
ee166897 readdir64_r (fefa0300, 8046f40, 8046f3c, fefa0300, ee1fcdf0, 133f) + 27
08050a29 main(1, 8046f88, 8046f90, 8046f7c, 805090f, 8050a60) + 29
0805096d _start(1, 8047130, 0, 804713d, 8047158, 80471b6) + 7d
# 9
Okay, I've updated the bug report.
# 10
> So if you find more errors like this, please report
> them here.
The smedia_get_handle() function in libsmedia.so
triggers *lots* of rtc SIGSEGV problems:
% cat smedia.c
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/smedia.h>
int
main(int argc, char **argv)
{
char *devname = "/dev/removable-media/rdsk/c0t0d0p0";
int fd;
if (argv[1])
devname = argv[1];
fd = open(devname, O_RDONLY | O_NONBLOCK);
if (fd < 0) {
perror(devname);
exit(1);
}
smedia_get_handle(fd);
}
% cc -g -o smedia smedia.c -lsmedia
(Run it with the raw device for your system's cd/dvd device, e.g.
something like ./smedia /dev/rdsk/c6t0d0p0)
So far, I had to add at least these libc.so functions to
rtc skippatch, but it's still crashing:
rtc skippatch libc.so.1 -f _clear_internal_mbstate \
_thr_keycreate_once _pthread_key_create_once_np \
_getaux _fstat readdir64_r \
_nsc_proc_is_cache _nsc_getdoorbsize nss_dbop_search \
_nsc_initdoor_fp \
_s_fcntl __door_info htonl ntohl \
membar_consumer
Most of the rtc crashes are inside a clnt_create() call, which is
using netdir_getbyname(), and the name service switch routines.
Note: My test box is a NIS client, using a nsswitch.nis configuration
in /etc/nsswitch.conf.
Typical crash:
e7d8cbc7 s_fcntl (e7df6650, e7df669c, e7df6650, e7df5000, 8, fdac00b8)
e7d52f59 _nsc_try1door (e7df6650, 80406b0, 80406b4, 80406b8, 804066c, 0) + 21
e7d532c3 _nsc_trydoorcall_ext (80406b0, 80406b4, 80406b8, db527718, 0, e7df5000) + 21b
e7d60eff _nsc_search (db59f258, db527718, 4, 8040758, 8066e84, 0) + bf
e7d5faf6 nss_search (db59f258, db527718, 4, 8040758) + 32
db52b36f _switch_getipnodebyname_r (8040b89, 8066e84, 8066e98, 2120, 1a, 3) + 7b
db52a44a _get_hostserv_inetnetdir_byname (8066dc8, 8040858, 8040838) + a9e
db525e0d netdir_getbyname (8066dc8, 8040898, 80408d8) + c9
db54baf8 _getclnthandle_timed (8040b89, 8066dc8, 8040918, db59f9e8) + 1ac
db54c5a4 __rpcb_findaddr_timed (1873b, 1, 8066dc8, 8040b89, 80409b4, db59f9e8) + 43c
db540b6e clnt_tp_create_timed (8040b89, 1873b, 1, 8066dc8, 0) + 42
db5404b5 clnt_create_timed (8040b89, 1873b, 1, 0, 0) + 179
db540336 clnt_create (8040b89, 1873b, 1, 0, 5, 8064790) + 26
dc9017aa is_server_running (80648a0, e230296d, 8064790, dc913000, f3142c3c, f311464c) + 3e
dc901db7 get_handle_from_fd (5, e2312a24, 8, 0, 80413a8, d0abf2df) + e3
dc9015af smedia_get_handle (5, 403, 8041340, f3114651, 64, f3142c3c) + 1b
