PHP: PCRE is not compiled with PCRE_UTF8 support
Hello,
I installed Sun WS 7.0 with the PHP plugin and DokuWiki on Solaris 9 SPARC.
When I access DokuWiki I get:
Warning: preg_replace() [function.preg-replace]: Compilation failed: this version of PCRE is not compiled with PCRE_UTF8 support at offset 0 in /3beg/doku/htdocs/wiki/inc/utf8.php on line 421
Compiling GNU pcre on the server doesn't change the situation:
# ./configure --enable-utf8 --enable-unicode-properties --disable-shared
Where should I look to resolve this error?
Can someone help me: is the root cause PHP, PCRE, or DokuWiki?
-- Nick
[637 byte] By [der_nikia] at [2007-11-27 5:46:56]

# 2
phpinfo gives
System SunOS voyager 5.9 Generic_118558-17 sun4u
Build Date Jan 29 2007 23:12:49
Configure Command './configure' '--prefix=/java/re/phppack/5.2.0/nightly/ws/b01-2007-01-29/solaris-sparc/dist/5.2.0/SunOS5.8_OPT.OBJ/php'
'--bindir=/java/re/phppack/5.2.0/nightly/ws/b01-2007-01-29/solaris-sparc/dist/5.2.0/SunOS5.8_OPT.OBJ/php/bin/'
'--libdir=/java/re/phppack/5.2.0/nightly/ws/b01-2007-01-29/solaris-sparc/dist/5.2.0/SunOS5.8_OPT.OBJ/php/lib/'
'--libexecdir=/java/re/phppack/5.2.0/nightly/ws/b01-2007-01-29/solaris-sparc/dist/5.2.0/SunOS5.8_OPT.OBJ/php/libexec/'
'--disable-static' '--enable-shared' '--disable-cli' '--disable-cgi' '--with-pic'
'--with-nsapi=/tmp/webserver7-spi.26277' '--enable-force-cgi-redirect' '--disable-rpath' '--enable-safe-mode' '--enable-ftp' '--enable-sockets' '--enable-memory-limit' '--enable-inline-optimization' '--enable-zlib'
'--enable-soap' '--with-dba' '--enable-sysvmsg' '--enable-sysvsem'
'--enable-sysvshm' '--enable-sqlite-utf8' '--enable-zend-multibyte'
'--enable-bcmath' '--enable-exif' '--enable-magic-quotes' '--enable-wddx'
'--enable-mbstring' '--enable-mbstr-enc-trans' '--enable-mbregex'
'--enable-gd-native-ttf'
'--with-pcre-regex=/h/iws-files/s/b/c/pcre/6.7/SunOS5.8_OPT.OBJ'
'--with-iconv-dir=/h/iws-files/s/b/c/libiconv/1.11/SunOS5.8_OPT.OBJ'
'--with-libxml-dir=/h/iws-files/s/b/c/libxml2/2.6.27/SunOS5.8_OPT.OBJ'
'--with-zlib=/h/iws-files/s/b/c/zlib/1.2.3/SunOS5.8_OPT.OBJ'
'--with-mysql=/h/iws-files/s/b/c/mysql/5.0.27/SunOS5.8_OPT.OBJ'
'--with-mysqli=/h/iws-files/s/b/c/mysql/5.0.27/SunOS5.8_OPT.OBJ/bin/mysql_config'
'--with-pgsql=/h/iws-files/s/b/c/postgresql/8.1.5/SunOS5.8_OPT.OBJ'
...
So PHP has been compiled with
'--with-pcre-regex=/h/iws-files/s/b/c/pcre/6.7/SunOS5.8_OPT.OBJ'
Hmm, what now?
What do you mean by "...then it would use pcre module from the specified location..."?
# 5
Ah, you clearly mentioned that you are running on Solaris SPARC. My mistake.
By the way, I wrote a simple program, ran it with the PHP binary, and got the expected response.
Regarding the validity of a UTF-8 string when using the /u pattern modifier, there are some things to be aware of:
1. If the pattern itself contains an invalid UTF-8 character, you get an error (as mentioned in the docs above: "UTF-8 validity of the pattern is checked since PHP 4.3.5").
2. When the subject string contains invalid UTF-8 sequences or codepoints, the preg_* functions basically die quietly: nothing is matched, and there is no indication that the string is invalid UTF-8.
3. PCRE regards five and six octet UTF-8 character sequences as valid (both in patterns and in the subject string), but these are not supported in Unicode (see section 5.9 "Character Encoding" of the "Secure Programming for Linux and Unix HOWTO", which can be found at http://www.tldp.org/ and other places).
4. For an example algorithm in PHP which tests the validity of a UTF-8 string (and discards five and six octet sequences), head to: http://hsivonen.iki.fi/php-utf8/
The following script should give you an idea of what works and what doesn't:
<?php
// Sample byte strings: valid and invalid UTF-8 sequences of each length.
$examples = array(
    'Valid ASCII' => "a",
    'Valid 2 Octet Sequence' => "\xc3\xb1",
    'Invalid 2 Octet Sequence' => "\xc3\x28",
    'Invalid Sequence Identifier' => "\xa0\xa1",
    'Valid 3 Octet Sequence' => "\xe2\x82\xa1",
    'Invalid 3 Octet Sequence (in 2nd Octet)' => "\xe2\x28\xa1",
    'Invalid 3 Octet Sequence (in 3rd Octet)' => "\xe2\x82\x28",
    'Valid 4 Octet Sequence' => "\xf0\x90\x8c\xbc",
    'Invalid 4 Octet Sequence (in 2nd Octet)' => "\xf0\x28\x8c\xbc",
    'Invalid 4 Octet Sequence (in 3rd Octet)' => "\xf0\x90\x28\xbc",
    // Only the 4th octet is invalid here, so that the data matches the label.
    'Invalid 4 Octet Sequence (in 4th Octet)' => "\xf0\x90\x8c\x28",
    'Valid 5 Octet Sequence (but not Unicode!)' => "\xf8\xa1\xa1\xa1\xa1",
    'Valid 6 Octet Sequence (but not Unicode!)' => "\xfc\xa1\xa1\xa1\xa1\xa1",
);

// Each example used as the pattern: invalid UTF-8 here raises a warning.
echo "++Invalid UTF-8 in pattern\n";
foreach ( $examples as $name => $str ) {
    echo "$name\n";
    preg_match("/".$str."/u", 'Testing');
}

// Each example used as the subject of a fixed 5 octet pattern.
echo "++ preg_match() examples\n";
foreach ( $examples as $name => $str ) {
    preg_match("/\xf8\xa1\xa1\xa1\xa1/u", $str, $ar);
    echo "$name: ";
    if ( count($ar) == 0 ) {
        echo "Matched nothing!\n";
    } else {
        echo "Matched {$ar[0]}\n";
    }
}

// Count how many characters /./u finds in each example.
echo "++ preg_match_all() examples\n";
foreach ( $examples as $name => $str ) {
    preg_match_all('/./u', $str, $ar);
    echo "$name: ";
    // Guard against $ar[0] being unset when the match fails.
    $num_utf8_chars = isset($ar[0]) ? count($ar[0]) : 0;
    if ( $num_utf8_chars == 0 ) {
        echo "Matched nothing!\n";
    } else {
        echo "Matched $num_utf8_chars character(s)\n";
    }
}
?>
# 6
The PCRE library bundled with the PHP add-on is version 6.7. In your case, however, version 5.0 of the library is being used, and I think that one is compiled without UTF-8 support. The Web Server's lib directory also contains a PCRE library, and it looks like that is the one being loaded.
That means you must have configured the Web Server to use PHP as an NSAPI plugin. If that is the case, try modifying the Web Server instance's bin/startserv script as follows (assuming the Web Server runs in 32-bit mode):
.....
# Add instance-specific information to LD_LIBRARY_PATH for Solaris and Linux
LD_LIBRARY_PATH="<web-server-install-dir>/plugins/php:${SERVER_LIB_PATH}:${SERVER_JVM_LIBPATH}:${LD_LIBRARY_PATH}"; export LD_LIBRARY_PATH
.....
Restart the instance after the modification and check if this resolves the issue.
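If it still fails, it may help to see which directory in the search path supplies libpcre first. The following is only a rough sketch: first_pcre_dir is a made-up helper name, it assumes the library file is named libpcre.so* and that the path is a plain colon-separated list, and it only approximates what the runtime linker does (it ignores RPATH/RUNPATH and libraries that are already loaded).

```shell
#!/bin/sh
# Hypothetical helper (not part of Web Server or PHP): print the first
# directory in a colon-separated search path that contains a libpcre
# shared object. Returns non-zero if no directory has one.
first_pcre_dir() {
    search_path=$1
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # The list is already split; restore IFS for the rest of the loop body.
        IFS=$old_ifs
        # Skip empty entries (the linker treats them as the current directory).
        [ -n "$dir" ] || continue
        for lib in "$dir"/libpcre.so*; do
            if [ -f "$lib" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}

# Example call, using the same value that startserv exports:
#   first_pcre_dir "$LD_LIBRARY_PATH"
```

After the startserv change, the directory printed should be the plugins/php one rather than the Web Server's lib directory. On Solaris, running pldd against the server's process ID should also list the libraries the running process has actually loaded.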