Problem converting Unicode from "#x...." to "\u..."

Hi,

I'm working on some Unicode decoding. The original data is plain text in which Unicode characters are written with a "#x" prefix; for example, "#x3008" stands for "〈". I used the code below to replace "#x" with "\u". However, after the replacement, "\u3008" is printed literally as "\u3008" instead of being decoded to "〈" in Java.

Could anyone help me with this?

Thanks a lot!

The test.java I used is below:

import java.io.*;
import java.text.ParseException;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test {
    public static void main(String argv[]) {
        String rawText = "CEG1 \u30086J\u3009 cells #x3008";
        // Tabby #x30086J#x3009 (lane 4); 5#x20137
        Pattern p = Pattern.compile("#x([0-9A-Fa-f]{4})");
        Matcher m = p.matcher(rawText);
        String newtext = m.replaceAll("\\\\u$1");
        System.out.println(rawText);
        System.out.println(newtext);
    }
}

[1020 byte] By [zkou79a] at [2007-10-2 8:00:08]
# 1

Converting \uXXXX escape codes into real characters is done only by the compiler when it reads the source code; it is not something the platform does to strings at run time.
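
To make the difference concrete, here is a small sketch (the class name EscapeDemo and its field names are made up for illustration). It compares a literal that the compiler translates with a string that is only assembled at run time and therefore still holds a backslash, a "u" and four digits:

```java
public class EscapeDemo {
    // The compiler translates this escape, so the string holds a single character.
    static final String COMPILED = "\u3008";

    // Assembled from two pieces: a backslash plus the text "u3008", six characters in total.
    static final String RUNTIME = "\\" + "u3008";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());        // prints 1
        System.out.println(RUNTIME.length());         // prints 6
        System.out.println(COMPILED.equals(RUNTIME)); // prints false
    }
}
```

The string produced by replaceAll in the question is of the second kind, which is why it prints as six literal characters.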

You need to extract the hex digits, parse them into a number, and cast the number to a character:

String s = "#x3008";

// Skip the "#x" prefix and parse the rest as a hexadecimal number.
int n = Integer.parseInt(s.substring(2), 16);

System.out.print((char) n); // prints out "〈"
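
Applied to a whole string, the same idea might look like the sketch below. It reuses the pattern from the question and walks the matches with Matcher.appendReplacement; the class name HexRefDecoder and the method name decode are hypothetical, and it assumes every code is exactly four hex digits, as in the sample data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexRefDecoder {
    // Same pattern as in the question: "#x" followed by exactly four hex digits.
    private static final Pattern HEX_REF = Pattern.compile("#x([0-9A-Fa-f]{4})");

    static String decode(String text) {
        Matcher m = HEX_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits and cast the value to a char.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String rawText = "Tabby #x30086J#x3009 (lane 4); 5#x20137";
        System.out.println(decode(rawText));
    }
}
```

Because the pattern takes only four digits, "#x20137" is read as U+2013 followed by a literal "7". Whether the decoded characters show up correctly on the console also depends on the console's encoding.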

jsalonena at 2007-7-16 21:51:49