Charset conversion from UTF-8 to windows-1252
Getting this conversion to run has taken me 5 hours so far with no usable result!
I have Strings in UTF-8 format that need to be posted over a socket encoded in the windows-1252 charset.
Requirements:
- I do not want to write the Strings to a file before I read them back into a String
- I want to create a simple method that inputs a UTF-8 string and returns a windows-1252 encoded string.
The only reason for doing this is that the Norwegian characters æ, ø and å do not display correctly on the target system in UTF-8.
# 1
> Getting this conversion to run has taken me 5 hours
> so far with no usable result!
>
> I have Strings in UTF-8 format that needs to be
> posted over a socket encoded in charset
> windows-1252.
First, if the information is in Java Strings, then it is Unicode text (Java stores Strings as UTF-16 characters), not UTF-8. If you read UTF-8 data into a String without the appropriate conversion, that is likely where the problem is.
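To make the point above concrete, here is a minimal sketch showing that a charset only comes into play when a String is turned into bytes (or back). The class and method names are made up for illustration, and it assumes the JDK provides the "windows-1252" charset, which standard JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StringVersusBytes {
    // Assumption: "windows-1252" is available in the running JDK.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes UTF-8 bytes into a String, then encodes that String as windows-1252 bytes. */
    static byte[] utf8ToCp1252(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(CP1252);
    }

    /** Formats bytes as space-separated hex pairs so they can be inspected. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00F8"; // the letter ø, one char in a Java String
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));               // C3 B8
        System.out.println(hex(utf8ToCp1252(utf8))); // F8
    }
}
```

The String itself never "is" UTF-8 or windows-1252; only the byte arrays on either side of it are.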
However, assuming that the data is correctly stored as Unicode characters in a String, it can be converted to your desired format using one of the OutputStreamWriter constructors that take either a Charset or a CharsetEncoder parameter.
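As a rough sketch of the CharsetEncoder variant: configuring the encoder to report errors makes the write fail on characters that windows-1252 cannot represent, instead of silently substituting a replacement. The class name, method name, and the decision to fail rather than substitute are my assumptions, not something from the original post.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

final class StrictCp1252Writer {
    /**
     * Writes text to out as windows-1252 bytes.
     * Throws an IOException (a CharacterCodingException) if a character
     * has no windows-1252 representation.
     */
    static void writeStrict(OutputStream out, String text) throws IOException {
        CharsetEncoder encoder = Charset.forName("windows-1252")
                .newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        Writer w = new OutputStreamWriter(out, encoder);
        w.write(text);
        // Flush rather than close: the caller owns the stream (e.g. a socket).
        w.flush();
    }
}
```

If you would rather have unmappable characters replaced, the simpler OutputStreamWriter constructor that takes a Charset does that by default.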
>
> Requirements:
> - I do not want to write the Strings to a file before
> i read them back into a string
Use a combination of PipedOutputStream/PipedInputStream (for bytes) or PipedWriter/PipedReader (for Java characters) to transfer data without files. See this tutorial, if needed (there are others, search for them):
http://java.sun.com/developer/JDCTechTips/2001/tt0208.html
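A minimal sketch of the piped approach, in case it helps: one thread encodes the String into the pipe, and the calling thread reads the windows-1252 bytes back out. The names are hypothetical, and the two-thread layout is my assumption about how you would wire it up (piped streams are meant to be used from separate threads).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

final class PipedCp1252 {
    /** Encodes text as windows-1252 by passing it through an in-memory pipe. */
    static byte[] encodeThroughPipe(String text) throws IOException, InterruptedException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut);
        Charset cp1252 = Charset.forName("windows-1252");

        // Producer: encodes the characters and writes the bytes into the pipe.
        // Closing the writer closes pipeOut, which signals end-of-stream to the reader.
        Thread producer = new Thread(() -> {
            try (Writer w = new OutputStreamWriter(pipeOut, cp1252)) {
                w.write(text);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer: drains the pipe until the producer closes its end.
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = pipeIn.read(buf)) != -1) {
            collected.write(buf, 0, n);
        }
        producer.join();
        pipeIn.close();
        return collected.toByteArray();
    }
}
```

For a one-off conversion of a String you already hold in memory, String.getBytes with a Charset does the same job without threads; the pipe is mainly useful when the producer and consumer are separate streaming components.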
# 2
I haven't checked out the PipedInputStream solution yet. I did a quick hack like this:
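private String convertCharSet(String t)
{
    // Replace each Norwegian letter with the same letter written as a Unicode escape.
    String t1 = t.replace("ø", "\u00F8");
    String t2 = t1.replace("Ø", "\u00D8");
    String t3 = t2.replace("å", "\u00E5");
    String t4 = t3.replace("Å", "\u00C5");
    String t5 = t4.replace("æ", "\u00E6");
    String t6 = t5.replace("Æ", "\u00C6");
    //log.info("COMMAND: "+t6);
    // Return the result of the last replacement, not the first.
    return t6;
}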
Unfortunately this does not work on the production system, where the default charset is UTF-8. When I tested it on my Windows XP test system it worked fine. I guess this has something to do with the fact that the program interprets the UTF-8 characters differently on the Linux machine and therefore the String.replace function does not work.
Will post back here when I have tried the PipedOutputStream/PipedInputStream translation.
# 4
Let's assume you have the correct data in your Java string variables. Then to write that data to the socket in windows-1252 encoding, you do this:
Writer w = new OutputStreamWriter(yourSocketsOutputStream, "windows-1252");
w.write(yourString);
w.flush(); // the writer buffers, so flush to push the bytes out to the socket
And if you don't have the correct data in your Java string variables, it's probably because you used the system's default encoding to read them from wherever they came from instead of the actual encoding (which was UTF-8). Using an InputStreamReader with a specific encoding might fix that.
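Something along these lines is what I mean by reading with a specific encoding. It is only a sketch: the class and method names are invented, and a ByteArrayInputStream stands in for wherever your data actually comes from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadUtf8 {
    /** Reads the whole stream into a String, decoding the bytes as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        Reader r = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xB8}; // the letter ø encoded as UTF-8

        // Decoded with the actual encoding: one character, U+00F8.
        String right = readAll(new ByteArrayInputStream(utf8));
        // Decoded with the wrong encoding: two characters, U+00C3 and U+00B8.
        String wrong = new String(utf8, Charset.forName("windows-1252"));

        System.out.println(right.length()); // 1
        System.out.println(wrong.length()); // 2
    }
}
```

The second case is what happens when the platform default encoding does not match the data, and it would explain why a String.replace looking for ø finds nothing.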
In either case you should check your data in the Java program to see if it's what you expect.
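One way to check, as a sketch: print the numeric value of each char instead of the characters themselves, so the console's own encoding cannot mislead you. The helper name is made up, and it lists UTF-16 char values, which is enough for the Norwegian letters.

```java
final class DumpChars {
    /** Returns the UTF-16 char values of text, e.g. "U+0062 U+006C U+00E5". */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("bl\u00E5")); // U+0062 U+006C U+00E5
    }
}
```

If å shows up as U+00E5 the String is fine and only the output side needs the windows-1252 encoding. If you see U+00C3 followed by U+00A5 instead, the data was decoded with the wrong charset on the way in.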