UTF-16 encoding

Hi,

I ran into a problem while writing a small program.

I use a library called jid3lib to read MP3 ID3 tags. The library itself works fine, but I have a problem with the following:

The artist in the MP3 tag is "Aborted".

byte[] bytes = tagv2.getLeadArtist().getBytes();//("UTF-16");

--> Assume the tag is UTF-16 encoded, which is valid and is actually the case for my file.

Then I print the following:

System.out.println(new String(bytes));

System.out.println(new String(bytes, "UTF-16"));

byte[] b = ("Aborted").getBytes("UTF-16");

System.out.println(new String(b));

System.out.println(new String(b, "UTF-16"));

I get:

嗀_b_o_r_t_e_d

Aborte?

for the mp3 tag and:

?_A_b_o_r_t_e_d

Aborted

for my own test.

PS: the "_" stands for a character that cannot be displayed (the one shown as a rectangle), so I replaced it with an underscore.

So you can see the problem: why do I get a ? instead of the 'd', and how can I avoid this?

Is there a common algorithm to decode all sorts of encodings like this (for example, by removing all the bad characters)?

thx

[1138 byte] By [hansds2003a] at [2007-11-27 9:27:46]
# 1

There's nothing wrong. It's how UTF-16 encoding works. The first 2 "characters" you see are the byte-order mark (BOM), and each character in this string takes 2 bytes. Since the chars you have are in the ASCII range, the first byte of each is 0.
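To make the byte layout visible, here is a minimal sketch that dumps the bytes of your own test string as hex. The class name Utf16Bytes and the hex helper are made up for illustration, and StandardCharsets needs Java 7 or later (on older versions pass the name "UTF-16" as in your code).

```java
import java.nio.charset.StandardCharsets;

class Utf16Bytes {
    // Hypothetical helper: formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first,
        // then 2 bytes per character for characters in this range.
        byte[] b = "Aborted".getBytes(StandardCharsets.UTF_16);

        System.out.println(b.length); // 2 BOM bytes + 7 chars * 2 bytes = 16
        System.out.println(hex(b));   // FE FF 00 41 00 62 00 6F 00 72 00 74 00 65 00 64

        // Decoding with the same charset consumes the BOM and restores the text.
        System.out.println(new String(b, StandardCharsets.UTF_16)); // Aborted
    }
}
```

The 00 bytes are the unprintable rectangles you see when the same bytes are decoded with the platform default charset.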

You probably don't want to use UTF-16 in MP3 files. Not sure what the official format should be, but it's probably either ASCII, ISO-8859-1 or UTF-8.
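Whichever encoding the tag really uses, the usual way to avoid the garbage is to name the charset on both sides instead of relying on the platform default. A rough sketch, not tied to jid3lib: the class ExplicitCharset and its two helpers are invented for this example, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharset {
    // Encode and decode with the same, explicitly named charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    // Deliberate mismatch: UTF-16 bytes decoded as ISO-8859-1.
    // Every byte becomes one char, so the BOM and the 0 bytes leak into the text.
    static String mismatched(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String artist = "Aborted";

        // Matching charsets always give the original text back.
        System.out.println(roundTrip(artist, StandardCharsets.ISO_8859_1)); // Aborted
        System.out.println(roundTrip(artist, StandardCharsets.UTF_8));      // Aborted
        System.out.println(roundTrip(artist, StandardCharsets.UTF_16));     // Aborted

        // The mismatched version has 16 chars instead of 7.
        System.out.println(mismatched(artist).length()); // 16
    }
}
```

Stripping "bad characters" afterwards only hides a mismatch like this. Decoding the raw bytes with the charset they were written in is the more reliable fix.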

Message was edited by:

bsampieri

bsampieria at 2007-7-12 22:30:42