Appending to a UNICODE encoded file

Dear Fellow Developers,

I ran into an issue while trying to append to a file written with the "unicode" encoding. Below is a test program you can try.

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class UnicodeFileWriter {

    public static void main(String[] args) throws IOException {
        // 'true' opens unicode.txt in append mode; "unicode" is the charset name
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("unicode.txt", true), "unicode"));
        writer.write("Hello World!");
        writer.close();
    }
}

When I run the above code the first time, it works fine and the file contents are as shown below:

Hello World!

When I run it the second time, the file contents are as shown below:

Hello World!?Hello World!

Notice the invalid character (?) between the first and the second write. Looking at the file in a hex viewer, I see the bytes 0xFF 0xFE after the first write (that is, just before the second write). The same two bytes appear at the very beginning of the file.
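The effect can be reproduced without a hex viewer. The sketch below is an illustration, not the original program: the class name BomAppendDemo is hypothetical, an in-memory ByteArrayOutputStream stands in for unicode.txt, and it uses StandardCharsets.UTF_16 (Java 7+) instead of the "unicode" name. Per the java.nio.charset.Charset documentation, UTF-16 encodes big-endian and writes a big-endian mark (FE FF), so the byte values differ from the FF FE seen above, but the mechanism is the same: every new writer starts a fresh encoding and emits a mark, and when the bytes are decoded only the leading mark is consumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomAppendDemo {

    // Simulates two program runs that each open the same "file" for appending:
    // each new UTF-16 writer emits a byte-order mark before its first character.
    static byte[] appendTwice() throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (int run = 0; run < 2; run++) {
            Writer w = new OutputStreamWriter(file, StandardCharsets.UTF_16);
            w.write("Hello World!");
            w.flush(); // push the encoded bytes into the shared stream
        }
        return file.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = appendTwice();
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex);
        // Decoding as UTF-16 consumes only the leading mark; the second one
        // survives as the character U+FEFF, which many viewers show as '?'.
        String text = new String(bytes, StandardCharsets.UTF_16);
        System.out.println("U+FEFF found at index " + text.indexOf('\uFEFF'));
    }
}
```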

Any ideas why these bytes are written in append mode? Are they supposed to be there? If not, is there a way to avoid them?

Thanks in advance for your help.

By psaia at 2007-11-27 2:01:52
# 1
These are the Byte Order Mark (BOM) bytes. I seem to remember that if you explicitly tell the writer to use UTF-16LE or UTF-16BE, the BOM is not written. Your "unicode" encoding will correspond to one of these byte orders. I can't remember which is which, so try them both.
sabre150a at 2007-7-12 1:42:32
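A quick way to see the difference between the three UTF-16 variants is to encode the same short string with each and print the bytes. This is a minimal sketch (the class name Utf16Variants is hypothetical; StandardCharsets needs Java 7+). UTF-16 prefixes a mark, while UTF-16BE and UTF-16LE emit only the character bytes, in opposite byte orders.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Variants {

    // Encodes the text with the given charset and returns the bytes as hex.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16   " + hex("Hi", StandardCharsets.UTF_16));   // FEFF00480069
        System.out.println("UTF-16BE " + hex("Hi", StandardCharsets.UTF_16BE)); // 00480069
        System.out.println("UTF-16LE " + hex("Hi", StandardCharsets.UTF_16LE)); // 48006900
    }
}
```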
# 2

The file gets written correctly with UTF-16LE as the encoding, but then I do not see any BOM at all, not even at the very beginning of the file. I read a little about BOMs here: http://www.websina.com/bugzero/kb/unicode-bom.html.

After reading that page, I am inclined to conclude that one should not append to a "unicode" (UTF-16) encoded file. If appending is needed, the exact encoding (little-endian or big-endian) needs to be known and specified. Do you agree?

psaia at 2007-7-12 1:42:32
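If the file already starts with a BOM, the byte order does not have to be known in advance: it can be read from the first two bytes. The sketch below assumes the file is UTF-16 (it does not try to tell UTF-16 apart from UTF-32 or UTF-8 marks); the names Utf16ByteOrder and appendCharsetFor are hypothetical. It maps FE FF to UTF-16BE and FF FE to UTF-16LE, the BOM-less charsets to use for later appends, and returns null when there is no mark to go by.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrder {

    // Maps the first bytes of an existing UTF-16 file to the BOM-less charset
    // that appends should use. Returns null when no UTF-16 BOM is present,
    // because the byte order then cannot be inferred from the file itself.
    static Charset appendCharsetFor(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE;
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] littleEndianHead = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00};
        byte[] bigEndianHead = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x48};
        System.out.println(appendCharsetFor(littleEndianHead)); // UTF-16LE
        System.out.println(appendCharsetFor(bigEndianHead));    // UTF-16BE
    }
}
```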
# 3
You can work around this by writing a helper method or class that opens the file in append mode. If the file already exists, it opens it as UTF-16LE; if the file does not exist, it opens it as "unicode", so the BOM is written only once, at the start of the file.
sabre150a at 2007-7-12 1:42:32
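One way to sketch that helper is shown below. It is a variation on the suggestion, not a drop-in for it: instead of opening new files as "unicode", it always uses UTF-16LE and writes the BOM character U+FEFF by hand when the file is new or empty. That guarantees the first write and every later append use the same byte order, whichever byte order the "unicode" name happens to resolve to in a given JDK. The names Utf16AppendHelper and openForAppend are hypothetical, the sketch assumes any existing file was created this way (little-endian), and it needs Java 7+ for StandardCharsets and try-with-resources.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf16AppendHelper {

    // Opens 'file' for appending UTF-16LE text. A new (or empty) file gets a
    // single BOM (U+FEFF, stored as FF FE); an existing file gets no extra BOM.
    static Writer openForAppend(File file) throws IOException {
        // Check before opening, because FileOutputStream creates the file.
        boolean needsBom = !file.exists() || file.length() == 0;
        Writer w = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_16LE));
        if (needsBom) {
            w.write('\uFEFF'); // UTF-16LE never writes a BOM on its own
        }
        return w;
    }

    public static void main(String[] args) throws IOException {
        File file = new File("unicode.txt");
        try (Writer writer = openForAppend(file)) {
            writer.write("Hello World!");
        }
    }
}
```

Running main twice should leave FF FE only at the start of the file, followed by two copies of "Hello World!" in little-endian order.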