Convert ASCII CSV file
I have a program that can load a CSV file into the DB (SQL Anywhere). However, my original CSV file is in ASCII format, and the program only accepts UTF-8. Is there any way to convert the CSV file from ASCII to UTF-8 automatically?
Q1.
Is the following the right way to do the conversion?
byte[] bytes = test.getBytes("ISO-8859-1");
String encoded = new String(bytes,"UTF-8");
Q2.
Is there any way to strip the text qualifier (")?
e.g. "0028634063","BETTY CROCKER'S KIDS COOK!-HB","121.50" becomes
0028634063,BETTY CROCKER'S KIDS COOK!-HB,121.50
Thanks a lot!
# 1
Q1.
Just specify the correct encodings when you read from the file and when you write to the database. InputStreamReader lets you specify an encoding for reading a file; the database software should have a similar mechanism. You shouldn't have to do any conversions within your program.
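If the import program itself can't be told which encoding to read, the same idea can be used to rewrite the file as UTF-8 first. Here is a minimal sketch, assuming the file is really ISO-8859-1 as in the question; the class and method names (CsvTranscoder, toUtf8) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
class CsvTranscoder {
    // Copies source to target, decoding with sourceCharset and encoding as UTF-8.
    static void toUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        try (Reader in = new BufferedReader(
                 new InputStreamReader(Files.newInputStream(source), sourceCharset));
             Writer out = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            char[] buffer = new char[8192];
            int n;
            // The reader turns bytes into chars; the writer turns chars into UTF-8 bytes.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

It could be called as CsvTranscoder.toUtf8(Paths.get("in.csv"), StandardCharsets.ISO_8859_1, Paths.get("out.csv")), where both file names are placeholders. The important part is that the charset passed in matches how the file was actually written.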
Q2.
Removing quotes is one of the basic functions of a CSV tool; if the one you're using can't do it, get a better one.
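To show what such a tool does with the quotes, here is a rough sketch of splitting one line by hand. CsvLine and parse are made-up names, and it assumes comma-separated fields, " as the qualifier, "" as an escaped quote, and no field spanning several lines. A real CSV library covers more cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class CsvLine {
    // Splits one CSV line into fields and drops the surrounding quotes.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote means a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing qualifier
                    }
                } else {
                    field.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;           // opening qualifier
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

String.join(",", CsvLine.parse(line)) then gives the unquoted line from the question. Writing the fields back without quotes is only safe if no field contains a comma, which is the reason the qualifier exists in the first place.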
# 2
Also, ASCII is a subset of UTF-8, in the sense that UTF-8 encodes every ASCII character as itself. So if you have an ASCII file, you can read that with no problem using UTF-8.
That's of course if you really do have an ASCII file and not something encoded in one of the dozens of "extended ASCII" character sets out there.
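A quick way to see both points, as a sketch with made-up names (AsciiSubsetDemo and its methods): ASCII text produces the same bytes in both encodings, and a file is only true ASCII if no byte has the high bit set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class AsciiSubsetDemo {
    // True when the text encodes to identical bytes in US-ASCII and UTF-8.
    static boolean sameBytesInBothEncodings(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    // True when every byte is in the 7-bit range 0x00..0x7F.
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return false; // high bit set, so this is not plain ASCII
            }
        }
        return true;
    }
}
```

Running isPureAscii over the bytes of the CSV file (for example from Files.readAllBytes) tells you whether it can be fed to a UTF-8 reader unchanged. If it returns false, the file is in some other encoding and you need to find out which one before converting.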
# 3
> Q1.
> Is the following the right way to do the conversion?
> byte[] bytes = test.getBytes("ISO-8859-1");
> String encoded = new String(bytes,"UTF-8");
I'm pretty sure the above is not going to do what you expect. You convert some test string to bytes in the 8859-1 encoding, so bytes now holds 8859-1 values. Then you create "encoded" from that same "bytes" array, as if it suddenly held UTF-8 values. It doesn't; it still holds the 8859-1 values you just created. Treating "bytes" as a UTF-8 array doesn't seem quite right.
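A small sketch of the difference, with made-up names (WrongConversionDemo and its methods). The first method mirrors the snippet from the question; the second decodes the bytes with the charset they were written in and only then encodes them as UTF-8, assuming the input really is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class WrongConversionDemo {
    // Same steps as the question: encode as ISO-8859-1, then decode as UTF-8.
    static String questionApproach(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode with the charset the bytes are actually in, then encode as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

With plain ASCII input, questionApproach returns the text unchanged, which is why the problem is easy to miss. With something like "caf\u00e9" the last character does not survive, because the single 8859-1 byte for it is not a valid UTF-8 sequence.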
--
John O'Conner