RandomAccessFile: full Unicode support trouble

I use RandomAccessFile to read a text file, using the readLine() method. But this method doesn't support full Unicode characters, so my output text file has many unknown characters like ( ? ). Which method should I use that supports full Unicode characters?
By andreihortuaa at 2007-11-27 8:33:17
# 1
Read the bytes of the line and then use new String(bytesRead, encoding);
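To make the difference concrete, here is a small sketch. The class and method names are made up for illustration, and it assumes Java 7+ (for StandardCharsets) and a UTF-8 file. widen() imitates what the RandomAccessFile.readLine() Javadoc says it does (one byte per char, high eight bits zero), while decode() is the new String(bytes, encoding) call suggested above. On older JDKs, new String(bytes, "UTF-8") does the same but declares UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeLineBytes {
    // What RandomAccessFile.readLine() does per its Javadoc: each byte becomes
    // the low eight bits of a char and the high eight bits are set to zero.
    static String widen(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // The suggested fix: decode the same bytes with the file's real encoding.
    static String decode(byte[] bytesRead, Charset encoding) {
        return new String(bytesRead, encoding);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute e (U+00E9)
        byte[] bytesRead = original.getBytes(StandardCharsets.UTF_8); // e-acute is 0xC3 0xA9
        System.out.println(widen(bytesRead).length());                                  // 5
        System.out.println(widen(bytesRead).equals(original));                          // false
        System.out.println(decode(bytesRead, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The widened string has five chars because the two UTF-8 bytes of the accented letter turn into two separate (wrong) characters; decoding keeps it as one.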
sabre150a at 2007-7-12 20:29:23
# 2
Using a RandomAccessFile to read text? That may not be a bad code smell, but it is bordering on ripe cheese odor.
Hippolytea at 2007-7-12 20:29:23
# 3

> Using a RandomAccessFile to read text? That may not be a bad code smell, but it is bordering on ripe cheese odor.

If the OP has an index file that points to the start of every line, then I see no problem, BUT if he is just reading sequentially, then I too would smell ripe cheese.
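A rough sketch of the index idea, assuming a UTF-8 file with '\n' line endings. For brevity it builds the index in memory in one pass instead of loading it from a separate index file, and LineIndex, buildIndex() and readLine() are hypothetical names. Once the start offsets are known, seek() jumps straight to a line and only that line's bytes are decoded.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class LineIndex {
    // Scan the file once and record the byte offset where every line starts.
    // Assumes '\n' terminators (a trailing '\r' is stripped in readLine).
    static List<Long> buildIndex(RandomAccessFile raf) throws IOException {
        List<Long> starts = new ArrayList<>();
        long length = raf.length();
        if (length > 0) {
            starts.add(0L);
        }
        raf.seek(0);
        long pos = 0;
        int b;
        while ((b = raf.read()) != -1) { // byte-at-a-time: simple but slow on big files
            pos++;
            if (b == '\n' && pos < length) {
                starts.add(pos);
            }
        }
        return starts;
    }

    // Jump straight to line n and decode only its bytes (UTF-8 assumed).
    static String readLine(RandomAccessFile raf, List<Long> starts, int n) throws IOException {
        long from = starts.get(n);
        long to = (n + 1 < starts.size()) ? starts.get(n + 1) : raf.length();
        byte[] bytes = new byte[(int) (to - from)];
        raf.seek(from);
        raf.readFully(bytes);
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\n') len--; // drop the line terminator
        if (len > 0 && bytes[len - 1] == '\r') len--;
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lines", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "alpha\ncaf\u00e9\nomega\n".getBytes(StandardCharsets.UTF_8));
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            List<Long> starts = buildIndex(raf);
            System.out.println(starts);                   // [0, 6, 12]
            System.out.println(readLine(raf, starts, 2)); // omega
        }
    }
}
```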

sabre150a at 2007-7-12 20:29:23
# 4

How can a String constructor fix the fact that readLine() does not read more than eight bits per character?

see:

http://java.sun.com/j2se/1.4.2/docs/api/java/io/RandomAccessFile.html#readLine()

"Each byte is converted into a character by taking the byte's value for the lower eight bits of the character and setting the high eight bits of the character to zero. This method does not, therefore, support the full Unicode character set."

I don't understand; please explain your answer in more detail.

:(

andreihortuaa at 2007-7-12 20:29:23
# 5
You don't use readLine() to read the bytes (not characters) of the line. You read the bytes yourself, from the end of one line up to the start of the next. Given these bytes, you convert them to a String as I explained in my previous post.
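One way to do that, sketched under the assumption that the file is UTF-8 (RafLineReader is a hypothetical helper, not a JDK class). Scanning for the '\n' byte is safe in UTF-8 because that byte value never occurs inside a multi-byte character; the same trick would not work for a UTF-16 file.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class RafLineReader {
    // Collect the raw bytes from the current file pointer up to the next '\n'
    // (or end of file), then decode them. Returns null at end of file, like readLine().
    static String readLine(RandomAccessFile raf, Charset encoding) throws IOException {
        int b = raf.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream bytesRead = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            bytesRead.write(b);
            b = raf.read();
        }
        byte[] bytes = bytesRead.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // tolerate "\r\n" line endings
        }
        return new String(bytes, 0, len, encoding);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("raf", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "caf\u00e9\r\n\u20ac 5\n".getBytes(StandardCharsets.UTF_8));
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            String line;
            int count = 0;
            while ((line = readLine(raf, StandardCharsets.UTF_8)) != null) {
                count++;
                System.out.println(count + ": " + line.length() + " chars");
            }
        }
    }
}
```

Because it leaves the file pointer just after the line, it can be mixed with seek() in the same way readLine() can.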
sabre150a at 2007-7-12 20:29:23
# 6

> I use RandomAccessFile to read a text file, using the readLine() method. But this method doesn't support full Unicode characters, so my output text file has many unknown characters like ( ? ).
>
> Which method should I use that supports full Unicode characters?

First, explain what you think you are going to do with the RandomAccessFile in the first place.

Do you just want to read the file? Then use a different java.io class.

If you want to move around in the file, reading and writing, then you are going to have a serious problem unless the file is written in a Unicode encoding that uses a fixed number of bytes per character.
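To illustrate the fixed-width case: if the file were saved as UTF-16BE, every char takes exactly two bytes, so char n sits at byte offset 2 * n and can be read or overwritten in place. RandomAccessFile.readChar() and writeChar() use two bytes, high byte first, which matches UTF-16BE. This is a sketch with hypothetical names; note that characters outside the Basic Multilingual Plane take two chars (a surrogate pair) in UTF-16, so only UTF-32 is truly fixed-width per code point.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class FixedWidthAccess {
    // In UTF-16BE each char (UTF-16 code unit) is exactly 2 bytes,
    // so char number n starts at byte offset 2 * n.
    static char charAt(RandomAccessFile raf, long n) throws IOException {
        raf.seek(2 * n);
        return raf.readChar(); // reads two bytes, high byte first
    }

    static void setCharAt(RandomAccessFile raf, long n, char c) throws IOException {
        raf.seek(2 * n);
        raf.writeChar(c); // overwrites those two bytes; nothing else in the file moves
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("utf16", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "caf\u00e9 na\u00efve".getBytes(StandardCharsets.UTF_16BE));
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            System.out.println((int) charAt(raf, 3)); // 233, i.e. U+00E9
            setCharAt(raf, 0, 'C');
        }
        String back = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_16BE);
        System.out.println(back.equals("Caf\u00e9 na\u00efve")); // true
    }
}
```

With a variable-width encoding such as UTF-8 there is no such formula, and overwriting one character with another of a different byte length would shift everything after it.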

jschella at 2007-7-12 20:29:23
# 7

:D

OK

Here is the code:

/*
 * VERBOSE MACRO
 *
 * Desc: Converts a txt file to csv by inserting ";" semicolons wherever a TXT-TAB pair is found!
 * To:   GMT - Mansfield (Product Features Word files to database)
 * By:   Andrei Hortua
 * Date: Mon, 21-May-2007
 */

import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.FileWriter;

public class Macro
{
    public static boolean isText(char c)
    {
        // If ASCII is an uppercase or lowercase letter
        if ((((int) c) > 64 && ((int) c) < 91) || (((int) c) > 96 && ((int) c) < 123))
            return true;
        // If ASCII is a digit
        else if (((int) c) > 47 && ((int) c) < 58)
            return true;
        else
            return false;
    }

    public static boolean isTab(char c)
    {
        if ((int) c == 9)
            return true;
        else
            return false;
    }

    public static int safeIndex(int index)
    {
        if ((index - 1) < 0)
        {
            return 0;
        }
        else
        {
            return index;
        }
    }

    public static void main(String[] args) throws IOException
    {
        try
        {
            // Filename is a txt file passed by args
            String filename = args[0];
            // Opens file as read only
            RandomAccessFile raf = new RandomAccessFile(filename, "r");
            // line holds one line of the file at a time
            String line = "";
            // result keeps the processed file
            String result = "";
            // index "from position"
            int index1 = 0;
            // index "to position"
            int index2 = 0;
            // spaces counts end of lines
            int spaces = 0;
            int lns = 0;

            // While file not ends
            while ((line = raf.readLine()) != null)
            {
                System.out.println("\n\nReads a line of " + filename);
                //System.out.println("\nCurrent line is " + lns);
                for (int i = 0; i < line.length(); i++)
                {
                    // If char is text
                    if (isText(line.charAt(i)))
                    {
                        System.out.println("\nChar is text " + line.charAt(i));
                        System.out.println("\nResetting spaces ");
                        // Reset spaces
                        spaces = 0;
                        System.out.println("\nActual spaces " + spaces);
                        // Adds text to result
                        if (i == 0)
                        {
                            result += line.substring(0, 1);
                            index1 = 1;
                        }
                        else if (i > 0)
                        {
                            result += line.substring(index1, i);
                            index1 = i;
                        }
                    } // End if isText
                    // If char is TAB
                    else if (isTab(line.charAt(i)))
                    {
                        System.out.println("\nChar is TAB ");
                        if (i > 0)
                        {
                            if (isText(line.charAt(i - 1)))
                            {
                                // TXT, TAB
                                result += line.substring(index1, i) + ";";
                                index1 = i + 1;
                            }
                        } // End if not initial TAB
                    } // End if is tab
                    else if (!line.equals(""))
                    {
                        // Inside this loop line is never empty, so this branch is
                        // always taken and the else below never runs (spaces stays 0).
                        //result += line;
                    }
                    else
                    {
                        //System.out.println("\n Else " + line);
                        spaces++;
                        System.out.println("\n Spaces increment " + spaces);
                    }
                    // If ends one product features item
                    if (spaces == 2)
                    {
                        System.out.println("\nEnds one product features item " + spaces);
                        result += "\";\n\r";
                        System.out.println("\n Result\n" + result);
                    }
                } // End For (end of line)
                index1 = 0;
                lns++;
            } // End of while (end of file)

            // Close Random Access File
            raf.close();
            System.out.println("\n Close Random Access File ");
            // Creates new file and name it as csv
            FileWriter fw = new FileWriter(filename + ".csv");
            System.out.println("\n Creates new file and name it as csv \n");
            // Save result on file
            fw.write(result);
            System.out.println("\n Save result on file");
            // Close file
            fw.close();
            System.out.println("\n Close file " + filename);
        } // End of try
        catch (IOException e)
        {
            System.out.println(e.getMessage());
        } // End of catch
    } // End of main
} // End of Class
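Separately from the reading side, the output above goes through FileWriter, which always uses the platform's default encoding (a FileWriter constructor taking a Charset only arrived in Java 11). Characters that the default encoding cannot represent are typically written as '?'. A minimal sketch of writing result with an explicit encoding instead; UTF-8 and the name writeUtf8 are assumptions, so pick whatever encoding the database import expects.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class Utf8CsvWriter {
    // Write text with an explicit encoding (UTF-8 here, an assumption)
    // rather than FileWriter's platform default.
    static void writeUtf8(String filename, String result) throws IOException {
        try (Writer fw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(filename), StandardCharsets.UTF_8))) {
            fw.write(result);
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("macro", ".csv");
        out.deleteOnExit();
        writeUtf8(out.getPath(), "caf\u00e9;\u20ac5;\n");
        byte[] written = Files.readAllBytes(out.toPath());
        System.out.println(written.length); // 12: e-acute takes 2 bytes, the euro sign 3
    }
}
```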

andreihortuaa at 2007-7-12 20:29:23
# 8
Should I use readChar() instead of readLine()?
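For what it's worth, readChar() always consumes two bytes and combines them as (b1 << 8) | b2, i.e. it assumes UTF-16BE. A small sketch (hypothetical names) of what that does to an ordinary UTF-8 file: the two bytes of "hi" come back as a single char with code 0x6869.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReadCharDemo {
    // Read the whole file two bytes at a time with readChar().
    // Only meaningful if the file really is UTF-16BE.
    static String readAllChars(File f) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            while (raf.getFilePointer() + 2 <= raf.length()) {
                sb.append(raf.readChar());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("readchar", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "hi".getBytes(StandardCharsets.UTF_8));
        String s = readAllChars(f);
        System.out.println(s.length() + " char, code " + (int) s.charAt(0)); // 1 char, code 26729
    }
}
```

So readChar() only helps if the file was written as UTF-16BE; for a UTF-8 or single-byte file it glues unrelated bytes together.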
andreihortuaa at 2007-7-12 20:29:23
# 9
Use a BufferedReader instead of a RandomAccessFile if you're having Unicode problems.
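A minimal sketch of that, assuming the input is UTF-8 (substitute the file's actual encoding). The key part is the InputStreamReader with an explicit Charset; a plain FileReader would fall back to the platform default. readLines() is a hypothetical helper name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class ReadTextLines {
    // Sequential reading: BufferedReader over an InputStreamReader that
    // decodes bytes with an explicit encoding.
    static List<String> readLines(File file, Charset encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding))) {
            String line;
            while ((line = in.readLine()) != null) { // strips "\n", "\r" or "\r\n"
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("input", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "caf\u00e9\tcr\u00e8me\n\u20ac 5\n".getBytes(StandardCharsets.UTF_8));
        List<String> lines = readLines(f, StandardCharsets.UTF_8);
        System.out.println(lines.size());                                 // 2
        System.out.println(lines.get(0).equals("caf\u00e9\tcr\u00e8me")); // true
    }
}
```

The tab characters survive untouched, so the macro's per-character TAB/text logic can run on these lines unchanged.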
ejpa at 2007-7-12 20:29:23
# 10
> Here is the code:

All you are doing is reading, so you shouldn't be using RandomAccessFile for that.
jschella at 2007-7-12 20:29:23