text to binary (.txt to .bin)
Hello,
I'm developing a J2ME application that needs to read from files.
I want to use binary files instead of text files, because I think this gives me two benefits:
1. reducing the JAR size
2. speeding up the reading process.
That's the starting point... and I hope this approach is right.
I've written a small J2SE class as a utility to convert .txt files into .bin files and... GULP!!!... a 22 KB txt file becomes 44 KB when converted to .bin. Something is wrong, but I don't know what.
Here is the code I wrote:
package texttobin;

import java.io.*;

public class Main {
    public static void main(String[] args) {
        File inFile = null;  // the file to read
        File outFile = null; // the file to write
        // read the paths from the command line (both are required)
        if (args.length > 1)
        {
            inFile = new File(args[0]);
            outFile = new File(args[1]);
        }
        // some checks
        if (inFile == null) return;
        if (outFile == null) return;
        try {
            // set up the streams
            FileOutputStream outputStream = new FileOutputStream(outFile);
            DataOutputStream dataOutputStream = new DataOutputStream(outputStream);
            FileInputStream sourceStream = new FileInputStream(inFile);
            byte readingByte[] = new byte[1];
            while (true) {
                if (sourceStream.read(readingByte) != -1) {
                    // to be sure: I convert the input byte that was read into a String
                    // and pass the char inside it to the writeChar method...
                    dataOutputStream.writeChar(new String(readingByte).charAt(0));
                    // the same result is achieved with:
                    // dataOutputStream.writeChars(new String(readingByte));
                }
                else break;
            }
            dataOutputStream.close();
            outputStream.close();
            sourceStream.close();
        }
        catch (FileNotFoundException fnfe)
        {
            System.out.println(fnfe);
            return;
        }
        catch (IOException ioe)
        {
            System.out.println(ioe);
            return;
        }
    }
}
WHAT'S WRONG?!
THANKS A LOT!!!
daniele
Nothing's wrong. One Java char is made up of 2 bytes, so your binary format is no improvement. I'd suggest using some compression format such as ZIP instead.
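To make the doubling concrete, here is a minimal J2SE sketch. The class and method names are made up for illustration, and it writes into a ByteArrayOutputStream instead of a real file so the sizes can be compared directly. It writes the same string once with writeChar, as the posted converter does, and once with write(int), which emits only the low 8 bits of each char.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original converter.
class CharSizeDemo {
    // Writes every character with writeChar: two bytes each (high byte, then low byte).
    static byte[] withWriteChar(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 0; i < text.length(); i++) {
                out.writeChar(text.charAt(i));
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes every character with write(int): only the low 8 bits, one byte each.
    static byte[] withWrite(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 0; i < text.length(); i++) {
                out.write(text.charAt(i));
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "Hello";
        System.out.println("writeChar: " + withWriteChar(sample).length + " bytes");
        System.out.println("write:     " + withWrite(sample).length + " bytes");
    }
}
```

For a 5-character string the first variant produces 10 bytes and the second 5, which is the same factor of two as the jump from 22 KB to 44 KB.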
Thank you, this is clear. But if a binary file uses 2 bytes for each char and txt uses 7, why is the bin file bigger than the txt? Isn't it possible to make a bin file smaller than the txt?
Thank you again,
daniele
> this is clear but if a binary file uses 2 bytes for
> each char and txt uses 7, why the size of the bin is
> bigger than txt?
Because you don't understand. ISO-something: 1 byte/char. Java char = UTF-16 = 2 bytes/char. Nobody said anything about 7 bytes.
> Nobody said anything about 7 bytes.
I think the OP mistook the 7 (lower) bits used in ASCII encoding for 7 bytes... that was 7 bits, right?
Hi,
what I don't understand: why should it matter in terms of JAR size whether the app reads from binary or from text files?
Regards,
BugBunny
Is this what the applications that run on my cellphone look like under the hood? *cringe* I'm surprised the thing even boots.
Try
while (sourceStream.read(readingByte) != -1)
{
    dataOutputStream.write(readingByte);
}
This way you'll get a simple byte-for-byte copy into the outFile.
Note that this does not guarantee the character encoding. To accomplish
that, you should use the writeUTF(String) method to save
and readUTF() to load (as said before, your file will get bigger).
[]'s
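As a rough illustration of the writeUTF/readUTF suggestion, the sketch below (hypothetical class name, J2SE, in-memory streams instead of files) round-trips a string and exposes the encoded size. writeUTF stores a 2-byte length prefix followed by the string in modified UTF-8, so plain ASCII text costs one byte per character plus two bytes per string, and a single call cannot store more than 65535 encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class for the writeUTF/readUTF round trip.
class UtfRoundTrip {
    // Encodes one string with writeUTF: 2-byte length, then modified UTF-8.
    static byte[] encode(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(text);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes one string that was written with writeUTF.
    static String decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "ciao";
        byte[] encoded = encode(sample);
        System.out.println(sample.length() + " chars -> " + encoded.length + " bytes");
        System.out.println("decoded: " + decode(encoded));
    }
}
```

Writing one string per line of the source file keeps each call well under the length limit, at the cost of two extra bytes per line.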
> This way you'll have a simple byte copy to the
> outFile.
> Note that you're not assuring the char encoding. To
> accomplish
> that, you should use the writeUTF(String) method to
> save
> and readUTF() to load (as said before, your file will
> get bigger).
Thank you,
this way I get the same size as the original file...
But that doesn't help me much...
I'm just wondering how to read about 3000 lines of a text file as fast as possible. Reading a txt file with only an InputStreamReader takes about 50 seconds on the emulator, and of course I'm sure there is an answer...
I've downloaded some free applications that use XML or txt files converted to binary, and their performance is very good. I've seen a Bible in about 300K!
If you don't believe me, take a look:
http://gobible.jolon.org/
Uhm... sorry, I don't want to write off-topic things...
> I'm just wondering how is it possible to read about
> 3000 lines of a text file as fast as possible, from
> a txt file with only InputStreamReader I take about
> 50 seconds on the emulator and of course I'm sure
> that there is an answer...
After looking at your code I feel like giving a hint on speed improvement: use buffered reading. Reading a stream one byte at a time is slow. For example:
InputStream in = /* ... */;
BufferedInputStream buffy = new BufferedInputStream(in);
int b = 0;
while ((b = buffy.read()) != -1) {
    /* use the byte */
}
// don't forget: in.close();
Or do the buffering yourself:
InputStream in = /* ... */;
byte[] buffy = new byte[4096]; // use a reasonable buffer size; example here: 4K
int r = 0;
while ((r = in.read(buffy)) != -1) {
    /* use the byte array; only the first r bytes are valid */
}
// don't forget in.close();
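A self-contained version of the hand-rolled buffer could look like the sketch below. The class name, the line counting, and the in-memory stand-in for the real resource stream are assumptions made for illustration. It is worth checking the target profile too: as far as I know, CLDC does not ship BufferedInputStream, so the manual buffer is probably the variant that carries over to a MIDlet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a stream in chunks instead of byte by byte.
class ChunkedReader {
    // Reads the whole stream in 4 KB chunks and returns everything it contained.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                collected.write(chunk, 0, n); // only the first n bytes are valid
            }
            return collected.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts lines in memory; a last line without a trailing '\n' still counts.
    static int countLines(byte[] data) {
        int lines = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                lines++;
            }
        }
        if (data.length > 0 && data[data.length - 1] != '\n') {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the real resource stream: 3000 short lines held in memory.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 3000; i++) {
            text.append("line ").append(i).append('\n');
        }
        byte[] raw = text.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] data = readAll(new ByteArrayInputStream(raw));
        System.out.println(countLines(data) + " lines, " + data.length + " bytes");
    }
}
```

Once the bytes are in memory, splitting them into lines is plain array work and no longer touches the stream.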
> im surprised the thing even boots
Mine does, too. Quite often actually.
> > im surprised the thing even boots
>
> Mine does, too. Quite often actually.
I don't understand you.
I'm sorry, my English is quite bad, but in my first post I wrote that the code was for J2SE.
OBVIOUSLY THAT CODE IS NOT FOR MOBILE and it is not optimized. Have you ever seen a MIDlet starting with a main method?!
The reason I posted this thread here is that I NEED HELP MAKING TEXT FILES SMALLER, as the subject suggests, and then reading them faster in a J2ME app.
I'm sorry again if I did not explain the problem clearly, but please, if you don't have an answer or a suggestion, don't write.
thank you,
d.
ZIP it or use a 7-bit encoding scheme (which isn't easy). As for faster reading: you need to see that the decoding of your file format doesn't take more time than the reading of an unencoded format would take.
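For the curious, one possible 7-bit scheme is sketched below: it packs 8 ASCII characters into 7 bytes by concatenating the low 7 bits of each character. The class name and the choice to pass the character count to the decoder are assumptions of this sketch, not something from the thread. The count is needed because the padding bits in the last byte could otherwise be read as an extra character.

```java
// Hypothetical 7-bit packer: stores each ASCII character in 7 bits instead of 8.
class SevenBitPacker {
    // Packs the text; every character must be below 128.
    static byte[] pack(String text) {
        byte[] out = new byte[(text.length() * 7 + 7) / 8];
        int acc = 0;  // bit accumulator
        int bits = 0; // number of valid bits currently in acc
        int pos = 0;  // next output index
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("not 7-bit ASCII at index " + i);
            }
            acc = (acc << 7) | c;
            bits += 7;
            while (bits >= 8) {
                bits -= 8;
                out[pos++] = (byte) (acc >> bits); // emit the top 8 pending bits
            }
            acc &= (1 << bits) - 1; // keep only the bits not written yet
        }
        if (bits > 0) {
            out[pos] = (byte) (acc << (8 - bits)); // left-align the final partial byte
        }
        return out;
    }

    // Unpacks exactly count characters from data produced by pack.
    static String unpack(byte[] packed, int count) {
        StringBuilder text = new StringBuilder(count);
        int acc = 0;
        int bits = 0;
        int pos = 0;
        for (int i = 0; i < count; i++) {
            while (bits < 7) {
                acc = (acc << 8) | (packed[pos++] & 0xFF);
                bits += 8;
            }
            bits -= 7;
            text.append((char) ((acc >> bits) & 0x7F));
            acc &= (1 << bits) - 1;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String sample = "ciao sono daniele";
        byte[] packed = pack(sample);
        System.out.println(sample.length() + " chars -> " + packed.length + " bytes");
        System.out.println(unpack(packed, sample.length()));
    }
}
```

The best case is a 12.5% saving, and the decoder has to do shifting work for every character, so general-purpose compression will usually be the better trade.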
> After looking at your code I feel like giving a hint
> on speed improvement: use buffered reading. Accessing
> a stream byte for byte is low-performant.
That's interesting, I will try it, thank you!
> ZIP it or use a 7-bit encoding scheme (which isn't
> easy). As for faster reading: you need to see that
> the decoding of your file format doesn't take more
> time than the reading of an unencoded format would
> take.
Yes, you're right, but I thought that the bottleneck was the reading process, not the decoding algorithm (hopefully!!!)
I found an article that says:
-
An ASCII file is a binary file that stores ASCII codes. Recall that an ASCII code is a 7-bit code stored in a byte. To be more specific, there are 128 different ASCII codes, which means that only 7 bits are needed to represent an ASCII character.
However, since the minimum workable size is 1 byte, those 7 bits are the low 7 bits of any byte. The most significant bit is 0. That means, in any ASCII file, you're wasting 1/8 of the bits. In particular, the most significant bit of each byte is not being used.
-
http://www.cs.umd.edu/class/spring2003/cmsc311/Notes/BitOp/asciiBin.html
(and sorry for the mistake: 7 bits, not 7 bytes!!!)
Another interesting thing I've found is here:
http://www2.sys-con.com/ITSG/virtualcd/Java/archives/0607/heaton/index.html
There are some ways to store strings in bin files (like fstream in C++), but I haven't tested yet whether I can reduce the size of the bin this way.
The files are stored in the JAR file, so zipping might reduce the size of the JAR further, but I think that when the stream opens it gets the real bytes of the contained text (well... I'm not sure... can you confirm it?)
thank you.
> The files are stored in the jar file, then zipping
> maybe can reduce more the size of the jar, but I
> think that when the stream opens it takes the real
> bytes of the contained text (well...I'm not
> sure...may you confirm it?)
JARs are nothing but ZIPs. Zipping a ZIP usually just adds to the overall size.
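A quick way to check this on the desktop is to deflate some text once and then deflate the result again. The sketch below is a J2SE-only illustration with a made-up class name and generated sample text. It uses java.util.zip.DeflaterOutputStream, which applies the same DEFLATE algorithm that ZIP and JAR entries normally use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;

// Hypothetical demo: compressing already compressed data gains little or nothing.
class DoubleZipDemo {
    // Compresses the input with DEFLATE (zlib wrapper) in memory.
    static byte[] deflate(byte[] input) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DeflaterOutputStream out = new DeflaterOutputStream(buffer);
            out.write(input);
            out.close(); // finishes the compressed stream
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds 3000 lines of word-like text from a small fixed vocabulary.
    static byte[] sampleText() {
        String[] words = {"in", "the", "beginning", "was", "word", "and",
                          "light", "earth", "water", "day", "night", "heaven"};
        Random random = new Random(42); // fixed seed so every run sees the same text
        StringBuilder text = new StringBuilder();
        for (int line = 0; line < 3000; line++) {
            for (int w = 0; w < 8; w++) {
                text.append(words[random.nextInt(words.length)]).append(' ');
            }
            text.append('\n');
        }
        return text.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleText();
        byte[] once = deflate(raw);
        byte[] twice = deflate(once);
        System.out.println("raw:   " + raw.length + " bytes");
        System.out.println("once:  " + once.length + " bytes");
        System.out.println("twice: " + twice.length + " bytes");
    }
}
```

On the reading side, a resource stream opened from the JAR delivers the uncompressed bytes, so the application code never sees the compressed form.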
> That means, in any ASCII file,
> you're wasting 1/8 of the bits. In particular, the
> most significant bit of each byte is not being used.
Hah. That's what you English speakers/writers think ...
> Hah. That's what you english speakers/writers think ...
Not quite. ASCII only has 7 bits. Everything that uses the eighth bit is not ASCII, and then it's not an ASCII file.
Oh c'mon, you know I haven't been serious ...
> > Hah. That's what you english speakers/writers think
> ...
>
> Not quite. ASCII only has 7 bits. Everything that
> uses the eighth bit is not ASCII, and then it's not an
> ASCII file.
Extended ASCII is usually referred to as ASCII and uses 8 bits :)
> Extended ASCII is usually referred to as ASCII and
> uses 8 bits :)
Then, if Java stores chars in 2 bytes = 16 bits and extended ASCII uses only 8 bits, is it reasonable to think that I can store strings in binary files using half the space, or am I still missing something?
> then, if java stores chars in 2 byte = 16bits and if
> extended ascii code uses only 8 bits, is it
> reasonable to think that I can store strings in
> binary files using the half space, or I still miss
> something?
Those binary files you're talking about contain a binary format of Java variables, thus a single char consumes two bytes; the encoding used is UTF-16. Normal text files as you know them are practically never pure ASCII(-7), but most probably one of those many 8-bit ASCII extensions standardized through the ISO under standard number 8859, or it's an operating system vendor specific, ISO-like 8-bit encoding such as Microsoft's Codepages (Cp1252 for instance). Anyway, files are handled byte-oriented. The only way you can save space is to use compression, as has been suggested several times.
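The difference between the in-memory UTF-16 form and an 8-bit file encoding can be seen by asking a String for its bytes in different charsets. This is a small J2SE sketch with a made-up class name. The sample word contains one accented letter (written as a Unicode escape) so that the encodings actually differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same text measured in three encodings.
class EncodingSizes {
    // ISO-8859-1: one byte per character (for characters up to U+00FF).
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // UTF-16 without a byte order mark: two bytes per Java char.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-8: one byte for ASCII, more for everything else.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String sample = "perch\u00e9"; // six characters, the last one is e with acute accent
        System.out.println("ISO-8859-1: " + latin1Length(sample) + " bytes");
        System.out.println("UTF-16BE:   " + utf16Length(sample) + " bytes");
        System.out.println("UTF-8:      " + utf8Length(sample) + " bytes");
    }
}
```

For this six-character word the sizes are 6, 12 and 7 bytes respectively, so an 8-bit encoding is already the compact uncompressed form.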
Thank you,
now it's all clear!!!!!!!
Saving data with DataOutputStream.write(int) writes only the low 8 bits of each char, one byte per character, and that is the smallest it can get without compression.
Look at this (I'm posting it in case it is useful for someone else):
import java.io.*;

public class Main {
    // splits the char into its two bytes: [0] = low byte, [1] = high byte
    static byte[] charToByteArray(char c)
    {
        byte[] twoBytes = { (byte)(c & 0xff), (byte)(c >> 8 & 0xff) };
        return twoBytes;
    }

    // converts the "significant" (low) byte back into the character
    static char byteToChar(byte b)
    {
        char c = (char)(b & 0xFF);
        return c;
    }

    public static void main(String[] args) {
        char string[] = new char[]{'c','i','a','o',' ','s','o','n','o',' ','d','a','n','i','e','l','e'};
        // both output paths are required
        if (args.length < 2)
        {
            System.err.println("usage: Main <txtFile> <binFile>");
            return;
        }
        File txtFile = new File(args[0]); // the text file to write
        File binFile = new File(args[1]); // the binary file to write
        try {
            // set up the text stream
            FileOutputStream txtFileOutputStream = new FileOutputStream(txtFile);
            DataOutputStream txtDataOutputStream = new DataOutputStream(txtFileOutputStream);
            // set up the bin stream
            FileOutputStream binFileOutputStream = new FileOutputStream(binFile);
            DataOutputStream binDataOutputStream = new DataOutputStream(binFileOutputStream);
            System.err.println("char\tbyte[0]\tbyte[1]\tchar from byte[0]");
            for (int i = 0; i < string.length; i++)
            {
                // writes the character as is; write(int) emits only its low 8 bits
                txtDataOutputStream.write(string[i]);
                // splits the char into a pair of bytes
                byte b[] = charToByteArray(string[i]);
                // take the low byte (the only non-zero one for 8-bit text) and write it to the bin file
                binDataOutputStream.write(b[0]);
                System.err.println(string[i] + "\t" + b[0] + "\t" + b[1] + "\t" + byteToChar(b[0]));
            }
            // close the streams
            txtDataOutputStream.close();
            binDataOutputStream.close();
        }
        // error handling
        catch (FileNotFoundException fnfe)
        {
            System.out.println(fnfe);
        }
        catch (IOException ioe)
        {
            System.out.println(ioe);
        }
    }
}
Thanks everyone again!
daniele
