Regarding Parsing a binary file

Hello all,

I am struggling to parse a binary file; any help with parsing it would be great.

The file is also quite big, so I have to keep performance in mind.

The file looks something like this:

[33623858467 <binary data>~1158746601769~False~False]

[23623858467 <binary data>~1158746601769~False~False]

[53623858467 <binary data>~1158746601769~False~False]

i.e. the format is

[MSISDN <binary data>~[MSISDN]~<boolean value>~<boolean value>]

My aim is to extract the last two boolean values.

My source code is

try {
    String accountFileName = fileName;
    RandomAccessFile fi = new RandomAccessFile(accountFileName, "r");
    int s;
    while ((s = fi.read()) != -1) {
        // Search for '[' (byte 91)
        if (s == 91) {
            int count = 0;
            List<Integer> countTildas = new ArrayList<Integer>();
            System.out.println("FOUND SOMETHING...");
            // Search until ']' (byte 93) is found
            while (fi.read() != 93) {
                fi.seek(fi.getFilePointer() - 1);
                // Search for '~' (byte 126)
                if (fi.read() == 126) {
                    System.out.println("Position of " + count++ + " ~ : " + fi.getFilePointer());
                    countTildas.add((int) fi.getFilePointer());
                    fi.seek(fi.getFilePointer() + 1);
                }
            }
            System.out.println("Count : " + count);
        }
    }
}
catch (Exception e) {
    e.printStackTrace();
}

I am able to extract the position of each ~ within each section, i.e. [....~.....~...~...] [...~...~...~...] [...~...~..~...] and so on in the file, but I can't find a way to extract the content between them.

[2556 byte] By [angeshwara] at [2007-10-3 5:25:00]
# 1

Is the layout as you showed, i.e. is each 'record' on a separate row? What is 'binary data' from your point of view: is it real binary data (bytes that can contain any value), or is it in fact letters that represent something encoded somehow (7-bit ASCII, Base64, or a number written in hexadecimal as the characters 0-F)?

You can retrieve what you want by using flags that mark where one word stops and the next one starts, but depending on the exact format things may be much easier.

Mike

bellyrippera at 2007-7-14 23:32:12 > top of Java-index,Java Essentials,Java Programming...
# 2
First, a question. How do you make sure that the binary part doesn't contain a '~'?
CeciNEstPasUnProgrammeura at 2007-7-14 23:32:12
# 3

> i am able to extract the position of ~ present within

> each section i.e. [....~.....~...~...]

> [...~...~...~...] [...~...~..~...] and so on...in the

> file, but cant find a way to extract the content

> between them..

If this is in fact the actual question:

1) Create a byte array whose length is the distance between the two '~' delimiters

2) Seek to the start of the element

3) Use readFully

Pseudo-code:

byte[] buff = new byte[somelength];
fi.seek(startpos);
fi.readFully(buff);
// buff now contains the bytes for that section
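The three steps above could be sketched roughly as follows. This is only an illustration: the class RangeReader, the helpers readRange and asText, and the parameter names start and end are made up for this example, and it assumes the two offsets have already been found by a scan such as the one in the question.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original poster's code.
class RangeReader {
    /**
     * Reads the bytes in [start, end) from the file.
     * start is the offset of the first byte of the element,
     * end is the offset of the '~' (or ']') that terminates it.
     */
    static byte[] readRange(RandomAccessFile file, long start, long end) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        byte[] buff = new byte[(int) (end - start)]; // 1) array sized to the element
        file.seek(start);                            // 2) seek to the start of the element
        file.readFully(buff);                        // 3) read exactly buff.length bytes
        return buff;
    }

    /** Decodes a textual field; ISO-8859-1 maps every byte to exactly one char. */
    static String asText(byte[] field) {
        return new String(field, StandardCharsets.ISO_8859_1);
    }
}
```

Note that the loop in the question records getFilePointer() just after reading each '~', so a recorded value is already the offset of the first byte of the next element. The element between two consecutive tildes would then run from the first recorded value up to the second recorded value minus one.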

cotton.ma at 2007-7-14 23:32:12
# 4
You could take that, slap it in a ByteArrayInputStream, and use a Scanner on it if you like.
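A rough sketch of that idea, assuming the byte array holds only the textual tail of a record (for example the bytes of 1158746601769~False~False). The class name TailScanner and the method tokens are invented for illustration. Splitting a whole record this way would not be safe, since the binary part may itself contain '~'.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class TailScanner {
    /** Splits a purely textual byte section on '~' and returns the tokens in order. */
    static List<String> tokens(byte[] buff) {
        List<String> out = new ArrayList<>();
        // ISO-8859-1 is an assumption; it decodes each byte to one character.
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(buff), "ISO-8859-1")) {
            scanner.useDelimiter("~");
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }
}
```

The boolean fields can then be converted with Boolean.parseBoolean, which ignores case, so "False" and "True" as written in the file are handled.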
cotton.ma at 2007-7-14 23:32:12
# 5

Hello,

I may have cracked it. Below is the working piece of code, which extracts the last two boolean values.

The format is

[MSISDN <binary data>~[MSISDN]~<boolean value>~<boolean value>]

[33623858467 <binary data>~1158746601769~False~False]

[23623858467 <binary data>~1158746601769~False~False]

[53623858467 <binary data>~1158746601769~False~False]

Some clarifications in reply to the posts above:

- The binary part may or may not contain '~', but the format is

[MSISDN <binary data>~[MSISDN]~<boolean value>~<boolean value>]

so irrespective of any '~' in the binary part, the last two boolean values are always separated by a '~', and an MSISDN appears in front of the boolean strings, also separated from them by a '~'. These three fields are stored as strings in the binary file (only the SMS content is 7-bit-encoded binary data).
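Because the three textual fields always sit at the end of a record, scanning backwards from the end never reaches a '~' that belongs to the binary part. A minimal sketch of that idea follows, assuming the bytes of one record (without the enclosing brackets) are already in memory. TrailingFields and lastFields are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class TrailingFields {
    /**
     * Returns the last n '~'-separated fields of a record, in file order.
     * The scan starts at the end of the record and stops after n separators.
     */
    static String[] lastFields(byte[] record, int n) {
        String[] fields = new String[n];
        int end = record.length;                 // exclusive end of the current field
        for (int k = n - 1; k >= 0; k--) {
            int i = end - 1;
            while (i >= 0 && record[i] != '~') { // walk left to the previous '~'
                i--;
            }
            if (i < 0) {
                throw new IllegalArgumentException("fewer than " + n + " '~' separators");
            }
            fields[k] = new String(record, i + 1, end - i - 1, StandardCharsets.ISO_8859_1);
            end = i;                             // continue to the left of this '~'
        }
        return fields;
    }
}
```

Calling lastFields(record, 3) would give the trailing MSISDN field and the two boolean strings, whatever the binary part contains.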

- There is no row/record structure; I just split the data up to make it readable.

The real data looks something like this:

[ +33623858467 20100  &Ab#A-?{R 瑍1158746601769~False~False][ +33623858467 20100  "Gb#  - 沯moF悅r?gt;蒆剅€3~1158746601769~False~False][ +33624161744 20100  &A~#O? :? ~1158746601769~False~False][ +33624161744 20100  凜~# 丣208101054013692 ?A ?

?z( ? ?

?t( ? ?

?Af ? ?

?

 ?

?錒 K  ?

?鴺 x  ?

?T ? ~1158746601769~False~False][ +33603371355 20100  G?  - v~1158746601769~False~False]

This piece of code works fine: it simply stores each block of data, i.e. [.....] [.........] and so on, then iterates from the back and extracts the bytes present between two '~'.

try {
    String accountFileName = fileName;
    RandomAccessFile fi = new RandomAccessFile(accountFileName, "r");
    int s;
    while ((s = fi.read()) != -1) {
        // Search for '[' (byte 91), the start of a block
        if (s == 91) {
            List<Integer> countBytes = new ArrayList<Integer>();
            System.out.println("FOUND SOMETHING...");

            // Collect every byte up to ']' (byte 93); also stop at end of file
            // so that an unterminated block cannot loop forever
            int b;
            while ((b = fi.read()) != -1 && b != 93) {
                countBytes.add(b);
            }

            // Number of bytes already consumed, counted from the end of the block
            int firstDelimiter = 0;

            // Extract the last field of the block (reading the data from the back)
            List<Integer> addFirstBooleanData = new ArrayList<Integer>();
            for (int i = countBytes.size() - 1; i > 1; i--) {
                firstDelimiter++;
                if (countBytes.get(i) == 126)   // '~'
                    break;
                addFirstBooleanData.add(countBytes.get(i));
            }
            // The bytes were collected in reverse, so write them back to front
            ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
            for (int i = addFirstBooleanData.size() - 1; i > -1; i--) {
                baos1.write(addFirstBooleanData.get(i));
            }
            System.out.println(baos1.toString());

            // Extract the second-to-last field (reading the data from the back)
            List<Integer> addSecondBooleanData = new ArrayList<Integer>();
            for (int i = countBytes.size() - firstDelimiter - 1; i > 1; i--) {
                firstDelimiter++;
                if (countBytes.get(i) == 126)
                    break;
                addSecondBooleanData.add(countBytes.get(i));
            }
            ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
            for (int i = addSecondBooleanData.size() - 1; i > -1; i--) {
                baos2.write(addSecondBooleanData.get(i));
            }
            System.out.println(baos2.toString());

            // Extract the third field from the back, the trailing MSISDN
            List<Integer> addThirdData = new ArrayList<Integer>();
            for (int i = countBytes.size() - firstDelimiter - 1; i > 1; i--) {
                firstDelimiter++;
                if (countBytes.get(i) == 126)
                    break;
                addThirdData.add(countBytes.get(i));
            }
            ByteArrayOutputStream baos3 = new ByteArrayOutputStream();
            for (int i = addThirdData.size() - 1; i > -1; i--) {
                baos3.write(addThirdData.get(i));
            }
            System.out.println(baos3.toString());
        }
    }
}
catch (Exception e) {
    e.printStackTrace();
}

However, I am concerned about performance, because there may be 3000 messages every 12 seconds (as per the spec), which means 3000 such blocks of data. I may have to come up with a far more efficient method of parsing if this turns out to be too slow.
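One likely cost in the code above is that RandomAccessFile is unbuffered, so every single-byte read() goes to the operating system. A possible alternative, sketched below under assumptions, reads the data once through a BufferedInputStream and cuts the trailing fields out with lastIndexOf. RecordStream and tails are hypothetical names, the 64 KB buffer size is an arbitrary choice, and, like the original code, the sketch assumes that ']' never occurs inside the binary part.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class RecordStream {
    /** For every "[...]" block, returns the text after its third-from-last '~'. */
    static List<String> tails(InputStream raw) throws IOException {
        List<String> result = new ArrayList<>();
        ByteArrayOutputStream record = new ByteArrayOutputStream();
        boolean inRecord = false;
        try (InputStream in = new BufferedInputStream(raw, 64 * 1024)) {
            int b;
            while ((b = in.read()) != -1) {
                if (!inRecord) {
                    if (b == '[') {              // start of a new block
                        inRecord = true;
                        record.reset();
                    }
                } else if (b == ']') {           // end of the block: cut off the tail
                    String text = record.toString("ISO-8859-1");
                    int cut = text.length();
                    for (int k = 0; k < 3 && cut >= 0; k++) {
                        cut = text.lastIndexOf('~', cut - 1);
                    }
                    if (cut >= 0) {              // skip blocks with fewer than three '~'
                        result.add(text.substring(cut + 1));
                    }
                    inRecord = false;
                } else {
                    record.write(b);
                }
            }
        }
        return result;
    }
}
```

Each returned string has the form MSISDN~boolean~boolean and can be split on '~'. Whether this is fast enough for 3000 blocks every 12 seconds would still have to be measured on real data.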

angeshwara at 2007-7-14 23:32:12