Regarding Parsing a binary file
Hello all,
I am struggling to parse a binary file; any help with parsing it would be great.
The file is also quite big, so I have to keep performance in mind.
The file looks something like this:
[33623858467 <binary data>~1158746601769~False~False]
[23623858467 <binary data>~1158746601769~False~False]
[53623858467 <binary data>~1158746601769~False~False]
i.e. the format is
[MSISDN <binary data>~[MSISDN]~<boolean value>~<boolean value>]
My aim is to extract the last two boolean values.
My source code is
// Needs java.io.RandomAccessFile, java.util.ArrayList and java.util.List
try {
    String accountFileName = fileName;
    RandomAccessFile fi = new RandomAccessFile(accountFileName, "r");
    try {
        int s;
        while ((s = fi.read()) != -1) {
            // Search for '[' (ASCII 91)
            if (s == 91) {
                int count = 0;
                List<Long> countTildas = new ArrayList<Long>();
                System.out.println("FOUND SOMETHING...");
                // Read until ']' (ASCII 93) or end of file
                int b;
                while ((b = fi.read()) != -1 && b != 93) {
                    // Record the position just after each '~' (ASCII 126)
                    if (b == 126) {
                        System.out.println("Position of " + count++ + " ~ : " + fi.getFilePointer());
                        countTildas.add(fi.getFilePointer());
                    }
                }
                System.out.println("Count : " + count);
            }
        }
    } finally {
        // Release the file handle even if reading fails
        fi.close();
    }
}
catch (Exception e) {
    e.printStackTrace();
}
I am able to extract the position of each ~ within each section, i.e. [....~.....~...~...] [...~...~...~...] [...~...~..~...] and so on in the file, but I can't find a way to extract the content between them.
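One possible way to get at the content between the tildes is to hold the bytes of one block in memory and walk it from the end, cutting a field each time a '~' is met. The following is only a minimal sketch under assumptions: the class name `TrailingFields` and the method `lastFields` are made up for illustration, the block is assumed to be the bytes between '[' and ']', and the trailing fields are assumed to be plain ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TrailingFields {
    /**
     * Returns the last n '~'-separated fields of a block, in file order.
     * The block is assumed to hold the bytes between '[' and ']' (both excluded).
     */
    static List<String> lastFields(byte[] block, int n) {
        List<String> fields = new ArrayList<String>();
        int end = block.length; // exclusive end of the field currently being scanned
        for (int i = block.length - 1; i >= 0 && fields.size() < n; i--) {
            if (block[i] == '~') {
                // Bytes i+1 .. end-1 form one field
                fields.add(new String(block, i + 1, end - i - 1, StandardCharsets.US_ASCII));
                end = i;
            }
        }
        // Fields were collected back to front, so restore file order
        Collections.reverse(fields);
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical block; the "ab~cd" part stands in for binary data containing a '~'
        byte[] block = "33623858467 ab~cd~1158746601769~False~True"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(lastFields(block, 3)); // prints [1158746601769, False, True]
    }
}
```

Because the scan starts at the end and stops after n fields, any '~' inside the binary part is never reached.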
[2556 byte] By [angeshwara] at [2007-10-3 5:25:00]

Hello,
I may have cracked it. Below is a working piece of code that extracts
the last two boolean values.
The format is
[MSISDN <binary data>~[MSISDN]~<boolean value>~<boolean value>]
[33623858467 <binary data>~1158746601769~False~False]
[23623858467 <binary data>~1158746601769~False~False]
[53623858467 <binary data>~1158746601769~False~False]
Some clarifications in response to the replies posted:
- The binary part may or may not contain '~', but the format is
[MSISDN <binary data>~[MSISDN]~<boolean value>~<boolean value>]
so irrespective of any '~' in the binary data, the last two boolean values
will be separated by a '~', with an MSISDN in front of the boolean strings, also separated by '~'. These three fields are stored as strings in the binary file
(only the SMS content is 7-bit-encoded binary data).
- There is no row/record structure; I just split the data to make it readable.
The real data will be something like
[ +33623858467 20100 &Ab#A-?{R 瑍1158746601769~False~False][ +33623858467 20100 "Gb# - 沯moF悅r?gt;蒆剅€3~1158746601769~False~False][ +33624161744 20100 &A~#O? :? ~1158746601769~False~False][ +33624161744 20100 凜~# 丣208101054013692 ?A ?
?z( ? ?
?t( ? ?
?Af ? ?
?
?
?錒 K ?
?鴺 x ?
?T ? ~1158746601769~False~False][ +33603371355 20100 G? - v~1158746601769~False~False]
This piece of code works fine: it simply stores each block of data,
i.e. [.....] [.........] and so on, iterates from the back, and extracts
the bytes present between two '~'.
// Needs java.io.RandomAccessFile, java.io.ByteArrayOutputStream,
// java.util.ArrayList and java.util.List
try {
    String accountFileName = fileName;
    RandomAccessFile fi = new RandomAccessFile(accountFileName, "r");
    try {
        int s;
        while ((s = fi.read()) != -1) {
            // Search for '[' (ASCII 91)
            if (s == 91) {
                List<Integer> countBytes = new ArrayList<Integer>();
                System.out.println("FOUND SOMETHING...");
                // Collect bytes until ']' (ASCII 93) or end of file
                int b;
                while ((b = fi.read()) != -1 && b != 93) {
                    countBytes.add(b);
                }
                // Number of bytes consumed so far, counted from the end of the block
                int firstDelimiter = 0;
                // Extract the last boolean value (reading the data from the back)
                List<Integer> addFirstBooleanData = new ArrayList<Integer>();
                for (int i = countBytes.size() - 1; i >= 0; i--) {
                    firstDelimiter++;
                    if (countBytes.get(i) == 126)
                        break;
                    addFirstBooleanData.add(countBytes.get(i));
                }
                // The bytes were collected back to front, so write them out in reverse
                ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
                for (int i = addFirstBooleanData.size() - 1; i >= 0; i--) {
                    baos1.write(addFirstBooleanData.get(i));
                }
                System.out.println(baos1.toString());
                // Extract the second-to-last boolean value (reading the data from the back)
                List<Integer> addSecondBooleanData = new ArrayList<Integer>();
                for (int i = countBytes.size() - firstDelimiter - 1; i >= 0; i--) {
                    firstDelimiter++;
                    if (countBytes.get(i) == 126)
                        break;
                    addSecondBooleanData.add(countBytes.get(i));
                }
                // Convert the collected bytes to a String
                ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
                for (int i = addSecondBooleanData.size() - 1; i >= 0; i--) {
                    baos2.write(addSecondBooleanData.get(i));
                }
                System.out.println(baos2.toString());
                // Extract the third field from the end (reading the data from the back)
                List<Integer> addThirdData = new ArrayList<Integer>();
                for (int i = countBytes.size() - firstDelimiter - 1; i >= 0; i--) {
                    firstDelimiter++;
                    if (countBytes.get(i) == 126)
                        break;
                    addThirdData.add(countBytes.get(i));
                }
                // Convert the collected bytes to a String
                ByteArrayOutputStream baos3 = new ByteArrayOutputStream();
                for (int i = addThirdData.size() - 1; i >= 0; i--) {
                    baos3.write(addThirdData.get(i));
                }
                System.out.println(baos3.toString());
            }
        }
    } finally {
        // Release the file handle even if reading fails
        fi.close();
    }
}
catch (Exception e) {
    e.printStackTrace();
}
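Since every record ends with the same ~number~boolean~boolean] trailer, the three backward loops might also be expressed as a single pattern match on that trailer. What follows is a rough sketch, not a tested replacement: the class `TrailerRegex` and the method `findTrailers` are invented names, the booleans are assumed to be spelled exactly "True" or "False" as in the samples, and the data is assumed to fit in memory. Decoding as ISO-8859-1 is used only because it maps each byte to exactly one char, so the binary part passes through untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailerRegex {
    // Assumed trailer of every record: ~<digits>~<True|False>~<True|False>]
    static final Pattern TRAILER =
            Pattern.compile("~(\\d+)~(True|False)~(True|False)\\]");

    /** Returns {number, flag1, flag2} for every trailer found in data. */
    static List<String[]> findTrailers(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so no byte is lost or merged
        String text = new String(data, StandardCharsets.ISO_8859_1);
        List<String[]> out = new ArrayList<String[]>();
        Matcher m = TRAILER.matcher(text);
        while (m.find()) {
            out.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: the first record has '~' and ']' inside its binary part
        byte[] data = ("[ +33623858467 20100 &A~#]O~1158746601769~False~True]"
                + "[ +33603371355 20100 G~1158746601769~True~False]")
                .getBytes(StandardCharsets.ISO_8859_1);
        for (String[] t : findTrailers(data)) {
            System.out.println(t[0] + " " + t[1] + " " + t[2]);
        }
    }
}
```

Matching on the whole trailer means a stray '~' or ']' inside the binary data does not end a record early, although binary data that happens to contain the full trailer pattern would still produce a false match.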
However, I am concerned about performance, because there may be
3000 messages every 12 seconds (as per the spec), i.e. 3000 such
blocks of data, so I may have to come up with a far more efficient method of parsing if this turns out to be too slow.
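One likely cost in the code above is that RandomAccessFile does no buffering, so each single-byte read() goes to the operating system, and every byte is also boxed into an Integer. A possible direction is to read through a BufferedInputStream and collect each block in a reusable ByteArrayOutputStream. This is only a sketch under assumptions: `BufferedBlockReader` and `readFlags` are hypothetical names, a block is still assumed to end at the first ']' (as in the code above), and whether this is fast enough for the stated load would have to be measured.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BufferedBlockReader {
    /** Returns, for each '['...']' block, the text after its second-to-last '~'. */
    static List<String> readFlags(InputStream in) throws IOException {
        List<String> flags = new ArrayList<String>();
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        boolean inBlock = false;
        int b;
        while ((b = in.read()) != -1) {
            if (!inBlock) {
                inBlock = (b == '[');       // skip anything outside a block
            } else if (b == ']') {
                // ISO-8859-1 keeps one char per byte, so indexes match byte offsets
                String s = block.toString("ISO-8859-1");
                int last = s.lastIndexOf('~');
                int prev = (last > 0) ? s.lastIndexOf('~', last - 1) : -1;
                if (prev >= 0) {
                    flags.add(s.substring(prev + 1)); // e.g. "False~False"
                }
                block.reset();              // reuse the buffer for the next block
                inBlock = false;
            } else {
                block.write(b);
            }
        }
        return flags;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical in-memory data; for a real file, wrap a java.io.FileInputStream instead
        byte[] data = "[1 x~9~False~True][2 y~z~9~True~True]"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        try {
            System.out.println(readFlags(in)); // prints [False~True, True~True]
        } finally {
            in.close();
        }
    }
}
```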