Algorithm for splitting a long message

Hi all,

We are working on an SMS project. Because of the SMS length limit, we need to split a message that exceeds the limit into smaller ones.

We have implemented a method that splits the message into a Vector of strings.

But the method seems very brute force. Here is the example code. I am looking for a better approach to handle the same thing; any suggestions? There are some constraints. If the message exceeds the length, we need to add "Cont..." to the end of the first message, and "Related.." at the start and "Cont..." at the end of each following message if the text still spans into the next message. If adding a line would exceed the length, the whole line is moved to the next message in the Vector.

import java.util.*;

public class SplitMessage2

{

public static String msg2 ="Testing messageessage, @@this is a message @@this is a message edfd@@for testing split @@message function, @@it will split exceed @@length message into multiple";

public static String con_msg ="Msg Cont..";

public static String related_label ="Related..";

public static void main(String args[])

{

StringTokenizer st =new StringTokenizer(msg2,"@@");

System.out.println("num of tokens:"+st.countTokens());

Vector v2 = split_message2(msg2,140);// len must be larger than the length of "Cont..." plus the length of one line

for (int i =0;i<v2.size();i++)

{

System.out.println("Line"+i+":"+ (String)v2.get(i));

}

}

public static Vector split_message2(String msg,int len)

{

StringTokenizer st =new StringTokenizer(msg,"@@");

String loc_i_smsContinue ="Cont...";

String loc_i_smsRelate ="Related..";

StringBuffer loc_cls_sbSMSContent =new StringBuffer();

String loc_str_smsString ="";

Vector loc_vec_smsMsg =new Vector(2,2);

int i = 0;

int j =0;

int index = 0;

int totaltokens = st.countTokens();

j=0;

while (i==0)

{

while (true)

{

for(;j<totaltokens;)

{

System.out.println("Start of for-do");

if (j == 0){

//main header

System.out.println("in main header j:"+j);

loc_str_smsString = st.nextToken();

}

else if(index ==j)//create a new page

{

//append related header

loc_cls_sbSMSContent.delete(0, loc_cls_sbSMSContent.length());

System.out.println("elseif will append:"+loc_str_smsString);

loc_cls_sbSMSContent.append(loc_i_smsRelate);//append previous one

loc_cls_sbSMSContent.append(loc_str_smsString);//append previous one

loc_str_smsString ="";

System.out.println("elseif num of tokens:"+st.countTokens());

if (st.hasMoreTokens())

{

loc_str_smsString = st.nextToken();

}else

{

j=totaltokens;

}

}else//next token

{

System.out.println("else num of tokens:"+st.countTokens());

if (st.hasMoreTokens())

{

loc_str_smsString = st.nextToken();

}else

{

j=totaltokens;

}

}

if (!(loc_str_smsString.length()==0) && (loc_cls_sbSMSContent.length() + loc_str_smsString.length() + loc_i_smsContinue.length()) <= len)

{

System.out.println("in core function j:"+j);

loc_cls_sbSMSContent.append(loc_str_smsString);

j++;

}

else

{

if (j!=totaltokens)//not last one

loc_cls_sbSMSContent.append(loc_i_smsContinue);

//loc_vec_smsMsg.addElement(loc_cls_sbSMSContent.toString());

System.out.println("First Break");

index = j;

break;

}

}

if (j>=totaltokens)

{

System.out.println("All finish");

i=-1;

}

System.out.println("Last Break");

break;

}

System.out.println("Append sb to vector:"+i+loc_cls_sbSMSContent);

loc_vec_smsMsg.addElement(loc_cls_sbSMSContent.toString());

}

return loc_vec_smsMsg;

}

}

Message was edited by:

kyho

[7114 byte] By [kyhoa] at [2007-10-3 5:45:10]
# 1

Something like

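import java.util.ArrayList;
import java.util.List;

public List<String> split(String input, int numChars) {
    List<String> list = new ArrayList<String>();
    int start = 0;
    while (start < input.length()) {
        // Take at most numChars characters, stopping at the end of the input.
        int end = Math.min(start + numChars, input.length());
        list.add(input.substring(start, end));
        start = end;
    }
    return list;
}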

Note that this probably does not do what you want (even if it works), as it splits the input string into chunks of numChars characters. Characters in Java are 16 bits wide, whereas you probably want to count the number of bytes.
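To make the character/byte distinction concrete, here is a small sketch that compares String.length() with the encoded size of the same text. The class and method names are made up for illustration, and which encoding your SMS gateway actually uses is an assumption you would need to check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the code posted in this thread.
class LengthCheck {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int charCount(String s) {
        return s.length();
    }

    // Size in bytes when encoded as UTF-16BE (two bytes per code unit, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Size in bytes when encoded as UTF-8 (one to four bytes per code point).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "hello" with an accented e
        System.out.println("chars:        " + charCount(s));
        System.out.println("UTF-16 bytes: " + utf16Bytes(s));
        System.out.println("UTF-8 bytes:  " + utf8Bytes(s));
    }
}
```

If the limit is expressed in bytes, the split loop would have to measure chunks with one of these byte counts instead of substring lengths.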

Cheers

matfud

matfuda at 2007-7-14 23:53:24 > top of Java-index,Other Topics,Algorithms...
# 2

matfud,

Thanks for your advice, it simplifies the code a lot. But there is one more constraint.

If the message is a paragraph, it contains newlines ("\n") between strings. If appending a line would make the message longer than the maximum length, we need to move the whole line into a new ArrayList entry. That is why I use StringTokenizer to separate the message into lines and add them to the list line by line.

But my approach seems quite brute force; it checks a lot of cases. Are there any better approaches?
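One way to express the whole-line constraint with fewer special cases is a single greedy pass over the lines. The sketch below is only an illustration: the class name, the pack method and its delimiter parameter are invented here, the marker strings are taken from the first post, and it assumes that a single line longer than the limit is emitted as-is rather than split further.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the original poster's implementation.
class LinePacker {
    static final String CONT = "Cont...";      // appended when more text follows
    static final String RELATED = "Related.."; // prefixed to continuation messages

    static List<String> pack(String msg, String delimiter, int maxLen) {
        List<String> pages = new ArrayList<>();
        StringBuilder page = new StringBuilder();
        boolean pageHasLine = false;
        for (String line : msg.split(Pattern.quote(delimiter))) {
            if (line.isEmpty()) {
                continue; // skip empty tokens, as StringTokenizer would
            }
            // Always reserve room for the CONT marker, like the posted code does.
            boolean fits = page.length() + line.length() + CONT.length() <= maxLen;
            if (pageHasLine && !fits) {
                // Close the current message and move the whole line to the next one.
                pages.add(page.append(CONT).toString());
                page = new StringBuilder(RELATED);
            }
            page.append(line);
            pageHasLine = true;
        }
        if (pageHasLine) {
            pages.add(page.toString()); // last message gets no CONT marker
        }
        return pages;
    }

    public static void main(String[] args) {
        for (String page : pack("aaaa@@bbbb@@cccc", "@@", 16)) {
            System.out.println(page);
        }
    }
}
```

The loop keeps only two pieces of state, the current message and whether it already holds a line, so the "first message", "new page" and "next token" branches collapse into one check.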

kyhoa at 2007-7-14 23:53:24
# 3

The only thing you HAVE to optimize for is the character length. Splitting on paragraphs in addition to this is a heuristic, and any heuristic will perform badly in some situations.

E.g. if the code you posted does what you say it does, then a two-paragraph message where p1 is MAX_SIZE+1 chars will be written out as three SMS instances.

The first MAX_SIZE characters will be the first SMS.

The second message will contain a single character.

The third message will contain the second paragraph.

Finding reasonable compromises in situations such as this is tricky and probably not worthwhile (there is an extra cost for each message sent).

The best you can probably do is to see if a paragraph ends within x characters of the end and split there.

You can probably modify the code I posted above to find an "end" value that is the last "\n" before the real end but within a "limit".
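A rough sketch of that modification might look like the following. The class name and the limit parameter are assumptions added for illustration: limit is taken to mean how many characters of a chunk you are willing to give up in order to end on a newline, and the newline itself stays at the end of the chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the fixed-size split, not tested against real SMS traffic.
class NewlineAwareSplitter {
    static List<String> split(String input, int numChars, int limit) {
        if (numChars <= 0) {
            throw new IllegalArgumentException("numChars must be positive");
        }
        List<String> list = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = Math.min(start + numChars, input.length());
            if (end < input.length()) {
                // Look for the last newline inside the current chunk.
                int nl = input.lastIndexOf('\n', end - 1);
                // Cut there only if it is in this chunk and costs at most 'limit' chars.
                if (nl >= start && end - (nl + 1) <= limit) {
                    end = nl + 1;
                }
            }
            list.add(input.substring(start, end));
            start = end;
        }
        return list;
    }

    public static void main(String[] args) {
        for (String chunk : split("abc\ndefgh", 6, 3)) {
            System.out.println("[" + chunk.replace("\n", "\\n") + "]");
        }
    }
}
```

With a limit of 0 this only cuts early when the newline is already the last character of the chunk, so it behaves like the plain fixed-size split; a larger limit trades shorter messages for cleaner breaks.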

Cheers

matfud

matfuda at 2007-7-14 23:53:24