How to parse a string?

Hi,

I have a string like this:

"123/45ABC888"

It has four parts:

"123", "/45", "ABC", and "888".

The first part contains only digits,

the second part contains only digits too, but starts with a "/",

the third part contains only letters,

the fourth part contains only digits.

How can I extract these parts from the string?

Regards,

Pengyou

[417 byte] By [pengyoua] at [2007-11-27 7:42:20]
# 1
You could use substring() on the String...
suparenoa at 2007-7-12 19:23:11 > top of Java-index,Java Essentials,Java Programming...
# 2
However, the length of each part is not fixed. So substring will not work.
pengyoua at 2007-7-12 19:23:11
# 3
Substrings will work if you use non-fixed indexes. The best way to do this, though, is to use a regex. Try this: http://www.txt2re.com
mkoryaka at 2007-7-12 19:23:11
# 4

If the four parts vary in length but each is guaranteed to have at least one character, you can easily write a state machine. The algorithm goes like this:

State 1: read digits until "/", then throw away the "/"

State 2: read digits until the next character is a letter

State 3: read letters until the next character is a digit

State 4: read the rest
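
The four states above could be sketched roughly as follows. This is only an illustration, not the poster's code: the class name StateMachineParser and the helpers scan and parse are hypothetical names, and the sketch assumes all four parts are present and rejects anything else with an exception.

```java
import java.util.ArrayList;
import java.util.List;

public class StateMachineParser {

    /** Advances from 'from' over digits (or letters) and returns the end index; rejects empty parts. */
    private static int scan(String s, int from, boolean digits) {
        int pos = from;
        while (pos < s.length()
                && (digits ? Character.isDigit(s.charAt(pos)) : Character.isLetter(s.charAt(pos)))) {
            pos++;
        }
        if (pos == from) {
            throw new IllegalArgumentException("Empty part at index " + from + " in: " + s);
        }
        return pos;
    }

    /** Hypothetical helper: splits e.g. "123/45ABC888" into its four parts. */
    public static List<String> parse(String s) {
        List<String> parts = new ArrayList<>();

        // State 1: read digits until "/", then throw away the "/".
        int end = scan(s, 0, true);
        parts.add(s.substring(0, end));
        if (end >= s.length() || s.charAt(end) != '/') {
            throw new IllegalArgumentException("Expected '/' at index " + end + " in: " + s);
        }
        int start = end + 1;

        // State 2: read digits until the next character is a letter.
        end = scan(s, start, true);
        parts.add(s.substring(start, end));
        start = end;

        // State 3: read letters until the next character is a digit.
        end = scan(s, start, false);
        parts.add(s.substring(start, end));
        start = end;

        // State 4: read the rest, which must consist of digits only.
        end = scan(s, start, true);
        if (end != s.length()) {
            throw new IllegalArgumentException("Unexpected character at index " + end + " in: " + s);
        }
        parts.add(s.substring(start, end));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("123/45ABC888")); // prints [123, 45, ABC, 888]
    }
}
```

Compared with a regex, this is longer, but the error messages can say exactly where the input went wrong.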

morgalra at 2007-7-12 19:23:11
# 5
Hi,

I guess this is a typical case where a regular expression might be useful. You might want to have a look at the class java.util.regex.Matcher.

Regards,

BugBunny
BugBunnya at 2007-7-12 19:23:11
# 6
I will try java.util.regex.Matcher tomorrow.
pengyoua at 2007-7-12 19:23:11
# 7

If you don't mind a little black magic...

public class Test
{
    public static void main(String... args)
    {
        String str = "123/45ABC888";
        String[] parts = str.split("(?<=\\d)(?=\\D)|(?<=[A-Z])(?=[^A-Z])");
        for (String s : parts)
        {
            System.out.printf("[%s]%n", s);
        }
    }
}

^_^

uncle_alicea at 2007-7-12 19:23:11
# 8

I tried:

public class Test
{
    public static void main(String[] args)
    {
        String str = "123/45ABC888";
        String[] parts = str.split("(?<=\\d)(?=\\D)|(?<=[A-Z])(?=[^A-Z])");
        System.out.println("parts.length=" + parts.length);
        for (int i = 0; i < parts.length; i++) {
            // Print the element, not the array itself
            System.out.println("parts[" + i + "]=" + parts[i]);
        }
    }
}

It gave:

parts[0]=123

parts[1]=123

parts[2]=123

parts[3]=123

But I would like to have

parts.length=4

parts[0]=123

parts[1]=45

parts[2]=ABC

parts[3]=888

Could any regular expression guru help?

Pengyou

pengyoua at 2007-7-12 19:23:11
# 9

Your (or uncle_alice's) code produces the following output:

parts.length=4

parts[0]=123

parts[1]=/45

parts[2]=ABC

parts[3]=888

You could do it like this:

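import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "123/45ABC888";
// Match each run of digits or letters; the "/" matches neither and is skipped
Pattern pattern = Pattern.compile("\\d+|[a-zA-Z]+");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group());
}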

That way you won't have the "/" in your substrings.

prometheuzza at 2007-7-12 19:23:11
# 10

Oh, so you don't want the slash! Then add it to the regex:

String[] parts = str.split("/|(?<=\\d)(?=\\D)|(?<=[A-Z])(?=[^A-Z])");

And if you change the requirements any more, you should go with a positive-matching approach (like what prometheuzz suggested) instead of split(). The regexes tend to be much simpler that way.
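
As a rough illustration of the positive-matching idea, one pattern with a capturing group per part could look like the sketch below. The class name GroupParser and the method parse are made up for this example, and it assumes all four parts are always present; the "/" sits outside the groups, so it never ends up in a part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupParser {

    // One capturing group per part: digits, "/" + digits, letters, digits.
    private static final Pattern FOUR_PARTS =
            Pattern.compile("(\\d+)/(\\d+)([a-zA-Z]+)(\\d+)");

    /** Hypothetical helper: returns the four parts, or null if the input has a different shape. */
    public static String[] parse(String text) {
        Matcher m = FOUR_PARTS.matcher(text);
        if (!m.matches()) { // matches() requires the whole string to fit the pattern
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("123/45ABC888");
        for (int i = 0; i < parts.length; i++) {
            System.out.println("parts[" + i + "]=" + parts[i]);
        }
    }
}
```

Unlike split(), this also validates the input: a string that does not fit the pattern is rejected instead of being cut into unexpected pieces.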

Also, please use [code][/code] tags when you post source code.

uncle_alicea at 2007-7-12 19:23:11
# 11
> Oh, so you don't want the slash!
> ...

The OP is rather vague: in reply #8 he omitted the slash but in his original post he did mention it.
prometheuzza at 2007-7-12 19:23:11
# 12

With this code:

String text = "123ABC888";
Pattern pattern = Pattern.compile("\\d+|[a-zA-Z]+");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group());
}

I get the following results:

For "123ABC888"

123

ABC

888

For "123/44ABC888"

123

44

ABC

888

That is fine, but I always want to have 4 parts:

Part 1: number

Part 2: number, separated by "/"

Part 3: letter

Part 4: number

Part 2 is optional. Part 3 and Part 4 together are also optional.

So when I get 123 and ABC for "123ABC", how can I know that "ABC" is part 3 and not part 2?

Any ideas?

Pengyou

pengyoua at 2007-7-12 19:23:11
# 13


I finally solved my problem by checking for the slash:

if (text.indexOf("/") > 0)

This tells me whether part 2 is present.
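
Another way to keep track of which part is which, sketched here only as an alternative to the indexOf check, is a single pattern with optional groups. The class name OptionalPartsParser and the method parse are hypothetical. The sketch assumes part 4 may be missing even when part 3 is present (as in "123ABC"), and it relies on Matcher.group(n) returning null for a group that did not take part in the match.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalPartsParser {

    // Part 1 is required; "/" + part 2 is optional; part 3 is optional,
    // and part 4 can only follow part 3 (assumption, see the text above).
    private static final Pattern PARTS =
            Pattern.compile("(\\d+)(?:/(\\d+))?(?:([a-zA-Z]+)(\\d+)?)?");

    /** Hypothetical helper: always returns four slots, with null for a missing part; null if no match. */
    public static String[] parse(String text) {
        Matcher m = PARTS.matcher(text);
        if (!m.matches()) {
            return null;
        }
        // group(n) is null when that optional group did not participate in the match
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("123/44ABC888"))); // [123, 44, ABC, 888]
        System.out.println(Arrays.toString(parse("123ABC")));       // [123, null, ABC, null]
    }
}
```

Because each part has a fixed group number, "ABC" always lands in slot 3, whether or not part 2 is present.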

pengyoua at 2007-7-12 19:23:11