Regex help

Hi all,

I'm trying to break a string at every , and .

So I have something like Pattern.compile("(\\. )|(, )");

but this is not working well for me. I want to change it so that it matches the "." only if that is the only "." in a given word.

Example:

"Break here. But don't break he.re."

So basically I want to know how to define a pattern that requires no more than 1 "." in a word.

Thanks a lot.

[442 byte] By [tomerg3a] at [2007-11-26 13:19:37]
# 1
So, you're saying that you want to find periods that are surrounded by word characters?
paulcwa at 2007-7-7 17:46:38 > top of Java-index,Java Essentials,Java Programming...
# 2
Trial and error, lol. http://forum.java.sun.com/thread.jspa?threadID=771359&messageID=4395122
TuringPesta at 2007-7-7 17:46:38
# 3
If you define a word as being surrounded by whitespace, then look for a space, followed by zero or more characters, followed by one period, followed by zero or more characters, followed by a space. You'll have to decide what constitutes a "character".
ChuckBinga at 2007-7-7 17:46:38
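
The description above can be sketched as a pattern. This is only one reading of it: a "character" is assumed to be anything that is neither whitespace nor a period, and lookarounds stand in for the literal spaces so that a word at the very start or end of the string also counts. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePeriodWords {
    // (?<!\S)   : not preceded by a non-whitespace char (start of a word)
    // [^\s.]*   : zero or more "characters" (assumed: not whitespace, not a period)
    // \.        : exactly one period
    // [^\s.]*   : zero or more "characters"
    // (?!\S)    : not followed by a non-whitespace char (end of the word)
    private static final Pattern ONE_PERIOD_WORD =
            Pattern.compile("(?<!\\S)[^\\s.]*\\.[^\\s.]*(?!\\S)");

    // Returns every whitespace-delimited word that contains exactly one period.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = ONE_PERIOD_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [here.]
        System.out.println(find("Break here. But don't break he.re."));
    }
}
```

On the original example only "here." is reported; "he.re." contains two periods and is skipped.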
# 4

It is not clear to me what you want, but is this it?

String line = "Break here. But, don't break he.re.";
String[] splitLine = line.split(",|(?=\\s)\\.(?!\\w)|(?!\\w)\\.(?=\\s)");
for (String segment : splitLine)
{
    System.out.println("["+segment+"]");
}

sabre150a at 2007-7-7 17:46:38
# 5

It's not really what I want. If the string is

"Break here. But, don't break he.re. ok"

I get

[Break here]

[ But]

[ don't break he.re]

[ ok]

What I want to get is

[Break here]

[ But]

[ don't break he.re ok]

Thanks for the quick replies

tomerg3a at 2007-7-7 17:46:38
# 6

> its not really what I want, if the string is

> "Break here. But, don't break he.re. ok"

> I get

> [Break here]

> [ But]

> [ don't break he.re]

> [ ok]

> What I want to get is

> [Break here]

> [ But]

> [ don't break he.re ok]

Is this last line correct? Should it be

[ don't break he.re.ok] ? If so then

final String[] splitLine = line.split(",|\\.(?=\\s)");

is the best I can do without a more formal definition.

Message was edited by:

sabre150

sabre150a at 2007-7-7 17:46:38
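
For anyone who wants to try the shorter split pattern, here is a small runnable sketch. The class and method names are made up. It also runs the longer pattern posted earlier in the thread: on this input both give the same segments, because `(?=\s)\.` can never match (the same character would have to be both whitespace and a period) and `(?!\w)` in front of `\.` always succeeds.

```java
import java.util.Arrays;
import java.util.List;

final class SplitDemo {
    // The shorter pattern: a comma, or a period that is followed by whitespace.
    static List<String> splitShort(String line) {
        return Arrays.asList(line.split(",|\\.(?=\\s)"));
    }

    // The longer pattern posted earlier in the thread.
    static List<String> splitLong(String line) {
        return Arrays.asList(line.split(",|(?=\\s)\\.(?!\\w)|(?!\\w)\\.(?=\\s)"));
    }

    public static void main(String[] args) {
        String line = "Break here. But, don't break he.re. ok";
        // Prints [Break here], [ But], [ don't break he.re] and [ ok], one per line.
        for (String segment : splitShort(line)) {
            System.out.println("[" + segment + "]");
        }
        // Prints true: both patterns split this input identically.
        System.out.println(splitShort(line).equals(splitLong(line)));
    }
}
```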
# 7
Sorry, correction:
[Break here]
[ But]
[ don't break he.re. ok]
leaving the "he.re." as is.
tomerg3a at 2007-7-7 17:46:38
# 8
> Sorry, correction
> [Break here]
> [ But]
> [ don't break he.re. ok]
> leaving the "he.re." as is

This is what my modified regex does with your test string.
sabre150a at 2007-7-7 17:46:38
# 9
String[] splitLine = line.split(",|\\.(?=\\s)");

gets me
[Break here]
[ But]
[ don't break he.re]
[ ok]

and I want
[Break here]
[ But]
[ don't break he.re. ok]

Again, thanks for the help.
tomerg3a at 2007-7-7 17:46:38
# 10
In your test case, is there a space between the 'he.re.' and the 'ok'? If so, then what are the criteria for splitting or not splitting on a full stop?
sabre150a at 2007-7-7 17:46:38
# 11
I want to split on a full stop ONLY if it's the only full stop in that word (the word may be in a foreign language as well), so I want to break in these cases:
bye.
página.
and I don't want to break in these cases:
U.S.S.R.
殚.uu.
tomerg3a at 2007-7-7 17:46:38
# 12
Sorry, but I can't do this in one regex because the required look-behind needs to have a maximum length. You may have more luck if uncle_alice (the regex GURU) picks this up.
sabre150a at 2007-7-7 17:46:38
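
One possible way around the look-behind length limit, if you are willing to assume that no word is longer than some fixed size, is to give the look-behind an explicit bound. The sketch below is not from the thread: the limit of 50 and the class name are arbitrary choices. The negative look-behind rejects a period when an earlier period sits in the same run of non-whitespace characters.

```java
import java.util.Arrays;
import java.util.List;

final class BoundedLookbehindSplit {
    // ASSUMPTION: no word is longer than MAX_WORD characters.
    private static final int MAX_WORD = 50;

    // Split on a comma, or on a period that
    //   - is followed by whitespace, and
    //   - is NOT preceded (within the same word) by another period.
    private static final String REGEX =
            ",|(?<!\\.\\S{0," + MAX_WORD + "})\\.(?=\\s)";

    static List<String> split(String line) {
        return Arrays.asList(line.split(REGEX));
    }

    public static void main(String[] args) {
        // Prints [Break here], [ But] and [ don't break he.re. ok], one per line.
        for (String segment : split("Break here. But, don't break he.re. ok")) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

A period at the very end of the string is still left alone, since `(?=\s)` needs whitespace after it, and a word longer than the assumed limit could slip through.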
# 13
I was trying to do something like
\s + anything but a . + \. + \s
but either I wrote it wrong, or it just doesn't work.
tomerg3a at 2007-7-7 17:46:38
# 14

No, this is definitely not a job for split(). In a case like this, I usually recommend a positive matching approach, using Matcher.find(), but even that's going to be pretty ugly this time. The code below yields the desired result given your sample data, but I pity the guy who would have to maintain it. You're probably better off writing your own parser using String.charAt() and Character.isLetter() and such.

import java.util.*;
import java.util.regex.*;

public class Test
{
    public static List<String> breakIt(String str)
    {
        // Group 1: a word made of letters and apostrophes, possibly with
        // embedded '.' or ',' (group 2 participates only in that case).
        // The word must be followed by '.' or ',' and then whitespace.
        Pattern p = Pattern.compile("(['\\pL]++([.,]['\\pL]++)*+)[.,]\\s++");
        Matcher m = p.matcher(str);
        List<String> result = new ArrayList<String>();
        int pos = 0;
        while (m.find())
        {
            // Break only when the word has no embedded '.' or ','.
            if (m.start(2) == -1)
            {
                result.add(str.substring(pos, m.end(1)));
                pos = m.end(0);
            }
        }
        // Whatever follows the last break is the final segment.
        if (pos < str.length())
        {
            result.add(str.substring(pos));
        }
        return result;
    }

    public static void main(String[] args)
    {
        String test = "Break here. But, don't break he.re. ok";
        for (String s : breakIt(test))
        {
            System.out.printf("[%s]%n", s);
        }
    }
}

uncle_alicea at 2007-7-7 17:46:38
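
For comparison, here is a rough sketch of the hand-written parser idea mentioned above, built on String.charAt(). It is not anyone's posted code, and the rules are assumptions: a word is a maximal run of non-whitespace (so Character.isWhitespace() is used rather than Character.isLetter()), a break happens after a word that ends in "," or that ends in "." and contains no other ".", and only when more text follows. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HandSplitter {
    // Hypothetical helper: counts occurrences of c in s[from, to).
    private static int count(String s, int from, int to, char c) {
        int n = 0;
        for (int i = from; i < to; i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    static List<String> breakIt(String str) {
        List<String> result = new ArrayList<>();
        int len = str.length();
        int segStart = 0; // start of the segment currently being collected
        int i = 0;
        while (i < len) {
            // Skip whitespace in front of the next word.
            while (i < len && Character.isWhitespace(str.charAt(i))) {
                i++;
            }
            int wordStart = i;
            // Consume one word (assumed: a maximal run of non-whitespace).
            while (i < len && !Character.isWhitespace(str.charAt(i))) {
                i++;
            }
            int wordEnd = i;
            if (wordEnd == wordStart || wordEnd == len) {
                continue; // no word left, or the last word: never break there
            }
            char last = str.charAt(wordEnd - 1);
            boolean endsWithComma = last == ',';
            boolean lonePeriod = last == '.'
                    && count(str, wordStart, wordEnd, '.') == 1;
            if (endsWithComma || lonePeriod) {
                // Emit the segment without the trailing delimiter.
                result.add(str.substring(segStart, wordEnd - 1));
                // The next segment starts after the following whitespace.
                while (i < len && Character.isWhitespace(str.charAt(i))) {
                    i++;
                }
                segStart = i;
            }
        }
        if (segStart < len) {
            result.add(str.substring(segStart));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Break here], [But] and [don't break he.re. ok], one per line.
        for (String s : breakIt("Break here. But, don't break he.re. ok")) {
            System.out.printf("[%s]%n", s);
        }
    }
}
```

It is longer than the regex, but each rule sits in one place, so changing what counts as a word or a break means editing a single condition.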
# 15
I'm not sure if I've ever seen such ug....ly code! Good job, uncle_alice ;) (and please don't do it again!)
ChuckBinga at 2007-7-7 17:46:38