Is there such a thing as named regex group matching?

For Java regex patterns, are named groups available, like "<?myprice>($\d+.\d\d)"?

Also, I must be really thick: after reading the online tutorial, I still don't have a concrete idea.

Also, if my input sequence consists of multiple multi-line logical records, how do I extract the named captured groups for each record?

In C and .NET, I can use arrays, etc.

[397 byte] By [g_gunna] at [2007-11-26 20:22:02]
# 1

There are no 'named' groups in any regex engine I have used but, depending on what one is trying to do, one can access a group content by group index -

1) using the Matcher.group(int index) to extract the content into a Java String,

2) using \1, \2, \3 etc. to use the matched content as part of a regex,

3) using $1,$2,$3 etc to use as part of the replacement in replaceAll().
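
The three index-based options above can be sketched as follows. This is only an illustration: the class name, method names and sample strings are made up for the example and are not from the original poster's data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupAccessDemo {

    // 1) Matcher.group(int): pull the price out of a line as a String.
    static String extractPrice(String line) {
        Matcher m = Pattern.compile("(\\$\\d+[.]\\d\\d)\\s+Add").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // 2) \1 inside the regex: true if the text contains a doubled word such as "the the".
    static boolean hasRepeatedWord(String text) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text).find();
    }

    // 3) $1 in the replacement string: wrap every price in square brackets.
    static String bracketPrices(String text) {
        return text.replaceAll("(\\$\\d+[.]\\d\\d)", "[$1]");
    }

    public static void main(String[] args) {
        System.out.println(extractPrice("a number words $472.15 Add"));
        System.out.println(hasRepeatedWord("before the the price"));
        System.out.println(bracketPrices("Reg. $12.50, now $9.99"));
    }
}
```

In all three cases the group number is simply the position of its opening parenthesis, counted from the left.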

I don't really understand what you mean by

> Also if my input sequence consists of multiple logical records of multiple lines, how do I

> extract the various records of named captured groups?

I think you need to give an example of what you are trying to do here.

>

> In C and dot net, I can use arrays etc...

Java has arrays! I don't see the relevance of this to regex. Please elaborate.


sabre150a at 2007-7-10 0:47:20 > top of Java-index,Java Essentials,Java Programming...
# 2

Thank you for your reply.

1. Data record to match

Example of data: each logical record consists of 3 lines. Logical records are separated by a line without any words (but it may contain spaces of some sort).

For each record:

line 1: the last numerical word is the SKU

line 2: the 1st word is PartCode, the 2nd word through the next-to-last word are the Description of the part, and the last numerical word is qty

line 3: the 2nd-to-last word, beginning with the dollar sign, is the price. The last word is always Add

So here is some hypothetical data

some words 0124578

156234-A A product descriptions till next last word in the line the last word in this line is qty 5

a number words from beginning of line $472.15 Add

156234A another product descriptions till next last word in the line. the last word in this line is qty 48

a number words from beginning of line-the next word is price starting with a dollar sign and is always followed by the word Add $472.15 Add

The reason I loathe using numbered groups is that I am not all that skilled in regex, and I often end up with more matched groups than I intended.

.NET does have named group expressions.

If Java had named expressions, I would have used a pattern like this

"^\\w+\\s(<?sku>\\d{7-9})\\s$(<?mfrCd>\\w)\\s(<?desc>(\\w+)\\s(<?avail>(\\d+))$)"+ "\\w+(<?price>(\\$\\d+[.]\\d\\d))\\sAdd"

What I should have said was, I did not see a tutorial example of using arrays in group matching for Java.

g_gunna at 2007-7-10 0:47:21
# 3

I don't see why you need named groups. All you have to do is count the groups! You only have 5!

I don't see why you need a "tutorial example of using arrays". You can just loop using Matcher.find() and extract the 5 values whenever you find a match.
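
A minimal sketch of such a Matcher.find() loop, under some assumptions: records are three lines long, lines end with \n, and the pattern, class name and sample input are hypothetical stand-ins rather than anything taken from the real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordLoopDemo {

    // Hypothetical pattern for one three-line record. Groups 1..5 are
    // SKU, part code, description, quantity and price.
    private static final Pattern RECORD = Pattern.compile(
        "^.*\\b(\\d{7,9})\\s*\\n"
        + "(\\S+)\\s+(.+)\\s+(\\d+)\\s*\\n"
        + ".*(\\$\\d+[.]\\d\\d)\\s+Add",
        Pattern.MULTILINE);

    // Each call to find() resumes after the previous match,
    // so the loop visits every record in the input once.
    static List<String[]> extractAll(String input) {
        List<String[]> records = new ArrayList<String[]>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            String[] fields = new String[m.groupCount()];
            for (int i = 1; i <= m.groupCount(); i++) {
                fields[i - 1] = m.group(i);
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String input =
            "some words 0124578\n"
            + "156234-A widget with long description 5\n"
            + "a number words from beginning of line $472.15 Add\n"
            + "   \n"
            + "other words 7654321\n"
            + "998877-B another product description 48\n"
            + "more words $19.99 Add\n";
        for (String[] r : extractAll(input)) {
            System.out.println(java.util.Arrays.toString(r));
        }
    }
}
```

Each String[] holds one record, so the list plays the role of the "array of matches" available in other languages.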

You don't say how the data is presented so more detail is required for further help.

sabre150a at 2007-7-10 0:47:21
# 4

Check this out:

import java.util.regex.*;

public class Test
{
    // Groups 1-5: sku, mfrCd, desc, avail, price
    static Pattern p = Pattern.compile(
        "^.+(\\d{7,9})\\s+^(\\S+)\\s+(.+)\\s+(\\d+)\\s+^.+(\\$\\d+[.]\\d\\d)\\s+Add",
        Pattern.MULTILINE);

    public static void main(String... args)
    {
        String str = "some words 0124578\n" +
            "156234-A A product descriptions till next last word in the" +
            " line the last word int his line is qty 5\n" +
            "a number words from begining of line $472.15 Add";

        Matcher m = p.matcher(str);
        if (m.find())
        {
            System.out.printf("%nsku: %s%nmfrCd: %s%ndesc: %s%navail: %s%nprice: %s%n",
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
        }
    }
}

By the way, your .NET regex syntax is a little off. Instead of this:

    (<?sku>\\d{7-9})

you would write this:

    (?<sku>\\d{7,9})
     ^          ^
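
For readers on a newer runtime: Java 7 added named capturing groups to java.util.regex using this same (?<name>...) syntax, read back with Matcher.group(String). The sketch below assumes Java 7 or later, so it would not compile on the 1.5 JVM discussed in this thread. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupDemo {

    // Requires Java 7+: (?<price>...) names the group "price".
    private static final Pattern PRICE_LINE =
        Pattern.compile("(?<price>\\$\\d+[.]\\d\\d)\\s+Add");

    // (?<sku>...) names the trailing 7-9 digit number "sku".
    private static final Pattern SKU_LINE =
        Pattern.compile("\\b(?<sku>\\d{7,9})\\s*$");

    static String priceOf(String line) {
        Matcher m = PRICE_LINE.matcher(line);
        return m.find() ? m.group("price") : null;
    }

    static String skuOf(String line) {
        Matcher m = SKU_LINE.matcher(line);
        return m.find() ? m.group("sku") : null;
    }

    public static void main(String[] args) {
        System.out.println(priceOf("a number words from beginning of line $472.15 Add"));
        System.out.println(skuOf("some words 0124578"));
    }
}
```

Named groups are still numbered as well, so m.group(1) and m.group("price") return the same text for PRICE_LINE.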

uncle_alicea at 2007-7-10 0:47:21
# 5

Your reply helped me a lot. Thank you.

Now if only I can get it to deal with multiple records. Right now it simply gets the last one.

Looks like .+ is rather greedy, especially in multi-line mode.

BTW: you are right about my .NET regex syntax being off. I should have looked it up before writing from memory. As I said, my regex skill is low; I always have to look at an example before I can do anything with regex.

g_gunna at 2007-7-10 0:47:21
# 6

The input consists of multiple records, each spanning multiple lines. The input source is the clipboard.

The extraction result is both displayed in the JTextPane and placed on the clipboard for pasting into another application for processing.

The result is formatted as follows:

For each record, the extracted fields are delimited by a tab character, one record per line.
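
The output format described above could be produced roughly like this. It is a sketch only: the class and method names are hypothetical, each record is assumed to have already been extracted into a String[], and the clipboard helper needs a desktop session to run.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.StringSelection;
import java.util.Arrays;
import java.util.List;

final class TabDelimitedDemo {

    // Joins each record's fields with tabs and writes one record per line.
    static String toTabDelimited(List<String[]> records) {
        StringBuilder out = new StringBuilder();
        for (String[] fields : records) {
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    out.append('\t');
                }
                out.append(fields[i]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Places the text on the system clipboard (throws HeadlessException without a display).
    static void copyToClipboard(String text) {
        StringSelection selection = new StringSelection(text);
        Toolkit.getDefaultToolkit().getSystemClipboard().setContents(selection, selection);
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"0124578", "156234-A", "widget", "5", "$472.15"},
            new String[] {"7654321", "998877-B", "gadget", "48", "$19.99"});
        System.out.print(toTabDelimited(records));
    }
}
```

The same string can be passed to JTextPane.setText() for display before copying it.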

g_gunna at 2007-7-10 0:47:21
# 7

> looks like .+ is rather greedy especially in

> multi-line mode

?, *, +, and {} are greedy by default, but you can also make them reluctant or possessive. From Pattern's API doc:

Greedy quantifiers
    X?        X, once or not at all
    X*        X, zero or more times
    X+        X, one or more times
    X{n}      X, exactly n times
    X{n,}     X, at least n times
    X{n,m}    X, at least n but not more than m times

Reluctant quantifiers
    X??       X, once or not at all
    X*?       X, zero or more times
    X+?       X, one or more times
    X{n}?     X, exactly n times
    X{n,}?    X, at least n times
    X{n,m}?   X, at least n but not more than m times

Possessive quantifiers
    X?+       X, once or not at all
    X*+       X, zero or more times
    X++       X, one or more times
    X{n}+     X, exactly n times
    X{n,}+    X, at least n times
    X{n,m}+   X, at least n but not more than m times

Details here:

http://java.sun.com/docs/books/tutorial/essential/regex/quant.html

http://www.regular-expressions.info/repeat.html
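
To see the three quantifier styles side by side, here is a small sketch that runs ".*foo" in each style against the same input. The helper name findAll and the input string are chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierDemo {

    // Returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* grabs everything, then backs off to the last "foo".
        System.out.println(findAll(".*foo", input));   // [xfooxxxxxxfoo]
        // Reluctant: .*? grows one character at a time, stopping at each "foo".
        System.out.println(findAll(".*?foo", input));  // [xfoo, xxxxxxfoo]
        // Possessive: .*+ grabs everything and never backs off, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input));  // []
    }
}
```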

jverda at 2007-7-10 0:47:21
# 8

Thank you for the wonderful explanation.

BTW, how do I handle alternative matching?

In particular, most of the data records end with

some word before the the price begining with dollar sign $999.99 Add

But there are a few records ending with something like this instead

some word before the the price begining with dollar sign $999.99

Reg.$1,233.60

Save $123.88

ends on 3/31/07Add

Changing

(\\$\\d+[.]\\d\\d)\\s+Add

to

(\\$\\d+[.]\\d\\d)\\s+\\S{0,7}Add

does not help.

I guess \\S does not match a newline.

g_gunna at 2007-7-10 0:47:21
# 9

The dot metacharacter normally matches anything but a line separator character, so the ".+" subexpressions in my regex will only match within a single line. MULTILINE mode doesn't change that; it just allows ^ and $ to match at line boundaries instead of only at the start and end of the input. It's DOTALL that lets the dot match line separators. For example, you can match that extra price information with a reluctant ".*?" in DOTALL mode:

"^.+(\\d{7,9})\\s+^(\\S+)\\s+(.+)\\s+(\\d+)\\s+^.+(\\$\\d+[.]\\d\\d(?s:.*?))Add"

The "(?s:xyz)" construct sets the DOTALL flag only while the subexpression within those parens is trying to match.
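
The difference between MULTILINE and DOTALL can be checked in isolation with a sketch like the one below. It uses only the tail of a record, typed in from the sample earlier in the thread, and hypothetical class and method names. Unlike the full regex above, it keeps the (?s:.*?) outside the capturing group, so the group holds just the first price.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotAllDemo {

    // MULTILINE only: the dot still stops at the end of each line.
    static boolean matchesWithoutDotAll(String text) {
        return Pattern.compile("(\\$\\d+[.]\\d\\d).*?Add", Pattern.MULTILINE)
                      .matcher(text).find();
    }

    // (?s:...) switches DOTALL on for just the enclosed .*?, so it can cross lines.
    static String priceWithInlineDotAll(String text) {
        Matcher m = Pattern.compile("(\\$\\d+[.]\\d\\d)(?s:.*?)Add", Pattern.MULTILINE)
                           .matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tail = "price $999.99\n"
                    + "Reg.$1,233.60\n"
                    + "Save $123.88\n"
                    + "ends on 3/31/07Add";
        System.out.println(matchesWithoutDotAll(tail));
        System.out.println(priceWithInlineDotAll(tail));
    }
}
```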

uncle_alicea at 2007-7-10 0:47:21
# 10

Thank you. Sorry, I'm really thick or something. I tried replacing the regex with your suggestion and failed to get a match on the record with the extra price info. It did match all the other regular records.

So (?s:.*?) turns on reluctant mode for .*, right? So it should have gobbled up anything between the price and Add, even the newlines in between, in match group 6, right?

So did we hit a bug in regex for java version "1.5.0_11"? Or is the regex reluctant mode simply not implemented for this Java version?

g_gunna at 2007-7-10 0:47:21
# 11

> So (?s:.*?) turns on reluctant mode for .*, right?

The (? whatever) makes it a non-capturing group.

The ?s: turns on DOTALL, so the dot can match a line separator.

The .* gobbles zero or more of anything.

The ? after the * makes it reluctant, meaning it will start by matching as little as possible (nothing, here) and then add a character at a time as necessary until the rest of the regex (the "Add") matches. Without the reluctant flag (?), it would gobble everything up to the very last occurrence of "Add", including intervening "Add"s.
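
A tiny sketch of that last point, using a made-up two-line input with one "Add" per line. The (?s) flag at the front turns DOTALL on for the whole pattern. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReluctantAddDemo {

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "$1.00 x Add\n$2.00 y Add";
        // Greedy: runs through the first "Add" and stops at the very last one.
        System.out.println(firstMatch("(?s)\\$.*Add", input));
        // Reluctant: stops at the first "Add" it can reach.
        System.out.println(firstMatch("(?s)\\$.*?Add", input));
    }
}
```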

> So did we hit a bug in regex for java version

> "1.5.0_11"? or simply the regex reluctant mode is

> not implemented for this java version?

I don't know about the bug you're talking about, but reluctance has been around since regex was added in 1.4, I think.

jverda at 2007-7-10 0:47:22
# 12
We'll need to see some of the data you're using the regex on.
uncle_alicea at 2007-7-10 0:47:22