Is there such a thing as named regex matching?
For a Java regex pattern, are named groups available, like "<?myprice>($\d+.\d\d)"?
I must be really thick: after reading the online tutorial, I still don't have a concrete idea.
Also, if my input sequence consists of multiple logical records, each spanning several lines, how do I extract the named captured groups for each record?
In C and .NET, I can use arrays etc...
[397 byte] By [g_gunna] at [2007-11-26 20:22:02]

There are no 'named' groups in any regex engine I have used but, depending on what one is trying to do, one can access a group content by group index -
1) using the Matcher.group(int index) to extract the content into a Java String,
2) using \1, \2,\3 etc to use the matched content as part of a regex,
3) using $1,$2,$3 etc to use as part of the replacement in replaceAll().
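A minimal sketch of the three index-based options listed above. The class name GroupIndexDemo, its method names, and the sample strings are made up for illustration and do not come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // 1) Matcher.group(int): pull the captured price out as a String.
    static String firstPrice(String text) {
        Matcher m = Pattern.compile("(\\$\\d+\\.\\d\\d)").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // 2) \1 inside the regex: detect a word that is immediately repeated.
    static boolean hasDoubledWord(String text) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text).find();
    }

    // 3) $1 in the replacement string: wrap every price in brackets.
    static String bracketPrices(String text) {
        return text.replaceAll("(\\$\\d+\\.\\d\\d)", "[$1]");
    }

    public static void main(String[] args) {
        String line = "a number of words $472.15 Add";
        System.out.println(firstPrice(line));                      // $472.15
        System.out.println(hasDoubledWord("before the the price")); // true
        System.out.println(bracketPrices(line)); // a number of words [$472.15] Add
    }
}
```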
I don't really understand what you mean by
> Also if my input sequence consists of multiple logical records of multiple lines, how do I
> extract the various records of named captured groups?
I think you need to give an example of what you are trying to do here.
>
> In C and dot net, I can use arrays etc...
Java has arrays! I don't see the relevance of this to regex. Please elaborate.
Message was edited by:
sabre150
Thank you for your reply.
1. Data record to match
Example of data: each logical record consists of 3 lines. Logical records are separated by a line without any words (but it may contain spaces).
For each record:
line 1: the last numerical word is the SKU
line 2: the 1st word is the PartCode, the 2nd word through the next-to-last word are the description of the part, and the last numerical word is the qty
line 3: the 2nd-to-last word, beginning with the dollar sign, is the price. The last word is always Add
So here is some hypothetical data:
some words 0124578
156234-AA product descriptions till next last word in theline the last word int his line is qty5
a number words from begining of line $472.15 Add
156234Aanother product descriptions till next last word in the line. the last word inthis line is qty48
a number words from beginning of line-the next wrod is price starting with a dollar sign and is always followed by the word Add $472.15 Add
The reason I loathe using numbered groups is that I am not all that skilled in regex, and I often end up with more matched groups than I intended.
.NET does have named group expressions. If Java had named groups, I would have used a pattern like this:
"^\\w+\\s(<?sku>\\d{7-9})\\s$(<?mfrCd>\\w)\\s(<?desc>(\\w+)\\s(<?avail>(\\d+))$)"+ "\\w+(<?price>(\\$\\d+[.]\\d\\d))\\sAdd"
What I should have said was that I did not see a tutorial example of using arrays in group matching for Java.
I don't see why you need named groups. All you have to do is count the groups! You only have 4!
I don't see why you need a "tutorial example of using arrays". You can just loop using Matcher.find() and extract the 4 values whenever you find a match.
You don't say how the data is presented so more detail is required for further help.
Check this out:
import java.util.regex.*;

public class Test
{
    static Pattern p = Pattern.compile(
        "^.+(\\d{7,9})\\s+^(\\S+)\\s+(.+)\\s+(\\d+)\\s+^.+(\\$\\d+[.]\\d\\d)\\s+Add",
        Pattern.MULTILINE);

    public static void main(String... args)
    {
        String str = "some words 0124578\n" +
            "156234-A A product descriptions till next last word in the" +
            " line the last word int his line is qty 5\n" +
            "a number words from begining of line $472.15 Add";
        Matcher m = p.matcher(str);
        if (m.find())
        {
            System.out.printf("%nsku: %s%nmfrCd: %s%ndesc: %s%navail: %s%nprice: %s%n",
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
        }
    }
}
By the way, your .NET regex syntax is a little off. Instead of this:
(<?sku>\\d{7-9})
you would write this:
(?<sku>\\d{7,9})
 ^          ^
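A note for readers on newer JDKs: as far as I know, java.util.regex has accepted the (?<name>X) syntax and Matcher.group(String) since Java 7, which postdates the 1.5 release discussed in this thread. Under that assumption, the five-group pattern posted above could be sketched with names as follows. The class name NamedGroupDemo, the helper field(), and the shortened sample data are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Same structure as the numbered pattern posted above, with names
    // attached to the groups (assumes Java 7 or later).
    static final Pattern RECORD = Pattern.compile(
        "^.+(?<sku>\\d{7,9})\\s+^(?<mfrCd>\\S+)\\s+(?<desc>.+)\\s+(?<avail>\\d+)\\s+"
        + "^.+(?<price>\\$\\d+[.]\\d\\d)\\s+Add",
        Pattern.MULTILINE);

    // Returns the named field of the first record found, or null if none.
    static String field(String text, String name) {
        Matcher m = RECORD.matcher(text);
        return m.find() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String str = "some words 0124578\n"
            + "156234-A widget description 5\n"
            + "a number of words $472.15 Add";
        System.out.println(field(str, "sku"));   // 0124578
        System.out.println(field(str, "desc"));  // widget description
        System.out.println(field(str, "price")); // $472.15
    }
}
```

On Java 1.5 or 1.6 this pattern would be rejected, so the numbered form remains the only option there.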
Your reply helped me a lot. Thank you.
Now if only I can get it to deal with multiple records. Right now it simply gets the last one.
It looks like .+ is rather greedy, especially in multi-line mode.
Btw: you are right about my .NET regex syntax being off. I should have looked it up before writing from memory. As I said, my regex skill is low; I always have to look at an example before I can do anything with regex.
The input consists of multiple records, each spanning multiple lines. The input source is the clipboard.
The extraction result is both displayed in the JTextPane and placed on the clipboard for pasting into another application for processing.
The result is formatted as follows:
for each record, the extracted fields are delimited by a tab character, one record per line.
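One way the multi-record case might be handled is to call Matcher.find() in a loop instead of once, building one tab-delimited line per match. The sketch below reuses the pattern posted earlier in the thread. The class name RecordExtractor, the method extract(), and the sample records are assumptions for illustration, and the clipboard and JTextPane handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordExtractor {
    // Pattern from earlier in the thread: 1=sku, 2=mfrCd, 3=desc, 4=avail, 5=price.
    static final Pattern RECORD = Pattern.compile(
        "^.+(\\d{7,9})\\s+^(\\S+)\\s+(.+)\\s+(\\d+)\\s+^.+(\\$\\d+[.]\\d\\d)\\s+Add",
        Pattern.MULTILINE);

    // Returns one tab-delimited row per record found in the input.
    static List<String> extract(String input) {
        List<String> rows = new ArrayList<String>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {                        // each find() resumes after the previous match
            StringBuilder row = new StringBuilder();
            for (int g = 1; g <= m.groupCount(); g++) {
                if (g > 1) {
                    row.append('\t');
                }
                row.append(m.group(g));
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "some words 0124578\n"
            + "156234-A widget description 5\n"
            + "a number of words $472.15 Add\n"
            + "   \n"                             // separator line containing only spaces
            + "other words 7654321\n"
            + "998877-B another widget 48\n"
            + "more words $19.99 Add\n";
        for (String row : extract(input)) {
            System.out.println(row);
        }
    }
}
```

If only one record comes back with real data, it may be worth checking whether the code calls find() once or in a loop, and whether the clipboard text really has the line layout the pattern expects.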
> looks like .+ is rather greedy especially in
> multi-line mode
?, *, +, and {} are greedy by default, but you can also make them reluctant or possessive. From Pattern's API doc:
Greedy quantifiers
X?        X, once or not at all
X*        X, zero or more times
X+        X, one or more times
X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times
Reluctant quantifiers
X??       X, once or not at all
X*?       X, zero or more times
X+?       X, one or more times
X{n}?     X, exactly n times
X{n,}?    X, at least n times
X{n,m}?   X, at least n but not more than m times
Possessive quantifiers
X?+       X, once or not at all
X*+       X, zero or more times
X++       X, one or more times
X{n}+     X, exactly n times
X{n,}+    X, at least n times
X{n,m}+   X, at least n but not more than m times
Details here:
http://java.sun.com/docs/books/tutorial/essential/regex/quant.html
http://www.regular-expressions.info/repeat.html
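To make the three quantifier families concrete, here is a small sketch that runs a greedy, a reluctant, and a possessive version of the same pattern over one input. The class name QuantifierDemo, the helper firstMatch(), and the input string are illustrative choices, not something from the posts above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the text of the first match, or null if the regex does not match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until "foo" fits.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible before "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never gives any back.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```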
Thank you for the wonderful explanation.
BTW, how do I handle alternative matching?
In particular, most of the data records end with
some word before the the price begining with dollar sign $999.99 Add
But there are a few records ending with something like this instead
some word before the the price begining with dollar sign $999.99
Reg.$1,233.60
Save $123.88
ends on 3/31/07Add
changing (\\$\\d+[.]\\d\\d)\\s+Add
to
(\\$\\d+[.]\\d\\d)\\s+\\S{0,7}Add
does not help.
I guess \\S does not match a newline.
The dot metacharacter normally matches anything but a line separator character, so the ".+" subexpressions in my regex will only match within a single line. MULTILINE mode doesn't change that, it just allows ^ and $ to match line boundaries instead of just the start and end of the input. It's DOTALL that lets the dot match line separators. For example, you can match that extra price information with a reluctant ".*?" in DOTALL mode: "^.+(\\d{7,9})\\s+^(\\S+)\\s+(.+)\\s+(\\d+)\\s+^.+(\\$\\d+[.]\\d\\d(?s:.*?))Add"
The "(?s:xyz)" construct sets the DOTALL flag only while the subexpression within those parens is trying to match.
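A reduced sketch of that idea, using only the last part of the record so the effect of (?s:...) is easy to see. The class name DotallDemo, the helper price(), and the sample text are assumptions. Unlike the regex above, the (?s:.*?) is placed outside the capturing group here, so that group 1 holds only the price.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Returns group 1 of the first match in MULTILINE mode, or null if no match.
    static String price(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String plain = "some words $999.99 Add";
        String extra = "some words $999.99\n"
            + "Reg.$1,233.60\n"
            + "Save $123.88\n"
            + "ends on 3/31/07Add";

        String strict = "^.+(\\$\\d+[.]\\d\\d)\\s+Add";        // price must be followed by Add
        String relaxed = "^.+(\\$\\d+[.]\\d\\d)(?s:.*?)Add";    // anything, newlines included, up to Add

        System.out.println(price(strict, plain));   // $999.99
        System.out.println(price(strict, extra));   // null
        System.out.println(price(relaxed, plain));  // $999.99
        System.out.println(price(relaxed, extra));  // $999.99
    }
}
```

One thing to watch with this approach: if a record is missing its "Add", the reluctant DOTALL part will keep going into the following record until it finds one.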
Thank you. Sorry, I'm really thick or something. I tried replacing the regex with your suggestion and failed to get a match on the record with the extra price info. It did match all the other regular records.
So (?s:.*?) turns on reluctant mode for .*, right? So it should have gobbled up anything between the price and Add, even the newlines in between, in match group 6, right?
So did we hit a bug in regex for java version "1.5.0_11"? Or is the regex reluctant mode simply not implemented for this Java version?
> So (?s:.*?) turns on reluctant mode for .*, right?
The (? whatever) makes it a non-capturing group.
The ?s: turns on DOTALL, so the dot can match a line separator.
The .* gobbles zero or more of anything,
The ? after the * makes it reluctant, meaning it will start by matching as little as possible (nothing, here) and then add a character at a time as necessary until the rest of the regex (the "Add") matches. Without the reluctant flag (?), it would gobble everything up to the very last occurrence of "Add", including intervening "Add"s.
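The difference described in the last point can be checked by counting matches over text that contains two "Add"s. This is only a sketch: the class name ReluctantCountDemo, the helper countMatches(), and the sample text are invented for the purpose.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantCountDemo {
    // Counts how many separate matches find() reports for the regex.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "$1.00\nnote\nAdd\n$2.00 Add\n";
        // Reluctant: each match stops at the nearest "Add", so both prices are found.
        System.out.println(countMatches("\\$\\d+[.]\\d\\d(?s:.*?)Add", text)); // 2
        // Greedy: the first match runs to the last "Add" and swallows the second price.
        System.out.println(countMatches("\\$\\d+[.]\\d\\d(?s:.*)Add", text));  // 1
    }
}
```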
> So did we hit a bug in regex for java version
> "1.5.0_11"? Or is the regex reluctant mode simply
> not implemented for this Java version?
I don't know about the bug you're talking about, but reluctance has been around since regex was added in 1.4, I think.
We'll need to see some of the data you're using the regex on.