StackOverflowError for recursion

I'm trying to use Matcher and Formatter to extract each parenthetical from a mathematical expression, store it in a Stack, and replace it with '@' in the original expression. For example,parentheses("4*(5+2)/(7^6/(9+x*5))")

should render ["4*@/@","5+2","7^6/@","9+x*5"]

Here is the relevant code:privatestaticfinal Pattern p = Pattern.compile([a regex]);

privatestatic Stack<String> parentheses(String expression){

expression = expression.replaceAll("\\[\\{","(").replaceAll("\\]\\}",")").replaceAll("\\s","");

Stack<String> pars =new Stack<String>();

pars.push(expression);

pars.addAll(parenthesesHelper(expression));

return pars;

}

privatestatic Stack<String> parenthesesHelper(String expression){

Stack<String> pars =new Stack<String>();

Matcher m = p.matcher(expression);

StringBuilder sb =new StringBuilder();

Formatter f =new Formatter(sb);

while(m.find()){//line 120

pars.push(f.format("%s", m.group()).toString());

sb.delete(0, sb.length());

pars.addAll(parenthesesHelper(pars.peek()));//line 123

}

return pars;

}

and my error output:

Exception in thread"main" java.lang.StackOverflowError

at java.util.regex.Pattern$Curly.match0(Pattern.java:3644)

at java.util.regex.Pattern$Curly.match(Pattern.java:3628)

at java.util.regex.Pattern$GroupHead.match(Pattern.java:4052)

at java.util.regex.Pattern$Loop.match(Pattern.java:4179)

at java.util.regex.Pattern$GroupTail.match(Pattern.java:4111)

at java.util.regex.Pattern$Curly.match0(Pattern.java:3666)

at java.util.regex.Pattern$Curly.match(Pattern.java:3628)

at java.util.regex.Pattern$CharProperty.match(Pattern.java:3314)

at java.util.regex.Pattern$Curly.match0(Pattern.java:3673)

at java.util.regex.Pattern$Curly.match(Pattern.java:3628)

at java.util.regex.Pattern$GroupTail.match(Pattern.java:4111)

at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3335)

at java.util.regex.Pattern$GroupHead.match(Pattern.java:4052)

at java.util.regex.Pattern$Curly.match0(Pattern.java:3673)

at java.util.regex.Pattern$Curly.match(Pattern.java:3628)

at java.util.regex.Pattern$GroupHead.match(Pattern.java:4052)

at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4195)

at java.util.regex.Pattern$Prolog.match(Pattern.java:4135)

at java.util.regex.Pattern$CharProperty.match(Pattern.java:3314)

at java.util.regex.Pattern$Curly.match0(Pattern.java:3673)

at java.util.regex.Pattern$Curly.match(Pattern.java:3628)

at java.util.regex.Pattern$Curly.match0(Pattern.java:3673)

at java.util.regex.Pattern$Curly.match(Pattern.java:3628)

at java.util.regex.Pattern$GroupHead.match(Pattern.java:4052)

at java.util.regex.Pattern$Start.match(Pattern.java:3024)

at java.util.regex.Matcher.search(Matcher.java:1105)

at java.util.regex.Matcher.find(Matcher.java:535)

at Equation.parenthesesHelper(Equation.java:120)

at Equation.parenthesesHelper(Equation.java:123)

[the previous line repeated several times]

I should probably convert the recursion to a while loop, but could use some help with that.

Message was edited by:

Jesdisciple

[4382 byte] By [Jesdisciplea] at [2007-11-27 5:17:41]
# 1
Look at all the calls to push() and how many calls to pop() you have... Overall, this does seem to be a textbook case for recursion, so I don't recommend redesigning the recursion out of it.Message was edited by: kevjava [This is rubbish. Ignore me.]
kevjavaa at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 2
? At first, you seemed to support getting rid of the recursion in favor of a loop, but then you say it's good. What do you suggest?EDIT: I just realized that I don't have any '@'s coded.Message was edited by: Jesdisciple
Jesdisciplea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 3

Contrary to what you said at the end of [url=http://forum.java.sun.com/thread.jspa?threadID=5175814]your other thread[/url], it was the regex that was causing the stack overflow. Try this out: import java.util.*;

import java.util.regex.*;

public class Test

{

static final Pattern p = Pattern.compile("\\(([^()]*+)\\)(?=[^(]*+$)");

private static Stack<String> parentheses(String expression)

{

Stack<String> pars = new Stack<String>();

Matcher m = p.matcher(expression);

while (m.find())

{

pars.push(m.group(1));

m.reset(expression = m.replaceFirst("@"));

}

pars.push(expression);

return pars;

}

public static void main(String... args)

{

System.out.println(parentheses("4*(5+2)/(7^6/(9+x*5))"));

}

}

uncle_alicea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 4

> Contrary to what you said at the end of

> [url=http://forum.java.sun.com/thread.jspa?threadID=51

> 75814]your other thread[/url], it was the

> regex that was causing the stack overflow. Try this

> out: ...

Nice work, uncle_alice!

Thank god I did not post my attempt of this problem: I would've made an ass out of myself!

; )

prometheuzza at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 5
> Try this out: Wow, that rocks. I think I might almost understand that block of code. The regex is voodoo, though. I'm going to have to go through with a fine-tooth comb to try and grok that one :).Thank you kindly, uncle_alice.
kevjavaa at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 6
> The regex is voodoo, though.This was one of those "Wait a minute, will that really work?" moments. Explaining why it works, now, that could be difficult. I'd better get another cup of tea first. ^_^
uncle_alicea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 7

Okay, the regex itself is actually pretty simple. The first part, \([^()]*+\)

matches a set of parentheses that don't contain any other parens. The second part, (?=[^(]*+$)

is a lookahead that ensures that none the characters between the closing paren and the end of the string are opening parens. Closing parens are okay, because the subexpression we just matched may be nested within another set of parens. What we want is to match the last, most deeply nested subexpression.

Each time through the loop, the content between the parens is pushed onto the stack, then the subexpression (including the parens) is replaced with "@", and the Matcher is reset to work with the altered string. As nested subexpressions are replaced, it becomes possible for the regex to match the subexpressions that contained them. When the regex fails to match, we push the final, altered string onto the stack.

I want to make it clear that the basic idea for this trick was the OP's, not mine. I just came up with the regex and simplified the implementation based on my knowledge of the regex API. I never thought of this recursive-replacement approach to parsing. I still think regexes are the wrong tool for parsing mathematical expressions, but who cares? This is totally cool! ^_^

uncle_alicea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 8

> \([^()]*+\)

Okay, the "*+" threw me... I came from Perl regex's, where the '*' alone will work. That is pretty simple, yet pretty friggin' elegant :).

> I never thought of this recursive-replacement approach

> to parsing. I still think regexes are the wrong tool for

> parsing mathematical expressions, but who cares?

> This is totally cool! ^_^

Dude, I agree. You can say it's the wrong tool, but I think it works pretty well. Sure beats StringTokenizer or a simple expr.indexOf('(') :). Thanks for explaining it.

kevjavaa at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 9

Okay, I'm now on eval(String), which solves each Stack element and returns the ultimate solution. It claims I'm giving String.substring(int, int) an index of -3, which the output says I'm not. By the output, I'm giving a String of length 4 an index of 7, but not on the line which the error complains about. Here is the method:public static double eval(String expression){

expression = expression.replaceAll("\\s", "");

System.out.println(expression);

double before, after;

char oper;

int x1, x2;

StringBuilder sb = new StringBuilder();

Formatter f = new Formatter(sb);

Stack<String> s = parentheses(expression);

Matcher m;

Stack<Double> at = new Stack<Double>();

for(int i1 = 0; i1 < s.size(); i1++){

for(int i2 = 0; i2 < operators.length; i2++){

m = Pattern.compile(numbers + operators[i2] + numbers).matcher(s.get(i1));

while(m.find()) {

System.out.println("<s = " + s);

int i3 = 0;

sb.delete(0, sb.length());

String temp = m.group(++i3);

before = temp.equals("@") ? at.pop()

: Double.parseDouble(f.format("%f", Double.parseDouble(temp)).toString());

System.out.println("before = " + before);

sb.delete(0, sb.length());

temp = m.group(++i3);

oper = f.format("%c", temp.charAt(0)).toString().charAt(0);

System.out.println("oper = " + oper);

sb.delete(0, sb.length());

temp = m.group(++i3);

after = temp.equals("@") ? at.pop()

: Double.parseDouble(f.format("%f", Double.parseDouble(temp)).toString());

System.out.println("after = " + after);

sb.delete(0, sb.length());

x1 = Integer.parseInt(f.format("%d", m.start()).toString());

System.out.println("start = " + x1);

sb.delete(0, sb.length());

x2 = Integer.parseInt(f.format("%d", m.end()).toString());

System.out.println("end = " + x2);

temp = s.get(i1);

System.out.println(">s = " + s);

m.reset(s.set(i1, temp.substring(0, x1) //line 66

+ solve(before, oper, after) + temp.substring(x2)));

}

}

at.push(Double.parseDouble(s.get(i1)));

System.out.println("at = " + at);

}

System.out.println("at = " + at);

return at.pop();

}

and the bug-checking and error output:4*(5+2)/(7^6/(9+3*5))

<s = [9+3*5, 7^6/@, 5+2, 4*@/@]

before = 3.0

oper = *

after = 5.0

start = 2

end = 5

>s = [9+3*5, 7^6/@, 5+2, 4*@/@]

<s = [9+15.0, 7^6/@, 5+2, 4*@/@]

before = 3.0

oper = *

after = 5.0

start = 2

end = 5

>s = [9+15.0, 7^6/@, 5+2, 4*@/@]

<s = [9+15.00, 7^6/@, 5+2, 4*@/@]

before = 9.0

oper = +

after = 15.0

start = 0

end = 7

>s = [9+15.00, 7^6/@, 5+2, 4*@/@]

<s = [24.0, 7^6/@, 5+2, 4*@/@]

before = 9.0

oper = +

after = 15.0

start = 0

end = 7

>s = [24.0, 7^6/@, 5+2, 4*@/@]

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3

at java.lang.String.substring(String.java:1938)

at java.lang.String.substring(String.java:1905)

at Equation.eval(Equation.java:66)

at Equation.main(Equation.java:140)

It seems to run over each simple expression twice, but I don't understand why.

What algorithm do you consider to be ideal for math parsing, uncle_alice?

Message was edited by:

Jesdisciple

Jesdisciplea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 10

> Okay, the "*+" threw me... I came from Perl regex's,

> where the '*' alone will work.

It'll work here, too. The '+' is just an optimization; it says "match as many as you can, and if the next part of the regex can't match, don't bother backtracking, because I already know it won't do any good." That way, if the overall regex can't match, it will fail much more quickly than it otherwise might.

> That is pretty simple, yet pretty friggin' elegant :).

^_^

> Dude, I agree. You can say it's the wrong tool, but I

> think it works pretty well.

The biggest problem with building parsers out of regexes is that they don't deal well with failure. Detecting malformed input and reporting it in a way that's useful is at least as important as correctly parsing good input. In this case, we know that if the final expression contains any parentheses, the input was malformed, but how? And where? You would have to do a separate pass over the original input to find that out, and there goes elegance, out the window. :-(

uncle_alicea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 11
Umm... I think you must have not seen my post (at the bottom of the previous page).
Jesdisciplea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 12

> It'll work here, too. The '+' is just an

> optimization; it says "match as many as you can, and

> if the next part of the regex can't match, don't

> bother backtracking, because I already know it won't

> do any good." That way, if the overall regex can't

> match, it will fail much more quickly than it

> otherwise might.

I get it. Had to go to the old documentation... from [url=http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html]the javadocs[/url]:

Constructs supported by this class but not by Perl:

*Possessive quantifiers, which greedily match as much as they can and do not back off, even when doing so would allow the overall match to succeed.

Your explanation makes more sense to me.

> The biggest problem with building parsers out of

> regexes is that they don't deal well with failure.

> Detecting malformed input and reporting it in a way

> that's useful is at least as important as correctly

> parsing good input.

<sarcasmmode="on"humorous_voice="chris_farley"quote_fingers="profuse">

All'a'you professionals have to worry about the so-called 'error path'... I'd be perfectly happy if my compiler came back with:

Test.java: valid java file expected

1 error

When I was a young programmer (living in a van down by the river), our compiler didn't show you pretty error messages when you borked the syntax, it just seg-faulted. Figure it out yourself, you whiny wanna-be-programming prima donna!

</sarcasm>

kevjavaa at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 13

> Umm... I think you must have not seen my post

Yes, I was replying to kevjava. Here's yours:

@Jesdisciple, when I was coming up with my previous suggestion, I went almost entirely by the sample input and output you provided. Your basic idea was good, but your regex and program logic were so overly-complicated, there was no possibility of working from them to a solution. The same is true here, I'm afraid. All I can say is try again, this time without the StringBuilder and the Formmatter (as far as I can tell, you're using them for parsing, not output, which is a little mind-blowing). It would help if I could see what "numbers" and "operators" refer to, as well.

uncle_alicea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 14

I developed the algorithm from Sun's RegexTestHarness.java (http://java.sun.com/docs/books/tutorial/essential/regex/test_harness.html). String numbers is the pattern for numbers/'@' and String[] operators is an array of operator patterns to keep track of the order of operations.private static final String numbers = "((?:\\-?[0-9]+\\.?[0-9]*)|@)";

private static final String[] operators = {"(\\^)", "([\\*/%])", "([\\+\\-_])"};

I'm using Formatter to retrieve the two doubles to be operated on, the operator telling how, and the String indices between which to insert the solution, and StringBuilder to make sure Formatter is empty before each time I use it.

Message was edited by:

Jesdisciple

Message was edited by:

Jesdisciple

Jesdisciplea at 2007-7-12 10:40:34 > top of Java-index,Java Essentials,Java Programming...
# 15

Double.parseDouble(f.format("%f", Double.parseDouble(temp)).toString())

That parses the String value in 'temp' into a double, converts it back to a character sequence in the form of a StringBuilder, converts that to a String again, which is then parsed (again) it into a double. Am I missing something? Why isn't Double.parseDouble(temp)

sufficient? All you're doing with Formatter and StringBuilder stuff is obfuscating your code.

uncle_alicea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 16

Lol, oops. While I was fiddling with my code, I made another problem.

EDIT: To clarify, I edited Formatter out.4*(5+2)/(7^6/(9+3*5))

m = java.util.regex.Matcher[pattern=((?:[\-\+]?[0-9]+\.?[0-9]*)|@)(\^)((?:[\-\+]?[0-9]+\.?[0-9]*)|@) region=0,5 lastmatch=]

s = [9+3*5, 7^6/@, 5+2, 4*@/@]

m = java.util.regex.Matcher[pattern=((?:[\-\+]?[0-9]+\.?[0-9]*)|@)([\*/%])((?:[\-\+]?[0-9]+\.?[0-9]*)|@) region=0,5 lastmatch=]

s = [9+3*5, 7^6/@, 5+2, 4*@/@]

<s = [9+3*5, 7^6/@, 5+2, 4*@/@]

before = 3.0

oper = *

after = 5.0

start = 1

end = 5

>s = [9+3*5, 7^6/@, 5+2, 4*@/@]

<s = [915.0, 7^6/@, 5+2, 4*@/@]

before = 3.0

oper = *

after = 5.0

start = 1

end = 5

>s = [915.0, 7^6/@, 5+2, 4*@/@]

m = java.util.regex.Matcher[pattern=((?:[\-\+]?[0-9]+\.?[0-9]*)|@)([\+\-_])((?:[\-\+]?[0-9]+\.?[0-9]*)|@) region=0,5 lastmatch=]

s = [915.0, 7^6/@, 5+2, 4*@/@]

at = [915.0]

m = java.util.regex.Matcher[pattern=((?:[\-\+]?[0-9]+\.?[0-9]*)|@)(\^)((?:[\-\+]?[0-9]+\.?[0-9]*)|@) region=0,5 lastmatch=]

s = [915.0, 7^6/@, 5+2, 4*@/@]

<s = [915.0, 7^6/@, 5+2, 4*@/@]

before = 7.0

oper = ^

after = 6.0

start = 0

end = 3

>s = [915.0, 7^6/@, 5+2, 4*@/@]

<s = [915.0, 117649.0/@, 5+2, 4*@/@]

before = 7.0

oper = ^

after = 6.0

start = 0

end = 3

>s = [915.0, 117649.0/@, 5+2, 4*@/@]

m = java.util.regex.Matcher[pattern=((?:[\-\+]?[0-9]+\.?[0-9]*)|@)([\*/%])((?:[\-\+]?[0-9]+\.?[0-9]*)|@) region=0,15 lastmatch=]

s = [915.0, 117649.0649.0/@, 5+2, 4*@/@]

<s = [915.0, 117649.0649.0/@, 5+2, 4*@/@]

before = 649.0

oper = /

after = 915.0

start = 7

end = 15

>s = [915.0, 117649.0649.0/@, 5+2, 4*@/@]

<s = [915.0, 117649.0.7092896174863388, 5+2, 4*@/@]

before = 649.0

oper = /

Exception in thread "main" java.util.EmptyStackException

at java.util.Stack.peek(Stack.java:85)

at java.util.Stack.pop(Stack.java:67)

at Equation.eval(Equation.java:47)

at Equation.main(Equation.java:132)

"9+3*5" turns into "9" + "15"; "+" is somehow forgotten.

public static double eval(String expression){

expression = expression.replaceAll("\\s", "");

System.out.println(expression);

double before, after;

String temp;

int x1, x2;

char oper;

Stack><String> s = parentheses(expression);

Matcher m;

Stack<Double> at = new Stack<Double>();

for(int i1 = 0; i1 < s.size(); i1++){

for(int i2 = 0; i2 < operators.length; i2++){

m = Pattern.compile(numbers + operators[i2] + numbers).matcher(s.get(i1));

System.out.println("m = " + m);

System.out.println("s = " + s);

while(m.find()) {

System.out.println("<s = " + s);

int i3 = 0;

temp = m.group(++i3);

before = temp.equals("@") ? at.pop() : Double.parseDouble(temp);

System.out.println("before = " + before);

temp = m.group(++i3);

oper = temp.charAt(0);

System.out.println("oper = " + oper);

temp = m.group(++i3);

after = temp.equals("@") ? at.pop() : Double.parseDouble(temp); //line 47

System.out.println("after = " + after);

x1 = m.start();

System.out.println("start = " + x1);

x2 = m.end();

System.out.println("end = " + x2);

temp = s.get(i1);

System.out.println(">s = " + s);

m.reset(s.set(i1, temp.substring(0, x1) //line 66

+ solve(before, oper, after) + temp.substring(x2)));

}

}

at.push(Double.parseDouble(s.get(i1)));

System.out.println("at = " + at);

}

return at.pop();

}

null

Jesdisciplea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 17

> *Possessive quantifiers, which greedily match

> as much as they can and do not back off, even

> when doing so would allow the overall match to

> succeed.

> Your explanation makes more sense to me.

Yeah, it took me a while to get my head around possessive quantifiers (and the related concept, atomic groups) thanks in part to descriptions like that one. Why do I need a special construct to help me create regexes that don't work? I can do that on my own, thank you very much!

But the point is, if the match attempt is going to fail anyway, we want it to do so as quickly as possible. When your computer locks up because a regex is taking hours or millenia to do its work, it's usually not going to match anyway. But even when the match succeeds, it can take a ridiculously long time to do so, because some part of the regex has to fail many times along the way.

Here's a better treatment of the fail-fast facilitators:

http://www.regular-expressions.info/atomic.html

uncle_alicea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 18

UPDATE: I found my mistake; I had allowed '+' as well as '-' before a number. Now I have this:3*5

3*5

9+15.0

9+15.0

7^6

7^6

117649.0/@

117649.0/@

Exception in thread "main" java.util.EmptyStackException

at java.util.Stack.peek(Stack.java:85)

at java.util.Stack.pop(Stack.java:67)

at Equation.eval(Equation.java:30)

at Equation.main(Equation.java:105)

With this not-so-convoluted code:public static double eval(String expression){

expression = expression.replaceAll("\\s", "");

Stack<String> s = parentheses(expression);

Stack<Double> at = new Stack<Double>();

for(int i1 = 0; i1 < s.size(); i1++){

for(int i2 = 0; i2 < operators.length; i2++){

Matcher m = Pattern.compile(numbers + operators[i2] + numbers).matcher(s.get(i1));

while(m.find()) {

System.out.println(m.group());

double before = m.group(1).equals("@") ? at.pop() : Double.parseDouble(m.group(1)),

after = m.group(3).equals("@") ? at.pop() : Double.parseDouble(m.group(3));

char oper = m.group(2).charAt(0);

m.reset(s.set(i1, m.replaceFirst("" + solve(before, oper, after))));

}

}

at.push(Double.parseDouble(s.get(i1)));

}

return at.pop();

}

The cause seems to be that m.find() isn't re-evaluated for each iteration of while(m.find()).

Message was edited by:

Jesdisciple

Jesdisciplea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 19

That code is still too convoluted for my taste. It's much easier to tokenize the subexpression strings completely, then go back and process the tokens (that's how parsers usually work). You can use a while(find()) loop for that, but I wanted to try using split(). I don't know why, but the idea of saving four lines of code seems irresistably appealing to me today. ^_^ The code below only takes us up to the tokenizing step. import java.util.*;

import java.util.regex.*;

public class Test

{

static final Pattern parensPattern = Pattern.compile(

"\\(([^()]*+)\\)(?=[^(]*+$)"

);

static final Pattern exprSplitter = Pattern.compile(

"(?<!^|[*/%^+-])(?=[*/%^+-])|(?<!(?:^|[*/%^+-])-)(?<=[*/%^+-])"

);

public static double eval(String expression)

{

Stack<String> strs = parentheses(expression);

for (String str : strs)

{

System.out.printf("%n%s%n", str);

String[] tokens = exprSplitter.split(str);

System.out.println(Arrays.toString(tokens));

}

return 0.0;

}

private static Stack<String> parentheses(String expression)

{

Stack<String> pars = new Stack<String>();

Matcher m = parensPattern.matcher(expression);

while (m.find())

{

pars.add(m.group(1));

m.reset(expression = m.replaceFirst("@"));

}

pars.add(expression);

return pars;

}

public static void main(String... args)

{

eval("-4*(5+2)/(7^-6/(9+3*5))");

}

}

The reason the regex is so hideous is because it has to match before and after the operators (not the operators themselves), but not treat the minus signs in front of negative numbers as operators. In case you were wondering, yes, a while(find()) aproach would require a much simpler regex. ^_^

uncle_alicea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 20
> "(?<!^|[*/%^+-])(?=[*/%^+-])|(?<!(?:^|[*/%^+-])-)(?<=[*/%^+-])"I bow to your obviously superior reg-ex kung-fu.That looks like somebody already compiled it once.>
kevjavaa at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 21
Only known photo of uncle_alice:[url #" style="display: block; background-image: url(' http://www.kungfu-guide.com/img_wb/462001111510823po.jpg'); width: 225px; height: 241px] [/url]
Hippolytea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 22
Always remember this, Grasshopper: even if you shave your head, you can still have bad-hair days.
uncle_alicea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 23

I've decomposed and defined the regex.

(?<!^|[*/%^+-])

^|[*/%^+-] via zero-width negative lookbehind

(?=[*/%^+-])|(?<!(?:^|[*/%^+-])-)|(?<!(?:^|[*/%^+-])-)

[*/%^+-] via zero-width positive lookahead, OR ([*/%^+-] as a non-capturing group and -, via zero-width negative lookbehind)

(?><=[*/%^+-])

[*/%^+-] via zero-width positive lookbehind

Now I need to know what those definitions mean; the API doesn't explain further.

Jesdisciplea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 24

If we didn't have to deal with negative numbers, we could use this: (?=[*/%^+-])|(?<=[*/%^+-]) That just means, either the next character is an operator, or the previous character is. In fact, both conditions could be true at the same position if the position is between two operators. The only time that will happen (assuming the input is well formed) is when an operator is followed by a negative number. In that case, we want to match before the minus sign, but not after it. We also have to deal with the case where the first token is a negative number, which means the first character in the string is a minus sign, and we don't want to match either before or after it. So, taking the first alternative, where the operator is the next character, we need to impose the additional restrictions that it can't be the first character in the string and it can't be preceded by another operator: (?<!^|[*/%^+-])(?=[*/%^+-]) In the second alternative, we're looking at the previous character; we use one lookbehind to make sure it's an operator, and another one to ensure that it's not the first character in the string and that it isn't preceded by another operator: (?<!(?:^|[*/%^+-])-)(?<=[*/%^+-]) Note that it doesn't matter what order you list the alternatives in, or the individual lookarounds within each alternative, since none of them advance the match position. So this means exactly the same as the original: (?<=[*/%^+-])(?<!(?:^|[*/%^+-])-)|(?=[*/%^+-])(?<!^|[*/%^+-])

uncle_alicea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 25

I'm gonna have to get Special Constructs 101, except that I understand "non-capturing". What is a flag? Exactly what does a look-ahead/-behind do, and what difference does positive/negative make? I'm guessing "zero-width" means not advancing the position. What does it mean for a group to be "independent"?

EDIT: I think I got the answer to the second question, "Exactly what does a look-ahead/-behind do, and what difference does positive/negative make?". Here is my concept of this, in HTML code:<html>

<head>

<title>Special Constructs 101</title>

</head>

<body>

<table width="100%">

<tr>

<td>"Exactly what does a look-ahead/-behind do, and what difference does it make for it to be positive/negative?"</td>

</tr>

<tr>

<td align="center">

<table border="1">

<thead>

<th>A group which begins with</th>

<th>will ensure that there</th>

<th>a match for the following pattern</th>

<th>or force the match to fail, so it's called a</th>

</thead>

<tr>

<td align="center">?=</td>

<td align="center">is</td>

<td align="center">ahead of the group</td>

<td align="center">positive lookahead.</td>

</tr>

<tr>

<td align="center">?!</td>

<td align="center">is not</td>

<td align="center">ahead of the group</td>

<td align="center">negative lookahead.</td>

</tr>

<tr>

<td align="center">?<=</td>

<td align="center">is</td>

<td align="center">behind the group</td>

<td align="center">positive lookbehind.</td>

</tr>

<tr>

<td align="center">?<!</td>

<td align="center">is not</td>

<td align="center">behind the group</td>

<td align="center">negative lookbehind</td>

</tr>

</table>

</td>

</tr>

</table>

</body>

</html>

Message was edited by:

Jesdisciple

Jesdisciplea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 26
> Here is my concept of this, in HTMLWhy?
cotton.ma at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 27
to make sure my concept is correct
Jesdisciplea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 28
> to make sure my concept is correctI understand why you would want to ask. I don't understand why youposted HTML code with the concept.
cotton.ma at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 29
A table seemed the easiest way to illustrate it, and I don't think these forums are capable of that.
Jesdisciplea at 2007-7-21 21:24:13 > top of Java-index,Java Essentials,Java Programming...
# 30
An independent group is the same as an atomic group; check the link I posted in reply #17. That whole tutorial is good, and when it's not enough any more, there's [url]The Book[/url].
uncle_alicea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 31
OK, I get that part now. What does "^|" (caret and bar, occurring twice in the regex) mean?Message was edited by: Jesdisciple
Jesdisciplea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 32
'^' start-of-input/start-of-line anchor, so ^|[*/%^+-] matches the beginning of the input OR one of those six punctuation characters.
uncle_alicea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 33
And why do you nest that in a second non-capturing group, here? (The lookbehind is already non-capturing.)(?<!(?:^|[*/%^+-])-)>
Jesdisciplea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 34

Because that lookbehind has to match the minus sign, too. Leaving out the inner parens would change the meaning from ((start of input OR operator char) FOLLOWED BY minus sign) to (start of input OR (operator char FOLLOWED BY minus sign)) Of course, I could have distributed the minus sign instead: (?<!^-|[*/%^+-]-)

Or are you asking why I used non-capturing parens instead of capturing? Capturing groups always capture, even when they're used inside lookarounds. In fact, I could have written the second alternative like this: (?<=([*/%^+-]))(?<!(?:^|[*/%^+-])\1) If the first lookbehind succeeds, it stores the matched character in group #1. Then the \1 in the second lookbehind will try to match the same character. Not a useful trick in this case, but it can be.

uncle_alicea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 35

(?<=([*/%^+-]))(?<!(?:^|[*/%^+-])\1)

><the future circa="3050 AD">

Many mysteries surround the earliest internet pioneers including their use of this still undeciphered language. Some paleo-linguists have suggested that this entry relates to as of yet unknown astrological calendar while others insist it's a encoded pictogram of a strawberry muffin receipe.

The truth may never be known.

cotton.ma at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 36
> Not a useful trick in this case, but it can be....for instance, in this situation: http://forum.java.sun.com/thread.jspa?threadID=5176659^_^
uncle_alicea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 37

And finally, what's a flag?

Now back to the code. I can't figure out how to manipulate the tokens in eval(String) so the correct number is returned.Here's my code, complete with confusion:public static double eval(String expression) {

Stack<String> strs = parentheses(expression),

temp = new Stack<String>();

System.out.println(strs);

Stack<Double> at = new Stack<Double>();

for (String str : strs) {

temp.removeAllElements();

System.out.printf("%n%s%n", str);

String[] tokens = exprSplitter.split(str);

for(String s : tokens) System.out.println(temp.push(s));

String hold;

assert tokens.length % 2 == 1 && tokens.length != 1 : "Illegal # of tokens: " + tokens.length + ".";

while(temp.size() > 3){

temp.set(0, "" + solve(

(hold = temp.get(0)) == "@" ? at.pop() : Double.parseDouble(hold),

temp.pop().charAt(0),

(hold = temp.remove(2)) == "@" ? at.pop() : Double.parseDouble(hold)

));

}

}

return at.pop();

}

Message was edited by:

Jesdisciple

Jesdisciplea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 38

> And finally, what's a flag?

A flag is a modifier: http://www.regular-expressions.info/modifiers.html

As for your code, you're still making it more complicated than it has to be. There's no need to create all those stacks, much less empty and refill them repeatedly. Below, I've refactored my previous code to create just the one stack, of tokenized subexpressions. After that, recursion is your friend.

I also switched to using Deque instead of Stack, but only because Deque is an improved, drop-in replacement for Stack. While I don't think the Stack class is causing any of your problems, I'm very uncomfortable with the way you use it as a Stack sometimes and as a Vector other times. Of course, you can abuse a Deque just as badly, but at least you have to use Deque semantics to do it. ^_^

import java.util.*;

import java.util.regex.*;

public class Test

{

static final Pattern parensPattern = Pattern.compile(

"\\(([^()]*+)\\)(?=[^(]*+$)"

);

static final Pattern exprSplitter = Pattern.compile(

"(?<!^|[*/%^+-])(?=[*/%^+-])|(?<!(?:^|[*/%^+-])-)(?<=[*/%^+-])"

);

public static double eval(String expression)

{

Deque<String[]> pars = parentheses(expression);

return solveRecursive(pars);

}

private static Deque<String[]> parentheses(String expression)

{

Deque<String[]> pars = new ArrayDeque<String[]>();

Matcher m = parensPattern.matcher(expression);

while (m.find())

{

pars.push(exprSplitter.split(m.group(1)));

m.reset(expression = m.replaceFirst("@"));

}

pars.push(exprSplitter.split(expression));

return pars;

}

private static double solveRecursive(Deque<String[]> pars)

{

String[] tokens = pars.pop();

double a = "@".equals(tokens[0])

? solveRecursive(pars)

: Double.parseDouble(tokens[0]);

for (int i = 1; i < tokens.length; i+= 2)

{

char op = tokens[i].charAt(0);

double b = "@".equals(tokens[i+1])

? solveRecursive(pars)

: Double.parseDouble(tokens[i+1]);

a = solveSimple(a, op, b);

}

return a;

}

private static double solveSimple(double a, char op, double b)

{

double result = 0.0;

switch (op)

{

case '+' : result = a + b; break;

case '-' : result = a - b; break;

case '*' : result = a * b; break;

case '/' : result = a / b; break;

case '^' : result = Math.pow(a, b); break;

default : assert false : "Invalid operator";

}

System.out.printf("%n%3.2f %c %3.2f = %3.2f%n", a, op, b, result);

return result;

}

public static void main(String... args)

{

String expr = "4*(5^2)/(8*5/(8-6*5))";

System.out.println(expr);

System.out.printf("%nresult: %s", eval(expr));

}

}

uncle_alicea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 39
uncle_alice That's a seriously nastyassed little regex. Would you mind if I where to pray to you occasionally :-)Thanx for the explanation. I appreciate it.Cheers. Keith
corlettka at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 40

> An independent group is the same as an atomic group;

> check the link I posted in reply #17. That whole

> tutorial is good, and when it's not enough any more,

> there's [url]The Book[/url].

Ummm... which book? Might it be O'Reilleys Owl book - "Mastering regular expressions"

corlettka at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 41
> Ummm... which book? Might it be O'Reilleys Owl book -> "Mastering regular expressions"Yup, the one from Jeffrey Friedl: http://regex.info/
prometheuzza at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 42

thanx prometheuzz.

Guys, Can we make it handle operator precedence correctly?

5.00 ^ 2.00 = 25.00

4.00 * 25.00 = 100.00

8.00 * 5.00 = 40.00

8.00 - 6.00 = 2.00

2.00 * 5.00 = 10.00

40.00 / 10.00 = 4.00

100.00 / 4.00 = 25.00

4*(5^2)/(8*5/(8-6*5)) = 25.0

Observing standard operator precedence yields a result of -55, because (8-6*5) would resolve to 8-30, but this algorithm does (2*5).

I can't think of a simple way to do it.

corlettka at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 43

Yeah, the operator precedence is arbitrary and there's no error handling or sanity checking. I only got into this thread in the first place because of the intriguing regex aspects. Turning this code into a robust mathematical expression parser will take a lot more work, and I don't see how regexes can be of any further use. If you (OP or anyone) want more help with the non-regex aspects, I suggest you start a new thread with question(s) focused on the specific problem(s) you're having.

uncle_alicea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 44

> > but could use some help with that.

>

> Only if it's homework. If this is job related I suggest you contact a

> consulting firm.

I guess it's self-assigned homework. Unfortunately, I don't have a job yet.

Also, I'm still wrestling with regexes, but not in regard to their syntax.

> Guys, Can we make it handle operator precedence

> correctly?static final String[] operators = {"(\\^)", "([\\*/%])", "([\\+\\-_])"};

public static double eval(String) {

...

for(String s : operators){

//handle each step of the order of operations

}

...

}

I'm having trouble implementing this in uncle_alice's Deque code, though.

import java.util.*;

import java.util.regex.*;

public class Test

{

static final Pattern parensPattern = Pattern.compile(

"\\(([^()]*+)\\)(?=[^(]*+$)"

);

private static final String[] operators = {"(\\^)", "([\\*/%])", "([\\+\\-_])"};

//static final Pattern exprSplitter = Pattern.compile(

//"(?<!^|[*/%^+-])(?=[*/%^+-])|(?<!(?:^|[*/%^+-])-)(?<=[*/%^+-])"

//);

public static double eval(String expression)

{

Deque><String[]> pars = parentheses(expression);

return solveRecursive(pars);

}

private static Deque<String[]> parentheses(String expression)

{

Deque<String[]> pars = new ArrayDeque<String[]>();

Matcher m = parensPattern.matcher(expression);

while (m.find())

{

for(String s : operators)

{

Pattern exprSplitter = Pattern.compile(

"(?<!^|" + s + ")(?=" + s + ")|(?><!(?:^|" + s + ")-)(?><=" + s + ")"

);

pars.push(exprSplitter.split(m.group(1))); //line 34

m.reset(expression = m.replaceFirst("@"));

}

}

for(String s : operators)

{

Pattern exprSplitter = Pattern.compile(

"(?<!^|" + s + ")(?=" + s + ")|(?><!(?:^|" + s + ")-)(?><=" + s + ")"

);

pars.push(exprSplitter.split(expression));

}

return pars;

}

private static double solveRecursive(Deque<String[]> pars)

{

String[] tokens = pars.pop();

double a = "@".equals(tokens[0])

? solveRecursive(pars)

: Double.parseDouble(tokens[0]);

for (int i = 1; i < tokens.length; i+= 2)

{

char op = tokens[i].charAt(0);

double b = "@".equals(tokens[i+1])

? solveRecursive(pars)

: Double.parseDouble(tokens[i+1]);

a = solveSimple(a, op, b);

}

return a;

}

private static double solveSimple(double a, char op, double b)

{

double result = 0.0;

switch (op)

{

case '+' : result = a + b; break;

case '-' : result = a - b; break;

case '*' : result = a * b; break;

case '/' : result = a / b; break;

case '^' : result = Math.pow(a, b); break;

default : assert false : "Invalid operator";

}

System.out.printf("%n%3.2f %c %3.2f = %3.2f%n", a, op, b, result);

return result;

}

public static void main(String... args)

{

String expr = "4*(5^2)/(8*5/(8-6*5))";

System.out.println(expr);

System.out.printf("%nresult: %s", eval(expr));

}

}

4*(5^2)/(8*5/(8-6*5))

Exception in thread "main" java.lang.IllegalStateException: No match found

at java.util.regex.Matcher.group(Matcher.java:468)

at Test.parentheses(Test.java:34)

at Test.eval(Test.java:18)

at Test.main(Test.java:86)

Process completed.

Jesdisciplea at 2007-7-21 21:24:19 > top of Java-index,Java Essentials,Java Programming...
# 45

The way this code is organized, you have to tokenize each subexpression completely before you can evaluate any of the tokens. If you try to split on a subset of the set of operators, you'll end up with garbage. What you need to do is replace my solveRecursive() method (which is just a placeholder anyway) with something that enforces operator precedence rules. It should just be a matter of examining all the operators in the array beforehand, then evaluating the simple expressions in order of precedence instead of from left to right as I did.

uncle_alicea at 2007-7-21 21:24:24 > top of Java-index,Java Essentials,Java Programming...
# 46

For the Order of Operations, I've made a method to be operated on the output of parentheses(String) (which I've changed to a Vector<String> because the get(int) and set(int, String) methods are necessary and a String seems more workable to me than a String[]).When I test the regex on an expression in RegexTestHarness, it works fine, but m.find() never returns true in practice.private static final String numbers = "(?:\\-?\\d++\\.?\\d*+)",

terms = "(?:(?:@|" + numbers + "|\\p{Lower})!?+)";

private static final String[] operators = {"\\^", "\\*/%", "\\+-_"};

public static Vector<String> order(Vector<String> pars){

Matcher m = Pattern.compile("").matcher("");

for(int i1 = 0; i1 < pars.size(); i1++){

m.reset(pars.get(i1));

for(int i2 = 0; i2 < 2; i2++){

String lower = "",

main = "(" + terms + "[" + operators[i2] + "]" + terms + ")";

for(int i3 = 2; i3 > i2; i3--) lower += operators[i3];

m.usePattern(Pattern.compile("[" + lower + "].*" + main + "|" + main + ".*[" + lower + "]"));

while(m.find()) m.reset(pars.set(i1, m.replaceFirst("(" + (m.group(1) == null ? m.group(2) : m.group(1)) + ")")));

Vector<String> temp = parentheses(pars.remove(i1));

for(int i3 = 0; i3 < temp.size(); i3++) pars.add(i1 + i3, temp.get(i3));

}

}

return pars;

}

"[" + lower + "].*" and ".*[" + lower + "]" require that some lower operation than the level being ordered be present in the current String element; otherwise, the parentheses are unnecessary. I suppose I only need "[" + lower + "].*" (as the left-to-right order works fine otherwise), but would like verification of this before I delete the regex's second case.

Jesdisciplea at 2007-7-21 21:24:24 > top of Java-index,Java Essentials,Java Programming...