regex help
I'm working on a small personal project that requires me to parse a very long string. The string contains an opening and a closing bracket, i.e. [ ]. Within this pair of brackets there are multiple sets of brackets separated by commas. Each of these sub-brackets holds strings and ints separated by commas; the strings are enclosed in quotation marks and the ints are bare. Here is an example of what I am describing:
some text here then [["string1","string2","string3",1,2,"string4"], "string5","string6","string7",3,4,"string8"], "string1","string2","string3",1,2,"string4"], etc.] and more stuff following that
I'm trying to work out the regex syntax to find each sub-bracket and, once it is found, pull out each comma-separated item. Each set has string, string, string, int, int, string; that never changes.
In PCRE it would be something like /^[[\d]+\,]{3}+[[0-9]+\,]{2}+[[\d]+\,]$/i, although that isn't totally correct, and I want a Java expression instead.
If anyone knows more about regex than I do, I'd appreciate the help!
Message was edited by:
neucocoa
[1134 byte] By [neucocoaa] at [2007-11-26 15:59:26]
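For reference, the fixed shape described in the question (string, string, string, int, int, string) could probably be expressed with java.util.regex along the following lines. This is only a sketch: the class name BracketGroups and the method parse are made up for illustration, and it assumes the quoted strings never contain an embedded double quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
class BracketGroups {
    // One sub-bracket: three quoted strings, two ints, one quoted string.
    // Assumption: no quoted string contains a double quote of its own.
    private static final Pattern GROUP = Pattern.compile(
        "\\[\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*,"
        + "\\s*(\\d+)\\s*,\\s*(\\d+)\\s*,\\s*\"([^\"]*)\"\\s*\\]");

    // Returns one String[6] per sub-bracket found anywhere in the text.
    static List<String[]> parse(String text) {
        List<String[]> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(text);
        while (m.find()) {
            String[] fields = new String[6];
            for (int i = 0; i < 6; i++) {
                fields[i] = m.group(i + 1); // capture groups are 1-based
            }
            groups.add(fields);
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "some text here then "
            + "[[\"string1\",\"string2\",\"string3\",1,2,\"string4\"], "
            + "[\"string5\",\"string6\",\"string7\",3,4,\"string8\"]] and more stuff";
        for (String[] g : parse(text)) {
            System.out.println(String.join(" | ", g));
        }
    }
}
```

Because the pattern is not anchored with ^ or $, Matcher.find() simply skips the surrounding text and the outer bracket and stops at each well-formed sub-bracket. The two int fields come back as strings and would still need Integer.parseInt if numeric values are wanted.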

Why not use a (simpler?) StringTokenizer?
Something like this:
// Sample input: three sub-brackets inside one outer pair of brackets.
String text = "[[\"string1\",\"string2\",\"string3\",1,2,\"string4\"], "
        + "[\"string5\",\"string6\",\"string7\",3,4,\"string8\"], "
        + "[\"string1\",\"string2\",\"string3\",1,2,\"string4\"]]";
// Treat spaces, commas, quotes and brackets all as delimiters.
java.util.StringTokenizer tokens = new java.util.StringTokenizer(text, " ,\"[]");
int counter = 0;
while (tokens.hasMoreTokens()) {
    counter++;
    System.out.println("counter " + counter + " = " + tokens.nextToken());
}
Thanks prometheuzz, that is exactly what I'm looking for. It works great, but I now see I misspoke in two ways when I gave the example. The poster above you caught the first: I forgot the brackets in the original string.
My second mistake concerns the strings within the quotation marks. Some of the strings contain more than one word, e.g. "this is the name of the file". The code you suggested picks out each of those words separately, but I want the token to be the data in the quotes in its entirety. Is this possible using a tokenizer?
Thanks again, I didn't even consider this option before!
Edit: I see that by tokenizing on just a "," rather than on the space and brackets as well, I can get each string literal and int. The only problem is that it leaves the beginning and ending brackets of each subset. I suppose I now just need to replace all the brackets, as well as the quotes, with empty strings. Any better ideas for this than just replacing after the fact?
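One possible alternative to replacing after the fact is to let a regex pick out each token directly, so that multi-word strings stay whole and the quotes and brackets never end up in the result. The sketch below is an assumption about how that might look, not something posted in the thread; QuotedTokens and tokens are hypothetical names, and it assumes no string contains an embedded double quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
class QuotedTokens {
    // Either a double-quoted string (group 1, quotes excluded)
    // or a run of digits (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\d+)");

    // Pass in only the bracketed part of the input: digits or quotes in
    // the surrounding text would otherwise be picked up as tokens too.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "[[\"this is the name of the file\",\"b\",\"c\",1,2,\"d\"]]";
        for (String t : tokens(text)) {
            System.out.println(t);
        }
    }
}
```

Since the quoted alternative is tried first and matching runs left to right, digits and commas that appear inside a quoted string stay part of that string instead of becoming separate tokens.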
Update again: I just used multiple replaceAll() calls to replace the things I didn't need. Now every token comes out clean! Thanks for the help.
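The actual code was not posted, so the following is only a guess at what the replace-then-tokenize approach might look like. It collapses the several replaceAll() calls into one character class; StripAndSplit and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical reconstruction, not the poster's actual code.
class StripAndSplit {
    static List<String> tokens(String text) {
        // The character class [\[\]"] matches any bracket or double quote,
        // so a single replaceAll() call removes all of them.
        String cleaned = text.replaceAll("[\\[\\]\"]", "");
        List<String> out = new ArrayList<>();
        // Split on commas only, so multi-word strings stay in one token.
        StringTokenizer st = new StringTokenizer(cleaned, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken().trim()); // drop the space after "], "
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "[[\"a file name\",\"b\",\"c\",1,2,\"d\"], "
            + "[\"e\",\"f\",\"g\",3,4,\"h\"]]";
        for (String t : tokens(text)) {
            System.out.println(t);
        }
    }
}
```

One caveat with this approach: a comma inside a quoted string would split that string into two tokens, so it only works as long as the data never contains commas.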