CSV parsing joy
Hi there,
Was wondering if there was an elegant solution to the following requirement.
I have data arriving in one big string, in the following format:
"abc""def""ghi","pqr""stu""vwx"
That is
* Each 'row' of data is separated with commas
* Each row has the same number of individual values
* Each value is separated with a single tab character
* Each value is wrapped in double quotes
The simplest approach would be to use split like:
for (String row : data.split(",")) {
    for (String value : row.split("\\t")) {
        // etc..
    }
}
But, a value can contain a comma. So I have had to resort to:
for (String row : data.split(getRowSplitRegex())) {
    // etc..
}
private String getRowSplitRegex() {
    // Match a comma only when it sits directly between two double quotes.
    String positiveLookBehindForQuote = "(?<=\")";
    String positiveLookAheadForQuote = "(?=\")";
    String regex = positiveLookBehindForQuote + "," + positiveLookAheadForQuote;
    return regex;
}
That is, split on comma, so long as the characters before and after were both double quotes.
However, this does not deal with the pathological case where a value is itself the string ",". In that case, the example data would look like:
"abc"",""ghi","pqr""stu""vwx"
And the fancy regex won't work for this.
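To make the failure concrete, here is a small sketch that applies the same lookaround split to a string. The class and method names (LookaroundSplitDemo, splitRows) are made up for illustration, and the test data assumes the values really are joined by tab characters as described above.

```java
import java.util.regex.Pattern;

final class LookaroundSplitDemo {

    // Same idea as getRowSplitRegex(): a comma with a double quote on both sides.
    private static final Pattern ROW_SPLIT = Pattern.compile("(?<=\"),(?=\")");

    // Splits the raw data into what the regex believes are rows.
    public static String[] splitRows(String data) {
        return ROW_SPLIT.split(data);
    }
}
```

With two ordinary rows this returns two pieces. With the "," value in the first row it returns three, because the comma inside that value also has a quote on either side, so the first row gets cut in half.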
Since the number of values in a row is constant (based on a header section), perhaps I could build up a regex for repeated rows, and use Matcher.find?
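A rough sketch of that Matcher.find idea, in case it helps frame the question. It relies on an assumption not stated above: a value never contains a double quote (or a tab), so "[^"]*" matches exactly one quoted value. The names FixedWidthRowParser, buildRowPattern and parseRows are hypothetical, and valuesPerRow would come from the header section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedWidthRowParser {

    // Assumption: values never contain a double quote.
    private static final Pattern VALUE_PATTERN = Pattern.compile("\"([^\"]*)\"");

    // Builds a regex for one whole row: valuesPerRow quoted values joined by tabs.
    public static Pattern buildRowPattern(int valuesPerRow) {
        StringBuilder regex = new StringBuilder("\"[^\"]*\"");
        for (int i = 1; i < valuesPerRow; i++) {
            regex.append("\\t\"[^\"]*\"");
        }
        return Pattern.compile(regex.toString());
    }

    // Finds each row in turn, then pulls the unquoted values out of it.
    public static List<List<String>> parseRows(String data, int valuesPerRow) {
        List<List<String>> rows = new ArrayList<>();
        Matcher rowMatcher = buildRowPattern(valuesPerRow).matcher(data);
        while (rowMatcher.find()) {
            List<String> values = new ArrayList<>();
            Matcher valueMatcher = VALUE_PATTERN.matcher(rowMatcher.group());
            while (valueMatcher.find()) {
                values.add(valueMatcher.group(1));
            }
            rows.add(values);
        }
        return rows;
    }
}
```

Because each match has to consume a full row of quoted values, a comma inside a value is swallowed as part of that value rather than treated as a row separator. The sketch silently skips anything between matches, so malformed input would need an extra check.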
Any more elegant solutions?
Thanks, Neil

