A solution for parsing CSV records that contain embedded commas in quotes

I've seen so many people try to use StringTokenizer or String.split to parse a quoted CSV file, only to fail on commas embedded in quoted fields. So I'm throwing in my two cents with a solution that anyone can use freely.

It's small, comes with a little test program and utilizes the String.split function.
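To make the failure mode concrete, here is a minimal sketch of what a plain split does to a quoted field. The class name NaiveSplitDemo and the sample record are made up for illustration and are not part of the posted code.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the posted solution.
class NaiveSplitDemo {
    // Splits on every comma and ignores quotes entirely.
    static String[] naiveSplit(String record) {
        return record.split(",");
    }

    public static void main(String[] args) {
        // Three logical fields: 1, "5,6" and 9.
        String record = "1,\"5,6\",9";
        // The quoted field is cut at its embedded comma, so four tokens come back.
        System.out.println(Arrays.toString(naiveSplit(record)));
    }
}
```

The quoted field comes back as two pieces that still carry their quote characters, which is why the pieces have to be stitched back together after the split.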

Enjoy,

Todd Blackley

tblackley@att.net

www.pawsource.org

/*
Copyright (c) 2005 Todd Blackley

The above copyright notice shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

package com.pawsource.model;

import java.util.*;

public class Csv
{
    public Csv()
    {}

    // Builds a large list of number strings and prints each one.
    public void test1()
    {
        LinkedList list = new LinkedList();
        for (int i=0;i<1000000;i++)
            list.add(new String(Integer.toString(i)));
        for (Iterator iter = list.iterator(); iter.hasNext();)
        {
            String s = (String)iter.next();
            System.out.println(s);
        }
    }

    // Joins the list entries with commas, quoting any entry that contains a comma.
    public String toCSV(ArrayList aList)
    {
        StringBuffer buffer = new StringBuffer();
        for (int i=0;i<aList.size();i++)
        {
            String text = (String)aList.get(i);
            boolean needQuotes = text.indexOf(",")==-1?false:true;
            if (needQuotes) buffer.append("\"");
            buffer.append(text);
            if (needQuotes) buffer.append("\"");
            // No separator after the last entry.
            if (i<aList.size()-1)
                buffer.append(",");
        }
        return buffer.toString();
    }

    // Splits a CSV record on commas, then re-joins the pieces of any quoted field.
    public ArrayList parseCSV(String aBuffer)
    {
        String[] orig = aBuffer.split(",");
        ArrayList tokens = new ArrayList();
        int index = 0;
        while (index<orig.length)
        {
            if (orig[index].startsWith("\""))
            {
                // Quoted field: drop the opening quote, then append pieces
                // until one of them ends with a quote.
                int origIndex = index;
                tokens.add(orig[index].substring(1,orig[index].length()));
                while (orig[index].endsWith("\"")==false && ++index<orig.length)
                {
                    tokens.set(origIndex,tokens.get(origIndex)+ "," + orig[index]);
                }
                // Drop the closing quote if one was found.
                if (index<orig.length)
                {
                    String temp = (String)tokens.get(origIndex);
                    tokens.set(origIndex,temp.substring(0,temp.length()-1));
                }
            }
            else
                tokens.add(orig[index]);
            index++;
        }
        return tokens;
    }

    // Small test program: parse a sample record, print the tokens, and write them back out.
    public static void main(String[] args)
    {
        Csv test = new Csv();
        System.out.println("Start...");
        String buffer = "1,2,,4,\"5,6,7,8\",9,10";
        ArrayList tokens = test.parseCSV(buffer);
        for (int i=0;i<tokens.size();i++)
        {
            System.out.println("Token :" + tokens.get(i));
        }
        System.out.println("CSV :" + test.toCSV(tokens));
        System.out.println("Finish...");
    }
}

[3187 byte] By [todd102a] at [2007-10-2 5:37:45]
# 1

My $0.02 worth on cleaning up the code (nothing to do with functionality, and not trying to be critical):

list.add(new String(Integer.toString(i)));

should be

list.add(Integer.toString(i));

There is no reason to create an extra String, since a String is already what toString() returns.

boolean needQuotes = text.indexOf(",")==-1?false:true;

should be

boolean needQuotes = text.indexOf(",")!=-1;

and

while (orig[index].endsWith("\"")==false && ++index><orig.length)

should be

while (!orig[index].endsWith("\"") && ++index><orig.length)

The "?false:true" and "==false" parts are redundant.

Also, what is "><"?

Both toCSV() and parseCSV() should be static because they use no instance state.
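As a rough sketch of these cleanup points, the writer could look like the following. It assumes that quoting is needed exactly when a field contains a comma, as in the posted code. The class name CsvUtil and the helper needsQuotes are made up for illustration, and generics and StringBuilder assume Java 5 or later rather than the JDK of the original post.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical utility class; a sketch, not the original author's code.
final class CsvUtil {
    // No instances needed: the methods below keep no state.
    private CsvUtil() {}

    // True when the field contains a comma and therefore has to be quoted.
    static boolean needsQuotes(String text) {
        return text.indexOf(',') != -1;
    }

    // Joins the fields with commas, quoting any field that contains a comma.
    static String toCSV(List<String> fields) {
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            String text = fields.get(i);
            boolean quote = needsQuotes(text);
            if (quote) buffer.append('"');
            buffer.append(text);
            if (quote) buffer.append('"');
            // No separator after the last field.
            if (i < fields.size() - 1) buffer.append(',');
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // prints 1,"5,6",9
        System.out.println(toCSV(Arrays.asList("1", "5,6", "9")));
    }
}
```

With static methods, callers write CsvUtil.toCSV(list) directly and never need to create an object first.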

jbisha at 2007-7-16 1:48:13 > top of Java-index,Java Essentials,Java Programming...
# 2

Use the "code" tags when posting code so it retains its original formatting.

What about handling quotes embedded in a token?

String buffer = "1,2,,4,\"5,6\"\"7,8\",9,10";
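One possible way to handle that case is to drop the split and scan the record character by character, tracking whether the scanner is inside quotes. The sketch below is a different technique from the split-based parser in the post. It assumes the common convention that a doubled quote inside a quoted field stands for one literal quote, and the class name QuoteAwareCsv is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; a character-by-character alternative to the split-based parser.
final class QuoteAwareCsv {
    private QuoteAwareCsv() {}

    // Parses one CSV record. Inside a quoted field, "" is read as a literal quote.
    static List<String> parse(String record) {
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"');  // doubled quote becomes one literal quote
                        i++;                // skip the second quote of the pair
                    } else {
                        inQuotes = false;   // closing quote
                    }
                } else {
                    field.append(c);        // commas inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;            // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);         // start the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());       // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        List<String> tokens = parse("1,2,,4,\"5,6\"\"7,8\",9,10");
        for (String token : tokens) {
            System.out.println("Token :" + token);
        }
    }
}
```

For the sample buffer above, the fifth token comes out as 5,6"7,8 with a single embedded quote. This sketch does not handle records that span several lines or report unterminated quotes.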

> Also, what is "><"?

I had to replace it with "!=" to get the code to compile with JDK 1.4.2.

camickra at 2007-7-16 1:48:13
# 3
Yeah, CSV projects can get really complex. A few months ago I found a library that helped me as well. It is called "CSV Manager" or something like that. Just Google it if you are interested.
lgarcia3a at 2007-7-16 1:48:13