Loading large text files into Java Vectors and OutOfMemoryError

Hi there,

I need your help with the following:

I'm trying to load large amounts of data into a Vector in order to concatenate several text files and process them, but I'm getting an OutOfMemoryError. I even tried using an XML structure and saving to a database, but the error is still the same. Can you help?

Thanks.

Here's the code:

public void Concatenate() {
    try {
        //for(int i=0;i<1;i++) {
        vEntries = new Vector();
        for (int i = 0; i < BopFiles.length; i++) {
            MainPanel.WriteLog("reading file " + BopFiles[i] + "...");
            FileInputStream fis = new FileInputStream(BopFiles[i]);
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream in = new DataInputStream(bis);
            String line = in.readLine();
            Database db = new Database();
            Connection conn = db.open();
            // Every parsed line is accumulated in vEntries until the whole file has been read.
            while (line != null) {
                DivideLine(BopFiles[i], line);
                line = in.readLine();
            }
            // Write the collected entries to the database, then discard them.
            FreeMemory(db, conn);
        }
        MainPanel.WriteLog("Num of elements: " + root.getChildNodes().getLength());
        MainPanel.WriteLog("Done!");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void DivideLine(String file, String line) {
    if (line.toLowerCase().startsWith("00694")) {
        // Header record: keep the file name and the raw line.
        Header hd = new Header();
        hd.headerFile = file;
        hd.headerLine = line;
        vHeaders.add(hd);
    } else if (line.toLowerCase().startsWith("10694")) {
        // Detail record: extract the fixed-width fields.
        Line entry = new Line();
        Vector vString = new Vector();
        Vector vType = new Vector();
        Vector vValue = new Vector();
        entry.name = line.substring(45, 150).trim();
        entry.number = line.substring(30, 45).trim();
        entry.nif = line.substring(213, 222).trim();
        entry.index = BopIndex;
        entry.message = line; // the full raw line is stored as well as the parsed fields
        entry.file = file;
        // The rest of the line is a series of tokens separated by "A".
        String series = line.substring(252);
        StringTokenizer st = new StringTokenizer(series, "A");
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.startsWith(" ")) {
                vString.add(token);
                vType.add(token.substring(2, 4));
                vValue.add(token.substring(4));
            }
            token = null;
        }
        entry.strings = new String[vString.size()];
        vString.copyInto(entry.strings);
        entry.types = new String[vType.size()];
        vType.copyInto(entry.types);
        entry.values = new String[vType.size()];
        vValue.copyInto(entry.values);
        vEntries.add(entry);
        entry = null;
        vString = null;
        vType = null;
        vValue = null;
        st = null;
        series = null;
        line = null;
        file = null;
        MainPanel.SetCount(BopIndex);
        BopIndex++;
    }
}

public void FreeMemory(Database db, Connection conn) {
    try {
        //db.update("CREATE TABLE entries (message VARCHAR(1000))");
        db.update("DELETE FROM entries;");
        PreparedStatement ps = null;
        // Insert the raw line of every collected entry into the database.
        for (int i = 0; i < vEntries.size(); i++) {
            Line entry = (Line) vEntries.get(i);
            String value = "" + entry.message;
            if (!value.equals("")) {
                try {
                    ps = conn.prepareStatement("INSERT INTO entries (message) VALUES('" + Tools.RemoveSingleQuote(value) + "');");
                    ps.execute();
                } catch (Exception e) {
                    e.printStackTrace();
                    System.out.println("error in number->" + i);
                }
            }
        }
        MainPanel.WriteLog("Releasing memory...");
        // Drop the reference to the old Vector so it can be garbage collected.
        vEntries = null;
        vEntries = new Vector();
        System.gc();
    } catch (Exception e1) {
        e1.printStackTrace();
    }
}

[6515 byte] By [javapnunesa] at [2007-11-26 14:43:58]
# 1
What's your goal here? To concatenate files? If that's the whole thing, you could just use one FileWriter / FileOutputStream and write the files out as you load them.
tjacobs01a at 2007-7-8 8:31:38 > top of Java-index,Desktop,Developing for the Desktop...
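A minimal sketch of the suggestion in reply #1, assuming Java 7 or later and that plain concatenation is all that is needed. The class name ConcatFiles and its method are hypothetical and not part of the posted code. Each input is streamed straight into one output stream, so no file content is kept in a collection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not from the original post.
final class ConcatFiles {

    /** Appends each input file to the output file, in the order given. */
    static void concatenate(List<Path> inputs, Path output) throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path input : inputs) {
                // Files.copy streams the bytes through an internal buffer;
                // memory use does not grow with the size of the input file.
                Files.copy(input, out);
            }
        }
    }
}
```

Memory use stays roughly constant regardless of how many files are passed in, which is the point of the suggestion.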
# 2

Well, I need to process those contents and calculate values within those files, so just writing files using FileInputStream won't do. For instance, I need to get line 5 from file 1, split it, grab a value according to its class (which is also taken from the line), and compare it with another line from another file, adding those values into a single file.

That's why I need Vector capabilities, but since these files are more than 5 MB each, loading those values into a Vector causes an out-of-memory error.

A better explanation:

Each file has lines like

CLIENTNUM CLASS VALUE

so if the same client appears in 2 files, I need to merge those lines into a single line in a single file.

If the class is the same, the values are summed; if not, the new class and value are appended to the line.

We could end up with a final line like

CLIENTNUM CLASS1 VALUE1 CLASS2 VALUE2

javapnunesa at 2007-7-8 8:31:38
# 3
Try using the -Xmx option to increase your heap size.
Rodney_McKaya at 2007-7-8 8:31:38
# 4
I already tried that, and now I'm able to open 3 files, but I need to be able to load at least five... :-)
javapnunesa at 2007-7-8 8:31:38
# 5
Then increase it some more...
Rodney_McKaya at 2007-7-8 8:31:38
# 6
How? By putting a value into that parameter? What about desktops that have no more than 250 MB of RAM? Will it work?
javapnunesa at 2007-7-8 8:31:38
# 7

-Xmx256M in the VM arguments.

Every OS has some kind of virtual memory mechanism (using the disk as memory).

You should also consider using a Map instead of a vector.

If what you're looking for is similar entries in the files you should keep a map with the entry as a key.

This way you avoid allocating memory for the entry name multiple times.

Rodney_McKaya at 2007-7-8 8:31:38
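To check whether the -Xmx setting from reply #7 actually reached the JVM, a small sketch like the following can help. The class name HeapInfo is hypothetical; Runtime.maxMemory() is the standard call that reports the maximum amount of memory the JVM will attempt to use.

```java
// Hypothetical helper, not from the original post.
final class HeapInfo {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it as `java -Xmx256M HeapInfo`. The reported figure can be somewhat below the requested value, depending on the JVM and garbage collector. If the program is started from an IDE or a launcher script, the flag has to go into that tool's VM arguments, otherwise the default limit still applies.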
# 8
What do you mean by Map? Can you give me an example?
javapnunesa at 2007-7-8 8:31:38
# 9

Look at the API.

http://java.sun.com/j2se/1.5.0/docs/api/java/util/HashMap.html

Basically if you search for similar entries in different files you use the same map for all of them.

You read the entry, search in the map and if it doesn't exist you put the entry with the value.

If the entry exists, you take its value, add the new value to it, and place it back in the map.

This way when you finish you have all your entries in the map with the values from all files.

Rodney_McKaya at 2007-7-8 8:31:38
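The map approach from reply #9 could be sketched as follows. This is only an illustration under assumptions: it treats each line as whitespace-separated "CLIENTNUM CLASS VALUE" with an integer value, as in the simplified description in reply #2, whereas the real files are fixed-width. The class ClientTotals and its methods are hypothetical, and the code assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not from the original post.
final class ClientTotals {

    // client number -> (class -> summed value); insertion order is kept for stable output
    private final Map<String, Map<String, Long>> totals = new LinkedHashMap<>();

    /** Folds one "CLIENTNUM CLASS VALUE" line into the running totals. */
    void add(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            return; // skip lines that do not match the assumed layout
        }
        long value = Long.parseLong(parts[2]); // assumes integer values
        totals.computeIfAbsent(parts[0], k -> new LinkedHashMap<>())
              .merge(parts[1], value, Long::sum); // same class: sum; new class: append
    }

    /** Formats one client as "CLIENTNUM CLASS1 VALUE1 CLASS2 VALUE2 ...". */
    String format(String client) {
        StringBuilder sb = new StringBuilder(client);
        Map<String, Long> byClass = totals.get(client);
        if (byClass != null) {
            for (Map.Entry<String, Long> e : byClass.entrySet()) {
                sb.append(' ').append(e.getKey()).append(' ').append(e.getValue());
            }
        }
        return sb.toString();
    }
}
```

The same ClientTotals instance is fed lines from every file, one line at a time. Only one small entry per client and class stays in memory, not the raw lines.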
# 10
Still java.lang.OutOfMemoryError, the same as before... Is there any other class like HashMap that takes less memory?
javapnunesa at 2007-7-8 8:31:38
# 11
Are you sure you're using it correctly? Did you use the -Xmx flag?
Rodney_McKaya at 2007-7-8 8:31:38
# 12
Yes, I did...
javapnunesa at 2007-7-8 8:31:38
# 13
You are saving way too much information for each line. All the processing of a line can be done while you save the new file, one line at a time. There's no need to keep everything in memory. You keep the whole line and also the information you parse from it.
Rodney_McKaya at 2007-7-8 8:31:38
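One way to read the advice in reply #13 is sketched below: handle each record as soon as it is read and write the result out immediately, so neither the raw line nor the parsed fields are kept afterwards. The field offsets and the "10694" prefix come from the posted DivideLine method. The class name LineByLine, the semicolon-separated output format, and the length check are assumptions made for the illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper, not from the original post.
final class LineByLine {

    /**
     * Reads records one at a time, writes the extracted fields straight to
     * the output, and returns the number of detail records written.
     */
    static int process(BufferedReader in, Writer out) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Only detail records are handled here; short lines are skipped
            // so the fixed-width substring calls cannot fail.
            if (!line.startsWith("10694") || line.length() < 222) {
                continue;
            }
            String number = line.substring(30, 45).trim();
            String name = line.substring(45, 150).trim();
            String nif = line.substring(213, 222).trim();
            out.write(number + ";" + nif + ";" + name + "\n"); // assumed output format
            count++;
            // Nothing from this iteration is stored, so it can be collected.
        }
        return count;
    }
}
```

On the JVMs current at the time of this thread (Java 6 and earlier), String.substring shared the character array of the original string, so keeping even one short field in a long-lived collection could keep the whole line in memory. Writing the fields out, or copying them with new String(...), avoids that.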
# 14
Still having problems with this...
javapnunesa at 2007-7-8 8:31:38