HTMLDocument problem

I am using the following code for reading and searching for a String searchText within the HTMLDocument.

private void search() {
    file = new FileReader("path_of_file");
    htmlEditorKit = new HTMLEditorKit();
    document = (HTMLDocument) htmlEditorKit.createDefaultDocument();
    callback = document.getReader(0);
    parser = new ParserDelegator();
    parser.parse(file, callback, true);
    docText = document.getText(document.getStartPosition().getOffset(), document.getEndPosition().getOffset());
    System.out.println("START\n" + docText + "\nEND\n");
    while ((pos = docText.indexOf(searchText, pos + 1)) != -1)
    {
        //store_pos_in_arraylist
    }
    file.close();
}

The code works quite well, except that it does not read the entire HTML document.

The call to System.out.println("START\n" + docText + "\nEND\n");

does not print the complete file contents, only part of the file.

I have tried many variations, but I cannot figure out why this is happening. Is there some buffer associated with this?

Where exactly is the problem: in the parse call, in the getReader call, or in my string docText? Can anyone help me understand where I am going wrong and how I can fix it?

Thanks & Regards.

Dhruva Sagar

[1948 byte] By [DhruvaSagara] at [2007-11-26 15:22:15]
# 1
What does the document look like, before and after it's read?
glevnera at 2007-7-8 21:37:17
# 2

Physically the HTML document is perfectly ok.

I might be using some styles within the document; that wouldn't be affecting this, would it?

And what exactly do you mean by that? How do I check what the document looks like before and after, and before and after what?

As my println statement shows, it's not getting the full document. If the document is small, with not much content, it is read in full. But if the HTML file is very long, with lots of content, it is not read entirely, and docText holds only part of the file data.

Why should this be?

DhruvaSagara at 2007-7-8 21:37:17
# 3

I mean, what does the original document contain, and what does your HTMLDocument load and/or skip? If you are getting only the beginning of a long document, then perhaps the document is being loaded asynchronously. In that case, there is presumably an event generated when the document finishes loading and which you can listen for.

glevnera at 2007-7-8 21:37:17
# 4
Will calling setAsynchronousLoadPriority help in my case? Only part of the document is loaded into docText.
DhruvaSagara at 2007-7-8 21:37:17
# 5

It sounds to me like you can either call that method to ensure that the document is loaded synchronously, or adapt your code so that it does not assume the document has been fully loaded until the corresponding event has been received. The second option sounds better to me because it does not block the event dispatching thread when a long document is loaded.

glevnera at 2007-7-8 21:37:17
# 6

Can you please shed some more light on how I might achieve either of your suggestions?

How do I ensure that the document is loaded synchronously, or that it is read only once it has been loaded completely?

Please help me understand how it may be done...

DhruvaSagara at 2007-7-8 21:37:17
# 7
To ensure the document is loaded synchronously, I imagine you would do something like this:

document = (HTMLDocument) htmlEditorKit.createDefaultDocument();
document.setAsynchronousLoadPriority(-1);

But I have never done this. Try it out and let us know!
glevnera at 2007-7-8 21:37:17
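For reference, the suggestion above can be written out as a small self-contained sketch. The class and method names (`LoadPriorityDemo`, `newSynchronousDocument`) are hypothetical and exist only for illustration. As far as I can tell, this priority is consulted by JEditorPane.setPage when it decides whether to load a page on a background thread, so it may well have no effect when the parser is driven directly, as in the original code.

```java
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
final class LoadPriorityDemo {

    /** Creates an HTMLDocument with asynchronous loading switched off. */
    static HTMLDocument newSynchronousDocument() {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument document = (HTMLDocument) kit.createDefaultDocument();
        // A negative priority requests synchronous loading.
        document.setAsynchronousLoadPriority(-1);
        return document;
    }

    public static void main(String[] args) {
        HTMLDocument document = newSynchronousDocument();
        System.out.println("priority = " + document.getAsynchronousLoadPriority());
    }
}
```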
# 8
No, even with that it is still not reading the entire document. In fact, it reads up to a particular word, the same word every time. Small documents are read entirely, but large ones are not. Please help me try the other approach.
DhruvaSagara at 2007-7-8 21:37:17
# 9
Have you tried using HTMLEditorKit.read() to load the document? I notice from the source code that it ends with a call to flush the reader, which your code does not do.
glevnera at 2007-7-8 21:37:17
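A minimal sketch of the flush idea, keeping the ParserDelegator approach from the original post: the callback returned by getReader may still be holding buffered content when parse returns, and flush() pushes it into the document. The class name `FlushDemo`, the method `parseToText`, and the sample HTML are assumptions made for illustration, not part of the original code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, for illustration only.
final class FlushDemo {

    /** Parses HTML from the reader and returns the document's plain text. */
    static String parseToText(Reader in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument document = (HTMLDocument) kit.createDefaultDocument();
        HTMLEditorKit.ParserCallback callback = document.getReader(0);
        // The third argument (true) tells the parser to ignore charset declarations.
        new ParserDelegator().parse(in, callback, true);
        // Push any content the callback is still buffering into the document.
        callback.flush();
        return document.getText(0, document.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Sample markup, assumed for the demo.
        String html = "<html><body><p>first needle</p><p>second needle</p></body></html>";
        System.out.println(parseToText(new StringReader(html)));
    }
}
```

Using document.getLength() as the length argument of getText avoids asking for one character more than the document holds.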
# 10

Well, yes, I have tried the htmlEditorKit.read method. It seemed like the better option when I tried it, but it throws a run-time exception that I couldn't get around, even after searching for ways to do so.

Here is the stack trace, in case you're interested or know how to get past this.

javax.swing.text.ChangedCharSetException

at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(Unknown Source)

at javax.swing.text.html.parser.Parser.startTag(Unknown Source)

at javax.swing.text.html.parser.Parser.parseTag(Unknown Source)

at javax.swing.text.html.parser.Parser.parseContent(Unknown Source)

at javax.swing.text.html.parser.Parser.parse(Unknown Source)

at javax.swing.text.html.parser.DocumentParser.parse(Unknown Source)

at javax.swing.text.html.parser.ParserDelegator.parse(Unknown Source)

at javax.swing.text.html.HTMLEditorKit.read(Unknown Source)

at source.HashCalculator.search(HashCalculator.java:1013)

at source.HashCalculator.access$26(HashCalculator.java:992)

at source.HashCalculator$16.actionPerformed(HashCalculator.java:916)

at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)

at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)

at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)

at javax.swing.DefaultButtonModel.setPressed(Unknown Source)

at javax.swing.AbstractButton.doClick(Unknown Source)

at javax.swing.plaf.basic.BasicRootPaneUI$Actions.actionPerformed(Unknown Source)

at javax.swing.SwingUtilities.notifyAction(Unknown Source)

at javax.swing.JComponent.processKeyBinding(Unknown Source)

at javax.swing.KeyboardManager.fireBinding(Unknown Source)

at javax.swing.KeyboardManager.fireKeyboardAction(Unknown Source)

at javax.swing.JComponent.processKeyBindingsForAllComponents(Unknown Source)

at javax.swing.JComponent.processKeyBindings(Unknown Source)

at javax.swing.JComponent.processKeyEvent(Unknown Source)

at java.awt.Component.processEvent(Unknown Source)

at java.awt.Container.processEvent(Unknown Source)

at java.awt.Component.dispatchEventImpl(Unknown Source)

at java.awt.Container.dispatchEventImpl(Unknown Source)

at java.awt.Component.dispatchEvent(Unknown Source)

at java.awt.KeyboardFocusManager.redispatchEvent(Unknown Source)

at java.awt.DefaultKeyboardFocusManager.dispatchKeyEvent(Unknown Source)

at java.awt.DefaultKeyboardFocusManager.preDispatchKeyEvent(Unknown Source)

at java.awt.DefaultKeyboardFocusManager.typeAheadAssertions(Unknown Source)

at java.awt.DefaultKeyboardFocusManager.dispatchEvent(Unknown Source)

at java.awt.Component.dispatchEventImpl(Unknown Source)

at java.awt.Container.dispatchEventImpl(Unknown Source)

at java.awt.Window.dispatchEventImpl(Unknown Source)

at java.awt.Component.dispatchEvent(Unknown Source)

at java.awt.EventQueue.dispatchEvent(Unknown Source)

at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)

at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)

at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)

at java.awt.Dialog$1.run(Unknown Source)

at java.awt.Dialog$2.run(Unknown Source)

at java.security.AccessController.doPrivileged(Native Method)

at java.awt.Dialog.show(Unknown Source)

at java.awt.Component.show(Unknown Source)

at java.awt.Component.setVisible(Unknown Source)

at source.HashCalculator.getHelpContents(HashCalculator.java:989)

at source.HashCalculator.access$38(HashCalculator.java:625)

at source.HashCalculator$64.actionPerformed(HashCalculator.java:1894)

at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)

at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)

at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)

at javax.swing.DefaultButtonModel.setPressed(Unknown Source)

at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)

at java.awt.AWTEventMulticaster.mouseReleased(Unknown Source)

at java.awt.Component.processMouseEvent(Unknown Source)

at javax.swing.JComponent.processMouseEvent(Unknown Source)

at java.awt.Component.processEvent(Unknown Source)

at java.awt.Container.processEvent(Unknown Source)

at java.awt.Component.dispatchEventImpl(Unknown Source)

at java.awt.Container.dispatchEventImpl(Unknown Source)

at java.awt.Component.dispatchEvent(Unknown Source)

at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)

at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)

at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)

at java.awt.Container.dispatchEventImpl(Unknown Source)

at java.awt.Window.dispatchEventImpl(Unknown Source)

at java.awt.Component.dispatchEvent(Unknown Source)

at java.awt.EventQueue.dispatchEvent(Unknown Source)

at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)

at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)

at java.awt.EventDispatchThread.pumpEvents(Unknown Source)

at java.awt.EventDispatchThread.pumpEvents(Unknown Source)

at java.awt.EventDispatchThread.run(Unknown Source)

I also tried calling callback.flush() as you suggested, but again it's not working. I am trying everything I can in the given context, everything that sounds even remotely similar in logic to what I want to achieve :).

Thanks a lot for your patience in all this, I truly appreciate it.

Thanks and regards.

Dhruva Sagar.

DhruvaSagara at 2007-7-8 21:37:17
# 11

You can get around that exception in one of two ways. One is to edit your HTML file and remove the charset declaration (it probably contains a META tag with a charset attribute). The other is to do this in your code:

document.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

Geoff

glevnera at 2007-7-8 21:37:17
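Putting the pieces of this thread together, a hedged sketch of the whole flow might look like the following: set the property, load through HTMLEditorKit.read, then collect the match positions. The class name `HtmlSearch`, the methods `loadText` and `findAll`, and the sample markup are assumptions made for illustration. Note that the property key must be spelled exactly "IgnoreCharsetDirective".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
final class HtmlSearch {

    /** Loads HTML through HTMLEditorKit.read and returns the plain text. */
    static String loadText(Reader in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument document = (HTMLDocument) kit.createDefaultDocument();
        // Without this property, a META charset declaration can raise ChangedCharSetException.
        document.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(in, document, 0);
        return document.getText(0, document.getLength());
    }

    /** Returns the start offset of every occurrence of needle in text. */
    static List<Integer> findAll(String text, String needle) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // an empty needle would never terminate the loop below
        }
        int pos = -1;
        while ((pos = text.indexOf(needle, pos + 1)) != -1) {
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup with a charset declaration, assumed for the demo.
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-8\"></head>"
                + "<body><p>needle one</p><p>needle two</p></body></html>";
        String text = loadText(new StringReader(html));
        System.out.println(findAll(text, "needle"));
    }
}
```

For a file on disk, a FileReader (closed afterwards, for example with try-with-resources) can be passed to loadText in place of the StringReader.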
# 12
When I searched Google for ignoreCharacterSet, I found the same code you gave me, so I tried it, but it doesn't seem to work. The other thing you mentioned is pretty interesting, though; I will certainly try that.
DhruvaSagara at 2007-7-8 21:37:17
# 13
It works for me. Perhaps there is something wrong with your particular HTML document. Have you tried other, equally long documents? Maybe if you post your HTML document somebody will see the problem.
glevnera at 2007-7-8 21:37:17
# 14
Man, I was using this:

document.putProperty("IgnoreCharacterSet", Boolean.TRUE);

After changing it to what you said, it's working! Now it's reading the entire document! Thanks a million :)).
DhruvaSagara at 2007-7-8 21:37:17
# 15
You're welcome.
glevnera at 2007-7-21 16:25:13