Which algorithm to use

Hi

I have to write XML parser which would parse some movie script and then I will have to display that and provide search, add, delete and as well as that I will have to be able to organize their replics by actors and that kind of stuff. The parsing bit I know how to do. But I am not sure what would be the best solution for data structure. I am thinking about using B-Trees, just would like to know. What you can suggest me. Is their any better algorithm to do this job? Thank's a lot!!

[502 byte] By [mehtiboy] at [2007-9-30 21:04:44]
# 1
Have you looked into the standard XML parsers and XSLT?
YATArchivist at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 2
You'll be using a DOM tree, from the sound of it. The hard part will be designing an XML schema that will capture your problem. Once you have that, the parsers and DOM tree are commodity, off-the-shelf items.%
duffymo at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 3
A java near solution would be jdom from jdom.org. With xpath to access thing from the DOM you would be sufficiently flexible, and not burdened with overhead.
joop_eggen at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 4

JDOM is based on the same standards as Xerces, Crimson, etc. It's still a DOM tree. They just don't use the W3C classes, that's all.

The point is that if you use any off-the-shelf Java XML parser you're either going to use a DOM parser, which will generate the tree data structure for you, OR you'll use the SAX parser and write your own event handlers to build the data structure for you.

You're asking if there's any data structure that's proven to be better in some way than the DOM tree. I don't have any proof one way or the other.

As Joop points out, the advantage of using the DOM tree is that it's already designed to use XPATH and all that stuff. I think you lose that if you go with your own data structure.

I think you're safe with DOM.

%

duffymo at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 5

I am using JDOM to load XML documents and then rather than convert the JDOM document to my own object tree I use the 'facade' pattern. I prime a facade with a JDOM element (usually obtained from XPath) and then have methods on the facade to access and edit 'fields' .

This works well and decouples (well nearly) the data layer from the presentation layer.

sabre150 at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 6

I don't think any of the above solutions will be sufficient for quick searching if you have alot of data. There are a few problems with DOM:

* Memory cost - about 6 times the original xml-document size. With xml-documents of 100 MB the DOM structure will swallow 600 MB

* Search efficiency. Search is linear. For smaller documents it's ok I guess, but for large documents you need patient users.

BTree will work. However, the problem is to integrate the BTree into your project. It will take some effort.

If you don't have too much data. A hashtable or a binary tree (will allow searches like 'condition*') in memory will do just fine a provide lightning fast searches.

Gil

gilroitto at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 7
Or maybee XPath searches is not linear in search. I don't know actually, it was just my assumption of earlier experiences with DOM.Gil
gilroitto at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 8
Thank's a lot. I will look into facade design pattern and see how I can apply it to my problem!
mehtiboy at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...
# 9

> I don't think any of the above solutions will be

> sufficient for quick searching if you have alot of

> data. There are a few problems with DOM:

> * Memory cost - about 6 times the original

> xml-document size. With xml-documents of 100 MB the

> DOM structure will swallow 600 MB

> * Search efficiency. Search is linear. For smaller

> documents it's ok I guess, but for large documents you

> need patient users.

>

> BTree will work. However, the problem is to integrate

> the BTree into your project. It will take some effort.

>

> If you don't have too much data. A hashtable or a

> binary tree (will allow searches like 'condition*') in

> memory will do just fine a provide lightning fast

> searches.

>

> Gil

Thank's for you idea's. I think I will go with hashtable and araylist to store the order of replics. I gave up idea of BTree cause I don't have enough time to implement it.

mehtiboy at 2007-7-7 2:36:34 > top of Java-index,Other Topics,Algorithms...