Is Xindice not the way to go?

I would like to know whether native XML databases are still being actively developed. I ask because I am struggling to find up-to-date references to these technologies, and I don't want to embark on using a system like Xindice if it isn't going to be developed or supported in the future.

What is the status of XML databases?

Thank you for any insight.

Greg

By [GregScotta] at [2007-11-27 5:56:23]
# 1

People have been working on XML databases for quite a few years now. But like object-oriented databases, they face an uphill battle against the SQL empire, and they will probably always be niche products. I wouldn't say don't use them; just keep applying the due diligence you're already doing before you commit to one.

DrClapa at 2007-7-12 16:26:59 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

Hi Greg,

I believe I may have answered some of your question in a previous post. I've been working with numerous XML DBs for almost 6 years now and I've seen them grow and evolve over time. They are much better now than ever before, but there are pros and cons to consider when weighing native XML DBs against their relational counterparts. It's really about choosing the right tool for the job. My experience has shown me the following when working with XML:

Native XML DBs

-Are designed specifically to handle XML data.

-Perform faster than RDBMS for data-centric queries.

-Offer more flexibility than RDBMS. RDBMS are great at relation mapping. However, as anyone who's ever worked on a complex project can attest, modifying the data model in an RDBMS with numerous relations can be a very expensive and time-consuming proposition, often taking months and several data architects. With an XML DB, if you want to change your data or add new document types, you simply add them. The structure is less rigid.

Relational DBs

-Are designed to handle data relations and models that fit well into rows and columns. XML is a tree-like structure and tends not to fit this model well.

-Many RDBMS vendors (e.g. Oracle) have added XQuery support for dealing with XML. These DBs tend to perform faster for document-centric queries but will be slower for data-centric queries.

-Often require schemas to map the XML to rows and columns. This can eliminate the flexibility that a native XMLDB offers.

-Have been around since the '70s. The relational model is well understood and tools are more mature.

When people ask, I tend to boil it down like this:

XML is a verbose language but offers flexibility that makes it a good choice in many situations. Tabular data offers great performance and the relational model has proven itself for over 40 years. However, when you blend the two (XML to RDBMS), you tend to lose the flexibility, gain the verbosity, and in short get the drawbacks of both and the benefits of neither.

If you do decide to go with an XML DB, I've worked with Xindice, Tamino, Ipedo, Raining Data, and Berkeley DB XML and would be happy to answer questions you may have about these products.

Super_Squirrela at 2007-7-12 16:26:59
# 3

Thank you very much for this, if only all answers were as rich and full of insight.

May I ask, then, what there is to choose between the XML databases you've listed? Ideally I would like something free, Java-based, and open source. I have just started playing with Xindice, but then it struck me that this product might for some reason be falling out of vogue with developers, or it might have been superseded by another similar project.

GregScotta at 2007-7-12 16:26:59
# 4

Unfortunately, not many are open source or free. It turns out that good indexing algorithms for XML content are not that easy to develop and it often takes an entire company to do so (and they like to get paid). : )

As you noted, Xindice is free but it is not getting that much attention these days. It also is not a very strong XML DB in my opinion. It works great as a learning tool or for very small projects but falls short when it comes to meeting production requirements.

We currently use Berkeley DB XML for our projects. It is a great open source database with a Java layer built on top of the Berkeley core (which is quite stable and has served as the core for MySQL and OpenLDAP, if I remember correctly). It is also free, but before you jump up and down for joy, note that it is an embedded database, which may be different from what you are used to. In this model you don't interact with a central DB server. Instead, you program to their API and the database is embedded in your application or server. The plus side is that performance is excellent and they follow current XQuery standards well. In addition, their API isn't too difficult and they've got a lot of good documentation. The down side is that you don't get a nice DB admin GUI out of the box. There may be a third-party solution that integrates with it, but the command line is fine for our needs.
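To give a feel for what "embedded" means in practice, here is a minimal sketch. It does not use the Berkeley DB XML API; `EmbeddedXmlStore`, `putDocument` and `query` are hypothetical names of my own, the documents live only in memory, and queries are plain XPath 1.0 from the JDK rather than XQuery. The only point is that the store is an ordinary object inside your process, with no server to connect to.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical toy store for illustration only; NOT the Berkeley DB XML API.
class EmbeddedXmlStore {
    // Parsed documents keyed by name, held in the application's own memory.
    private final Map<String, Document> docs = new LinkedHashMap<>();
    private final DocumentBuilder builder;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    EmbeddedXmlStore() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parse the XML text and keep it under the given name.
    void putDocument(String name, String xml) throws Exception {
        docs.put(name, builder.parse(new InputSource(new StringReader(xml))));
    }

    // Evaluate an XPath 1.0 expression against every stored document
    // and collect the text content of each matching node.
    List<String> query(String expression) throws Exception {
        List<String> results = new ArrayList<>();
        for (Document doc : docs.values()) {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                results.add(nodes.item(i).getTextContent());
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // No connection string, no server: the "database" is just an object.
        EmbeddedXmlStore store = new EmbeddedXmlStore();
        store.putDocument("a.xml", "<book><title>XML Basics</title><year>2005</year></book>");
        store.putDocument("b.xml", "<book><title>Java Persistence</title><year>2007</year></book>");
        System.out.println(store.query("/book[year > 2006]/title"));
    }
}
```

A real embedded product adds persistence, transactions and indexing on top of this, but the calling pattern (open a container in-process, put documents, run queries through the API) is the part that differs from talking to a central server.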

If you need a central DB server, I would recommend Raining Data. They had great performance and scalability, though they are not Java-based (but have a Java API). A fair bit of their code is written in assembly, which means they will take advantage of the extra registers in a 64-bit CPU, and they cache a lot, so be sure you've got plenty of RAM available for the best performance. Last I spoke with them, they didn't have XML indices for you to manage (they index everything), which is nice because indices can occasionally get corrupted and then you have to rebuild them. Plus, not worrying about XML indices will make your life easier. If you have a lot of documents and your query doesn't hit an index, then the DB has to inspect every document, which could (and often does) take a very, very long time.
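The index point is easy to see with a toy example. This is only a sketch under my own assumptions: `IndexedCollection` is a hypothetical class, each "document" is flattened to a field map, and the index is a plain `HashMap`, which is nothing like a real XML index. It just counts how many documents get inspected when a query hits the index versus when it doesn't.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy collection for illustration; not any vendor's implementation.
class IndexedCollection {
    // Each "document" is reduced to a flat field map to keep the sketch short.
    private final List<Map<String, String>> documents = new ArrayList<>();
    private final String indexedField;
    // Index: field value -> ids of the documents carrying that value.
    private final Map<String, List<Integer>> index = new HashMap<>();
    private int inspected;

    IndexedCollection(String indexedField) {
        this.indexedField = indexedField;
    }

    // Store the document and register it in the index if it has the indexed field.
    void add(Map<String, String> doc) {
        int id = documents.size();
        documents.add(doc);
        String key = doc.get(indexedField);
        if (key != null) {
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(id);
        }
    }

    // Return the ids of documents where field == value.
    List<Integer> find(String field, String value) {
        inspected = 0;
        if (field.equals(indexedField)) {
            // Index hit: only the matching documents are touched.
            List<Integer> hits = index.getOrDefault(value, List.of());
            inspected = hits.size();
            return new ArrayList<>(hits);
        }
        // No usable index: every document has to be inspected.
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < documents.size(); id++) {
            inspected++;
            if (value.equals(documents.get(id).get(field))) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Number of documents looked at by the most recent find() call.
    int lastInspected() {
        return inspected;
    }

    public static void main(String[] args) {
        IndexedCollection books = new IndexedCollection("isbn");
        for (int i = 0; i < 1000; i++) {
            books.add(Map.of("isbn", "isbn" + i, "author", "author" + (i % 10)));
        }
        books.find("isbn", "isbn42");
        System.out.println("indexed query inspected " + books.lastInspected() + " document(s)");
        books.find("author", "author3");
        System.out.println("unindexed query inspected " + books.lastInspected() + " document(s)");
    }
}
```

With a thousand documents the difference is harmless; with hundreds of thousands of real XML documents that each need parsing or traversal, the unindexed path is where the very long waits come from.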

Ipedo is a Java-based DB server that we used for around 3 years. It worked well until it got over 250,000 records in a single collection. They say they handle millions in a single collection, but in practice that wasn't true. It went down quite often in our production environments, and because of that we switched to Berkeley DB XML. I'm sure you could model your data such that your collections don't grow as large as ours did and thus avoid that problem.

Finally, that leaves Tamino. They were not so great (to say the least) and their support staff was less than friendly, so we abandoned them as quickly as we could. Granted, that was four years ago and they may have improved things.

I hope this helps,

Alex

Super_Squirrela at 2007-7-12 16:26:59
# 5

Xindice is an embedded database and has advantages over relational databases for storing XML. See http://www.onjava.com/pub/a/onjava/2006/03/08/storing-xml-document-with-apache-xindice.html

dvohra09a at 2007-7-12 16:26:59