How to manage large quantities of XML data efficiently in a web app
Hello,
I am planning the development of a web application/web service which asks users questions according to a script represented in XML. There are likely to be many hundreds of such XML scripts.
I would like some advice as to the best way to handle this XML data, particularly in terms of whether I should store this information in a database, or whether I am better off treating them as separate files much like text or HTML files.
I would like to know the sort of things that should influence a developer's decision, and what sort of solutions and database-related technologies there are to support this XML storage problem.
Thank you very much for any advice.
Greg
[714 byte] By [GregScotta] at [2007-11-27 5:42:38]

# 1
If you have a collection of documents, and you have people who are editing those documents and creating new ones, then you might need a Content Management System to manage the process. You could of course write your own management system, but it would still be worthwhile to have a look at existing CMSes to see which of their ideas you could use.
# 2
Thank you for this, but it doesn't answer my original question. I would like to understand better the issues behind storing XML content. When would it be better to use a database? When should I just store them in files? If it is useful, for my application, the XML files will be read much more often than they are written/edited. I would also like to be able to search through the data.
# 3
Well, the issues are those that are addressed by content management systems in various ways. The fact that the data happens to be XML is (I think) not all that important. The main issue is that you have lumps of text that need to be stored and retrieved.
For one project I did, I decided to store data in the database rather than in files because it was going to be accessed from several application servers, and it was easier to make them all access the same database than it was to make them all access the same file system.
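As a rough illustration of the database option, here is a minimal sketch in Python that keeps each XML script as a text column in one table. It uses the standard library's sqlite3 module only so the example is self-contained; with several application servers you would point similar code at a shared database server instead. The table name `scripts`, its columns, and the helper functions are all made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store(path=":memory:"):
    """Open (or create) the hypothetical script store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scripts ("
        "name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_script(conn, name, xml_text):
    """Insert or overwrite one script, rejecting malformed XML."""
    ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO scripts (name, body) VALUES (?, ?)",
            (name, xml_text),
        )


def load_script(conn, name):
    """Return the parsed root element, or None if the script is unknown."""
    row = conn.execute(
        "SELECT body FROM scripts WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else ET.fromstring(row[0])
```

Every server that opens the same store sees the same scripts, and the transaction around the write means a reader never sees a half-saved document.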
That's the sort of question that arises. But it's nothing to do with XML per se. The point I'm trying to make here is that it's a general content management decision you are trying to make, so you need to read up on content management in general. I'm no expert in that field but that's what I would do if I had to answer the question.
# 4
You probably don't need a full-blown CMS solution for XML document management but you may want to consider an XML DB. Storing your documents as flat files may work while your app is simple, but businesses change and software that was supposed to be "simple" has a way of becoming complex. A database may give your software more flexibility (and thus increase its survivability). Generally, the approach you should take will depend on how you use your data.
If your data use is document-centric (meaning you only work with whole documents and your queries / updates don't span documents), then you may be able to get away with a flat file format. Keep in mind that you'll also have to deal with partial reads / writes (consider an unexpected power outage or failure) that a DB could manage for you. Relational DBs with XQuery support layered on top tend to perform better for document-centric queries vs native XML DBs.
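If you do go with flat files, one common way to avoid half-written documents is to write to a temporary file and then rename it over the real one. The following is only a sketch of that idea in Python; the function name `write_atomically` is invented for the example, and it assumes the temporary file and the target live on the same filesystem.

```python
import os
import tempfile


def write_atomically(path, xml_text):
    """Replace the file at `path` with `xml_text` in a single rename step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(xml_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This covers only the single-file case. Updates that must change several documents together still need the kind of transaction support a DB provides.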
If your use is data-centric (meaning your queries / updates span multiple documents), then I would recommend a native XML DB. The XQuery support offered by these DBs will greatly simplify your code as they take care of the queries and updates for you. In addition, these DBs tend to improve performance over their relational counterparts (and definitely over flat files) because of the way they index and store the XML documents. Try to avoid DBs that layer XQuery on top of a relational DB in this case. These DBs must shred the documents into tables and the performance will not be as good as a true native XML DB.
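To make "queries that span documents" a bit more concrete, here is a sketch of doing one by hand over flat files in Python. The `<question>` element name is purely an assumption, since the actual script format hasn't been described, and `find_questions` is a made-up helper. Note that it re-parses every file on every search, which is exactly the work an indexed XML DB would save you.

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_questions(directory, keyword):
    """Return (file name, question text) for each <question> mentioning keyword."""
    hits = []
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        root = ET.parse(path).getroot()
        # <question> is a hypothetical element name used for illustration.
        for question in root.iter("question"):
            text = "".join(question.itertext())
            if keyword.lower() in text.lower():
                hits.append((os.path.basename(path), text.strip()))
    return hits
```

With a few hundred small scripts this may well be fast enough; it is when the collection or the query complexity grows that the DB approach starts to pay off.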
I hope this helps,
Alex