DTD vs XSD, now I've got a chicken/egg problem

Greetings,

A couple of years ago I implemented a scenario like this:

1) an XML document comes in and specifies a DTD;

2) the resolveEntity() method of my DefaultHandler subclass would check whether the DTD text was already known (in a cache) and, if so, supply an InputSource for it to the SAXParser;

3) if the DTD wasn't known yet, its text would be loaded and added to the cache; logically step 2) was then tried again;

4) the rest of the XML content was validated against that particular DTD.
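
Steps 2) and 3) can be sketched roughly like this. It is a minimal sketch and not the original code: the class name `CachingDtdHandler` and the `dtdLoader` function (which fetches the DTD text for a system id) are hypothetical names made up for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a DefaultHandler that serves DTD text from a cache.
class CachingDtdHandler extends DefaultHandler {
    private final Map<String, String> dtdCache = new ConcurrentHashMap<>();
    private final Function<String, String> dtdLoader; // assumption: systemId -> DTD text

    CachingDtdHandler(Function<String, String> dtdLoader) {
        this.dtdLoader = dtdLoader;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // nothing to look up; the parser uses its default resolution
        }
        // Steps 2) and 3): use the cached text, loading it on first use only.
        String dtdText = dtdCache.computeIfAbsent(systemId, dtdLoader);
        if (dtdText == null) {
            return null; // loader found nothing; fall back to the default resolution
        }
        InputSource source = new InputSource(new StringReader(dtdText));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Such a handler would be passed to `SAXParser.parse(InputSource, DefaultHandler)` on a parser created by a factory with `setValidating(true)`.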

Now my customer wants to use XSDs, which makes sense in some sort of way. I can precompile XSDs easily and fill a cache with them for future use; no problem with that. What I am unable to do is, given an XSD directive (excusez le non xml-mot) encountered while parsing XML content, set the Schema for the SAXParser that is currently parsing that XML.

I realize that I can configure a SAXParserFactory to feed a Schema to a SAXParser yet to be created, but I don't know *which* of the Schemas to feed to it, because that particular Schema is only specified in that particular XML content. For now I consider it a chicken/egg problem, but I sure hope that someone more knowledgeable than I am can supply me with a bit of information and help.
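
To make the ordering problem concrete, here is a minimal sketch of the usual JAXP sequence. The class and method names (`FactoryFirstValidation`, `compile`, `validate`) are made up for illustration; the point is only that the Schema has to be picked before the parser exists.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class FactoryFirstValidation {
    // Precompiling an XSD (for a cache) is the easy part.
    static Schema compile(String xsdText) throws SAXException {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return schemaFactory.newSchema(new StreamSource(new StringReader(xsdText)));
    }

    // The Schema must be handed to the factory *before* the parser is created.
    static List<String> validate(Schema schema, Reader xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);           // XSD validation needs namespace awareness
        factory.setSchema(schema);                 // decided up front ...
        SAXParser parser = factory.newSAXParser(); // ... and fixed for this parser

        List<String> errors = new ArrayList<>();
        parser.parse(new InputSource(xml), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validation errors are reported here
            }
        });
        return errors;
    }
}
```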

Thank you in advance for any help and,

kind regards,

Jos

[1416 byte] By [JosAHa] at [2007-11-26 17:08:50]
# 1

A couple of solutions, if I understand the problem correctly.

First, look for [url=http://www.w3.org/TR/xmlschema-0/#schemaLocation]schemaLocation[/url] attribute(s) in your instance document, and use the location URL as the cache key for the schema. As long as your instance documents try to be nice, this should work fine.
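
A minimal sketch of that lookup, assuming the hints sit on the root element as `xsi:schemaLocation` or `xsi:noNamespaceSchemaLocation`. The class and method names are hypothetical. It uses StAX to read only as far as the first start tag and returns the location URLs found there.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SchemaHintPeek {
    // Returns the schema locations hinted at on the root element (possibly none).
    static List<String> schemaLocations(Reader xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            List<String> locations = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // skip the prolog (comments, processing instructions, ...)
                }
                String xsi = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
                // schemaLocation holds "namespace location" pairs separated by whitespace.
                String pairs = reader.getAttributeValue(xsi, "schemaLocation");
                if (pairs != null) {
                    String[] tokens = pairs.trim().split("\\s+");
                    for (int i = 1; i < tokens.length; i += 2) {
                        locations.add(tokens[i]);
                    }
                }
                String noNamespace = reader.getAttributeValue(xsi, "noNamespaceSchemaLocation");
                if (noNamespace != null) {
                    locations.add(noNamespace.trim());
                }
                break; // only the root element is inspected
            }
            return locations;
        } finally {
            reader.close();
        }
    }
}
```

Note that this consumes part of the Reader, so the caller would have to buffer the input if the same stream is to be parsed again afterwards.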

Along the same lines, there might be a way to intercept a schema-aware parser's own lookup -- something like the EntityResolver. I've always used a limited set of schemas, so don't have any pointers.

An ugly approach is to extract the [url=http://www.w3.org/TR/xmlschema-0/#UnqualLocals]targetNamespace[/url] from the schema, then look for it on your instance document. Aside from schemas that don't use namespaces (a bad practice, but prevalent), you'd have to parse it, find the namespaces, then reparse with validation turned on.

Edit: LSResourceResolver seems like it might be the cutpoint.
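
A rough sketch of how that might look. It leans on the Javadoc of the no-argument `SchemaFactory.newSchema()`, which says that for W3C XML Schema the returned Schema validates using the location hints found in each document. The class name `CachedSchemaValidation` and the map from hinted location to XSD text are hypothetical, and the cache here holds schema text rather than compiled Schema objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CachedSchemaValidation {
    private final Map<String, String> xsdTextByLocation; // hypothetical cache: hinted location -> XSD text
    private final DOMImplementationLS lsFactory;         // used only to create LSInput objects
    private final Schema hintDrivenSchema;

    CachedSchemaValidation(Map<String, String> xsdTextByLocation)
            throws ParserConfigurationException, SAXException {
        this.xsdTextByLocation = xsdTextByLocation;
        this.lsFactory = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation().getFeature("LS", "3.0");
        // No schema is chosen here; the location hints in each document decide.
        this.hintDrivenSchema =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
    }

    List<String> validate(Reader xml) throws SAXException, IOException {
        Validator validator = hintDrivenSchema.newValidator();
        validator.setResourceResolver(this::resolve); // acts as the LSResourceResolver
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        validator.validate(new StreamSource(xml));
        return errors;
    }

    // Called when the validator wants the schema named by a location hint.
    private LSInput resolve(String type, String namespaceURI, String publicId,
                            String systemId, String baseURI) {
        String xsdText = xsdTextByLocation.get(systemId);
        if (xsdText == null) {
            return null; // unknown location: leave it to the default lookup
        }
        LSInput input = lsFactory.createLSInput();
        input.setStringData(xsdText);
        input.setSystemId(systemId);
        input.setPublicId(publicId);
        input.setBaseURI(baseURI);
        return input;
    }
}
```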

kdgregorya at 2007-7-8 23:36:42 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

> A couple of solutions, if I understand the problem correctly.

Yes you did understand it correctly; thank you.

[ #1 ]

> First, look for [url=http://www.w3.org/TR/xmlschema-0/#schemaLocation]schemaLocation[/url] attribute(s) in your instance
> document, and use the location URL to cache the schema. As long
> as your instance documents try to be nice, this should work fine.

[ #2 ]

> Along the same lines, there might be a way to intercept a schema-aware
> parser's own lookup -- something like the EntityResolver. I've
> always used a limited set of schemas, so don't have any pointers.

[ #3 ]

> An ugly approach is to extract the [url=http://www.w3.org/TR/xmlschema-0/#UnqualLocals]targetNamespace[/url]
> from the schema, then look for it on your instance document. Aside
> from schemas that don't use namespaces (a bad practice, but
> prevalent), you'd have to parse it, find the namespaces, then
> reparse with validation turned on.

[ #4 ]

> Edit: LSResourceResolver seems like it might be the cutpoint.

[ #1 ] is what I'm doing now, but even that I consider a hack.

[ #2 ] I don't know how I should accomplish that because all I have is a Reader input (stream) and I can't 'reparse' the entire thing.

[ #3 ] yuckie! ;-)

[ #4 ] I don't understand that but that's just me.

In general I find 'decoupling' schemas from the parsing phase more of a nuisance than a benefit if you don't know *in advance* what schema to use, i.e. when it is only specified in the incoming XML text itself, *while* that text is already being parsed by a parser that is not aware of that schema.

Thanks a lot for your reply; I really appreciate it and it gives me food for thought about this thingie again.

Silly xml stuff ;-)

kind regards,

Jos

JosAHa at 2007-7-8 23:36:42
# 3

This might be an over-simplification and hack-ish, but if each XML request has its own URI (à la REST), then don't you simply have to dynamically swap in a mapped schema at run-time? Or am I missing something? Is the problem that you start parsing, validating the DTD, realize you have a schema and do not want to revalidate and reparse?

- Saish

Saisha at 2007-7-8 23:36:42
# 4

> [ #4 ]
> > Edit: LSResourceResolver seems like it might be the cutpoint.
>
> [ #4] I don't understand that but that's just me.

http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSResourceResolver.html

kdgregorya at 2007-7-8 23:36:42
# 5

> This might be an over-simplification and hack-ish, but if each XML
> request has its own URI (à la REST), then don't you simply have to
> dynamically swap in a mapped schema at run-time? Or am I missing
> something?

No, I'm afraid I'm missing something (simple?). As far as I could figure out from the docs, I *first* have to tell the SAXParserFactory to include a compiled Schema and only *then* can I create such a validating parser.

AFAICT, it's too late once I have instantiated a (non-schema) validating parser and let it parse some incoming XML stream: I don't know how to tell the parser on the fly that it *does* have to validate against some Schema. I think I got lost in the documentation again.

thanks for your reply and,

kind regards,

Jos

JosAHa at 2007-7-8 23:36:42
# 6

> > [ #4 ]
> > > Edit: LSResourceResolver seems like it might be the cutpoint.
>
> > [ #4] I don't understand that but that's just me.
>
> http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSResourceResolver.html

Thank you for that link; I just read it, but for now it has confused me more. I'm afraid I'm missing something simple because I got lost in the docs again. All I want/need is a way to set a Schema on the current parser while that parser is actually parsing the first part of a bit of XML.

hrmph, more reading to do.
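
For what it's worth, one avenue that seems to fit a Reader-only, single-pass setup: a plain namespace-aware SAX parser feeding a `ValidatorHandler` obtained from a hint-driven Schema (the no-argument `SchemaFactory.newSchema()`, which per its Javadoc uses the location hints in the document for W3C XML Schema). A minimal sketch under that assumption; the class name `SinglePassValidation` and its method names are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class SinglePassValidation {
    // One such Schema can be created up front and shared; it names no XSD itself.
    static Schema newHintDrivenSchema() throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
    }

    // Parses the Reader once: parser -> ValidatorHandler -> application handler.
    static void parse(Reader xml, Schema hintDrivenSchema,
                      ContentHandler application, ErrorHandler errors)
            throws ParserConfigurationException, SAXException, IOException {
        ValidatorHandler validatorHandler = hintDrivenSchema.newValidatorHandler();
        validatorHandler.setErrorHandler(errors);         // schema validation errors
        validatorHandler.setContentHandler(application);  // validated events flow on

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);                  // no Schema set on the factory
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(validatorHandler);
        reader.setErrorHandler(errors);                   // well-formedness errors
        reader.parse(new InputSource(xml));
    }
}
```

The application handler still sees ordinary SAX events, and `ValidatorHandler` also has a `setResourceResolver` method, so schema lookups could be pointed at a cache rather than at the hinted URL.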

kind regards,

Jos

JosAHa at 2007-7-8 23:36:42