DTD vs XSD, now I've got a chicken/egg problem
Greetings,
A couple of years ago I implemented a scenario like this:
1) an xml document comes in and specifies a DTD;
2) if the DTD text was already known (i.e. in a cache), the
resolveEntity() method of my DefaultHandler subclass would
supply an InputSource for it to the SAXParser;
3) if the DTD wasn't known yet, its text would be loaded and added
to the cache; logically step 2) was then tried again;
4) the rest of the xml content was validated against that particular DTD.
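A minimal sketch of the DTD scenario above, not Jos's actual code: the class name, the loadDtdText() helper and the example.com system id are all hypothetical, and the loader returns a fixed DTD where a real one would fetch the text.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CachingDtdHandler extends DefaultHandler {
    // system id -> DTD text, kept across parses (steps 2 and 3)
    private final Map<String, String> dtdCache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    // Hypothetical loader: stands in for however the DTD text is really obtained.
    protected String loadDtdText(String systemId) {
        loads.incrementAndGet();
        return "<!ELEMENT note (#PCDATA)>";
    }

    public int loadCount() {
        return loads.get();
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Load on a cache miss, then serve the cached text to the parser.
        String text = dtdCache.computeIfAbsent(systemId, this::loadDtdText);
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId);
        return source;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e; // treat validation errors as fatal (step 4)
    }

    public void parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        SAXParser parser = factory.newSAXParser();
        // The DefaultHandler is also installed as the parser's EntityResolver.
        parser.parse(new InputSource(new StringReader(xml)), this);
    }
}
```

Parsing two documents that name the same DTD should call loadDtdText() only once; the second parse is served from the cache.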
Now my customer wants to use XSDs, which makes sense in some sort
of way. I can precompile XSDs easily and fill a cache with those for
future use; no problem with that. What I am unable to do, given an
XSD directive (excusez le non xml-mot) encountered while parsing some
xml content, is set the Schema for the SAXParser that is currently
parsing that xml.
I realize that I can configure a SAXParserFactory to feed a Schema to
a SAXParser to be created, but I don't know *which* of the Schemas
to feed to it, because that particular Schema is only specified in that
particular xml content. For now I consider it a chicken/egg problem
but I sure hope that someone more knowledgeable than I am can
supply me with a bit of information and help.
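To make the ordering problem concrete, here is a small sketch of the precompile-and-cache route using the standard javax.xml.validation API. The class name, the cache key and the method names are assumptions for illustration. Note that the Schema has to be chosen before newSAXParser() is called, which is exactly the chicken/egg situation described above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrecompiledSchemaParse {
    // Hypothetical cache: key -> compiled Schema (Schema objects are immutable).
    private static final Map<String, Schema> CACHE = new ConcurrentHashMap<>();

    public static Schema compile(String key, String xsdText) throws SAXException {
        Schema cached = CACHE.get(key);
        if (cached != null) {
            return cached;
        }
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsdText)));
        CACHE.put(key, schema);
        return schema;
    }

    public static void parse(String xml, Schema schema) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // must happen before the parser exists
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // surface validation errors
            }
        });
    }
}
```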
Thank you in advance for any help and,
kind regards,
Jos
[1416 byte] By [JosAHa] at [2007-11-26 17:08:50]

# 1
A couple of solutions, if I understand the problem correctly.
First, look for the [url=http://www.w3.org/TR/xmlschema-0/#schemaLocation]schemaLocation[/url] attribute(s) in your instance document, and use the location URL as the key for caching the schema. As long as your instance documents try to be nice, this should work fine.
Along the same lines, there might be a way to intercept a schema-aware parser's own lookup -- something like the EntityResolver. I've always used a limited set of schemas, so don't have any pointers.
An ugly approach is to extract the [url=http://www.w3.org/TR/xmlschema-0/#UnqualLocals]targetNamespace[/url] from the schema, then look for it on your instance document. This doesn't work for schemas that don't use namespaces (a bad practice, but prevalent), and you'd have to parse the document, find the namespaces, then reparse with validation turned on.
Edit: LSResourceResolver seems like it might be the cutpoint.
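A hedged sketch of how LSResourceResolver might serve as that cutpoint. SchemaFactory.newSchema() with no arguments is documented to produce a Schema that validates using the location hints found in the document, and a Validator accepts an LSResourceResolver. Everything else here (the class name, the XSD_TEXT map keyed by location hint) is an assumption. It caches schema text rather than compiled Schema objects.

```java
import java.io.Reader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public class HintDrivenValidation {
    // Hypothetical cache: schema location hint -> XSD text.
    public static final Map<String, String> XSD_TEXT = new ConcurrentHashMap<>();

    static LSResourceResolver cachedSchemaResolver() throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        return (type, namespaceURI, publicId, systemId, baseURI) -> {
            String text = systemId == null ? null : XSD_TEXT.get(systemId);
            if (text == null) {
                return null; // unknown hint: fall back to the default lookup
            }
            LSInput input = ls.createLSInput();
            input.setStringData(text);
            input.setSystemId(systemId);
            return input;
        };
    }

    public static void validate(Reader xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(); // schemas come from hints in the document
        Validator validator = schema.newValidator();
        validator.setResourceResolver(cachedSchemaResolver());
        // With no ErrorHandler set, validation errors are thrown as exceptions.
        validator.validate(new StreamSource(xml));
    }
}
```

The document is read once; the resolver is consulted only when the validator meets a schemaLocation or noNamespaceSchemaLocation hint.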
# 2
> A couple of solutions, if I understand the problem correctly.
Yes you did understand it correctly; thank you.
[ #1 ]
> First, look for the [url=http://www.w3.org/TR/xmlschema-0/#schemaLocation]schemaLocation[/url] attribute(s) in your instance
> document, and use the location URL to cache the schema. As long
> as your instance documents try to be nice, this should work fine.
[ #2 ]
> Along the same lines, there might be a way to intercept a schema-
> aware parser's own lookup -- something like the EntityResolver. I've
> always used a limited set of schemas, so don't have any pointers.
[ #3 ]
> An ugly approach is to extract the [url=http://www.w3.org/TR/xmlschema-0/#UnqualLocals]targetNamespace[/url]
> from the schema, then look for it on your instance document. Aside
> from schemas that don't use namespaces (a bad practice, but
> prevalent), you'd have to parse it, find the namespaces, then
> reparse with validation turned on.
[ #4 ]
> Edit: LSResourceResolver seems like it might be the cutpoint.
[ #1 ] is what I'm doing now, but even that I consider a hack.
[ #2 ] I don't know how I should accomplish that because all I have is
a Reader input (stream) and I can't 'reparse' the entire thing.
[ #3 ] yuckie! ;-)
[ #4 ] I don't understand that, but that's just me.
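On the "can't reparse a Reader" point, one possible workaround, assuming the documents are small enough to hold in memory, is to buffer the Reader into a String, peek at the root element with StAX to find the schema hint, and then run the real parse over the buffered text. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PeekSchemaLocation {
    private static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

    // Drain the Reader so the text can be read more than once.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns the first schema location hint on the root element, or null.
    public static String schemaHint(String xml) throws XMLStreamException {
        XMLStreamReader sr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (sr.hasNext()) {
                if (sr.next() == XMLStreamConstants.START_ELEMENT) {
                    String hint = sr.getAttributeValue(XSI, "noNamespaceSchemaLocation");
                    if (hint != null) {
                        return hint;
                    }
                    // schemaLocation holds "namespace location" pairs.
                    String pairs = sr.getAttributeValue(XSI, "schemaLocation");
                    if (pairs == null) {
                        return null;
                    }
                    String[] tokens = pairs.trim().split("\\s+");
                    return tokens.length >= 2 ? tokens[1] : null;
                }
            }
            return null;
        } finally {
            sr.close();
        }
    }
}
```

The returned hint could then be used as the key into a cache of precompiled Schema objects before the validating parser is created. The peek stops at the root element, so only the buffering touches the whole document.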
In general I find that 'decoupling' schemas from the parsing phase is more
of a nuisance than a benefit if you don't know *in advance* what schema
to use, i.e. it is only specified in the incoming xml text itself, *while* the
thing is already being parsed by a parser that is not aware of that schema.
Thanks a lot for your reply; I really appreciate it, and it gives me food for
thought about this thingie again.
Silly xml stuff ;-)
kind regards,
Jos
# 3
This might be an over-simplification and hack-ish, but if each XML request has its own URI (à la REST), don't you simply have to dynamically swap in a mapped schema at run-time? Or am I missing something? Is the problem that you start parsing and validating against the DTD, then realize you have a schema, and do not want to reparse and revalidate?
- Saish
# 4
> [ #4 ]
> > Edit: LSResourceResolver seems like it might be the cutpoint.
> [ #4] I don't understand that but that's just me.
http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSResourceResolver.html
# 5
> This might be an over-simplification and hack-ish, but if each XML
> request has its own URI (ala Rest), then don't you simply have to
> dynamically swap in a mapped schema at run-time? Or am I missing
> something?
No, I'm afraid I'm missing something (simple?). As far as I could figure
out from the docs I *first* have to tell the SAXParserFactory to include a
compiled Schema and *then* I can create such a validating parser.
AFAICT, it's too late if I just instantiate a (non-schema) validating
parser and let it parse some incoming xml stream; I don't know how
to tell the parser on the fly that it *does* have to validate against some
Schema. I think I got lost in the documentation again.
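One route that may avoid choosing a Schema up front, offered as a sketch rather than a verified answer: JAXP defines a schemaLanguage parser property which, on a validating, namespace-aware SAXParser, asks the parser to validate against the XSDs named in the document itself. With the JDK's bundled parser the schema lookup then goes through resolveEntity(), the same hook used for DTDs. The class name, the register() method and the in-memory cache are assumptions.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaHintSaxParse extends DefaultHandler {
    private static final String SCHEMA_LANGUAGE =
            "http://java.sun.com/xml/jaxp/properties/schemaLanguage";

    // Hypothetical cache: schema location -> XSD text.
    private final Map<String, String> xsdCache = new ConcurrentHashMap<>();
    private final AtomicInteger hits = new AtomicInteger();

    public void register(String location, String xsdText) {
        xsdCache.put(location, xsdText);
    }

    public int cacheHits() {
        return hits.get();
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = systemId == null ? null : xsdCache.get(systemId);
        if (text == null) {
            return null; // not cached: let the parser do its default lookup
        }
        hits.incrementAndGet();
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId);
        return source;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e; // surface validation errors
    }

    public void parse(Reader xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        // Validate against W3C XML Schema, located via hints in the document.
        parser.setProperty(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
        parser.parse(new InputSource(xml), this);
    }
}
```

Here the parser is created without any Schema; the XSD is pulled in only when the parser reaches the hint on the root element of the incoming stream.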
thanks for your reply and,
kind regards,
Jos
# 6
> > [ #4 ]
> > > Edit: LSResourceResolver seems like it might be the cutpoint.
>
> > [ #4] I don't understand that but that's just me.
>
> http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSResourceResolver.html
Thank you for that link; I just read it but it confused me more for now.
I'm afraid I'm missing something simple because I got lost in the docs
again. All I want/need is a way to set a Schema to the current parser
while that parser is actually parsing the first part of a bit of xml.
hrmph, more reading to do.
kind regards,
Jos