Can I force a parser to use fixed DTD for validation?

How do I force a SAX parser to use a fixed DTD (and ignore any DOCTYPE decl. in the xml doc) ?

I have been searching Java forums, apache website, the web, and SAX APIs for a way to do this but without luck. The usage is that I want to validate with a DTD located on the server, and wish to set my parser to always use that DTD, rather than requiring the user to declare a DOCTYPE declaration in the xml doc. The only solution I've seen so far is to actually search for a DOCTYPE decl in the xml, and replace it if it's there, with our own. This seems like a "hack" to me...

I've played around with EntityResolver and peeked at DTDHandler but those don't seem to work or be the right approach.

Any help would be appreciated!

Thanks

Stephen Boniface

[795 byte] By [1884133] at [2007-9-26 2:16:16]
# 1

Stephen,

I saw your other posting, and here's some code that at least works. It shows you the referenced external entities. I ran this with Xerces 1.4 versions, 2.0.0_alpha has problems with it.

Here's the test program:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JAXPDOMTest1 implements EntityResolver {

    public void domParse(String url) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        try {
            DocumentBuilder parser = factory.newDocumentBuilder();
            // Route every external entity lookup through resolveEntity() below.
            parser.setEntityResolver(this);
            Document doc = parser.parse(url);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, java.io.IOException {
        InputSource is = null;
        System.out.println("--");
        System.out.println("publicId=" + publicId);
        System.out.println("systemId=" + systemId);
        System.out.println("--");
        // Comment out the next 2 lines if you want to see the DTD specified in the XML file.
        if (systemId != null && systemId.toLowerCase().endsWith("dtd"))
            is = new InputSource("replaced.dtd");
        // Returning null tells the parser to resolve the entity itself.
        return is;
    }

    public static void main(String[] args) {
        JAXPDOMTest1 x = new JAXPDOMTest1();
        x.domParse("test.xml");
    }
}

here's test.xml:

<?xml version="1.0"?>

<!DOCTYPE GREETING SYSTEM "test.dtd">

<GREETING>

&junk_txt;

</GREETING>

and here's test.dtd (you need a junk.txt file):

<!ELEMENT GREETING (#PCDATA)>

<!ENTITY junk_txt SYSTEM "junk.txt">

and here's replaced.dtd (you need a xyz.txt file):

<!ELEMENT GREETING (#PCDATA)>

<!ENTITY junk_txt SYSTEM "xyz.txt">

However, the problem is that resolveEntity() only gets the systemId and/or the publicId; nothing tells you what kind of entity is being resolved. That's why I included an external entity in the DTD examples. You cannot assume that the first external entity is the DTD: suppose you receive a file which contains an internal DTD and has an external reference (like junk_txt above). You would replace that entity instead of the DTD. The example above also relies on the file name ending with "dtd", which might not be the case either. And the DTD can be part of the file you receive (an internal subset), in which case resolveEntity() is never called for it and this solution is useless.

Protecting against "malicious" DTDs is tough.

So, you can go with the "hack" solution, go with the EntityResolver solution (if it's good enough for you), or just do it right and move over to XML Schema, which solves this - there you can override the Schema location specified in the XML file by your own.

For further comments you can take a look at the XML Bible - look for "Validating the document against the schema" some 200 lines down on this page:

http://www.ibiblio.org/xml/books/bible2/chapters/ch24.html

Hope that helps a little,

Good luck.

lk555 at 2007-6-29 9:14:29 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

Thanks for the tip...

I had seen something similar to this elsewhere, but unfortunately it still requires the user to reference a DTD in the xml. I am trying to spare the user from having to specify any DTD anywhere in the xml, and to force the parser to use my DTD.

My solution was to parse the xml, replace any existing DOCTYPE with my own, or insert my DOCTYPE if there was none specified.

Does this make sense?

Thanks

Stephen

1884133 at 2007-6-29 9:14:29
# 3

HEEEEEEEEEEELP !

I still have the same problem

(-concerning the first post on this topic-).

I still don't find a way to do this...

This is a very simple problem though!

I am sure this IS possible indeed...

Any Help would be extremely appreciated....

javalova at 2007-6-29 9:14:29
# 4
Can you use a W3C schema instead of a DTD? So many people are wasting so much time clinging to DTD when it is completely inferior to W3C schema. You can do what you are looking for with a schema in about 5 lines of code.
dubwai at 2007-6-29 9:14:29
# 5

All right... so what are these 5 lines ? :oP

I followed your advice and wrote a correct schema file (.xsd)

describing the syntax of my xml file.

is there some simple sample code for validating the xml file

with my xsd in java? [i'm still roaming the internet looking for docs...]

javalova at 2007-6-29 9:14:29
# 6

So I lied. 8 lines.

private static final String JAXP_SCHEMA_LANGUAGE =
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
private static final String W3C_XML_SCHEMA =
    "http://www.w3.org/2001/XMLSchema";
private static final String JAXP_SCHEMA_SOURCE =
    "http://java.sun.com/xml/jaxp/properties/schemaSource";

{
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    SAXParser parser = factory.newSAXParser();
    // Validate with W3C XML Schema, using our own schema as the source.
    parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    parser.setProperty(JAXP_SCHEMA_SOURCE, schema);
    parser.parse(inputStream, handler);
}

dubwai at 2007-6-29 9:14:29
# 7

> parser.setProperty(JAXP_SCHEMA_SOURCE, schema);

So counting's not my thing.

In the above line, schema is a File object. I haven't tried many other types of Objects, but in my tests InputStreams didn't work. I couldn't find any documentation on what types this property can accept.

Good luck.

dubwai at 2007-6-29 9:14:29
# 8
Thank You ;oP I will try this then and give some news...
javalova at 2007-6-29 9:14:29
# 9
How do I get the code to work with javax.xml.parsers.DocumentBuilder and javax.xml.parsers.DocumentBuilderFactory? It doesn't matter whether it's DTD or XSD, as long as there is a solution that avoids having the user specify the DTD/XSD in the xml document. Thanks.
coconut99_99 at 2007-6-29 9:14:29
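
For the DocumentBuilder question above, one possible route (a sketch, not taken from any post in this thread) is the javax.xml.validation API that ships with Java 5 and later: compile the schema once and attach it to the DocumentBuilderFactory with setSchema(), so the document never has to name a schema. The class name FixedSchemaDomValidator and the tiny inline XSD are made up for the demo; in practice the schema would be loaded from a file on the server.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class FixedSchemaDomValidator {

    // Hypothetical schema, inlined only to keep the demo self-contained.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='GREETING' type='xs:string'/>"
      + "</xs:schema>";

    /** Parses xml against the fixed schema and returns the validation errors. */
    static List<String> validate(String xml) throws Exception {
        Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation needs namespaces
        factory.setSchema(schema);       // the document itself names no schema
        // setValidating(true) is not called here: that flag turns on DTD validation.

        final List<String> errors = new ArrayList<String>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: give up
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<GREETING>hello</GREETING>"));
        System.out.println(validate("<GOODBYE>bye</GOODBYE>"));
    }
}
```

Validation errors are collected rather than thrown so the caller can decide what to do with them; the same Schema object can be reused across parses.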
# 10

How do I force a SAX parser to use a fixed DTD (and ignore any DOCTYPE decl. in the xml doc) ?

I have been searching Java forums, apache website, the web, and SAX APIs for a way to do this but without luck. The usage is that I want to validate with a DTD located on the server, and wish to set my parser to always use that DTD, rather than requiring the user to declare a DOCTYPE declaration in the xml doc. The only solution I've seen so far is to actually search for a DOCTYPE decl in the xml, and replace it if it's there, with our own. This seems like a "hack" to me...

I have the EXACT problem.

But I need to use XMLReader and not JDOM.

For setting the XSD to point it to a external file, this is the code.

XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

parser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "D:\\xsd\\today.xsd");

But I need the code for the DTD equivalent

Message was edited by:

srinivasang87

srinivasang87 at 2007-6-29 9:14:29
# 11

I believe what you want is an EntityResolver. It is called with the public and/or system IDs of an entity being referenced. It returns an InputSource that will be used by the XMLReader to actually read what it wants. If you pass back an InputSource that is created from a StringReader with a single blank character, you can ignore the DTD altogether.

If you know where your copy of "the right" DTD is, you can return an InputSource that refers to it and it should work as you wish.

Dave Patterson

d.patterson at 2007-6-29 9:14:29
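
A minimal sketch of the EntityResolver idea described above, assuming the document does carry a DOCTYPE with an external identifier. The class name FixedDtdResolverDemo and the one-line DTD are invented for the example. The resolver ignores whatever system ID the document names and always hands back the fixed DTD. Note the caveat raised earlier in the thread: this resolver answers every external entity with the DTD text, so it is only safe when the DTD is the sole external entity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FixedDtdResolverDemo {

    // Hypothetical DTD, inlined for the demo; normally read from the server.
    static final String FIXED_DTD = "<!ELEMENT GREETING (#PCDATA)>";

    /** Validates xml against FIXED_DTD, whatever DTD its DOCTYPE points at. */
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();

        reader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                // Ignore the requested IDs and always supply our own DTD.
                return new InputSource(new StringReader(FIXED_DTD));
            }
        });

        final List<String> errors = new ArrayList<String>();
        reader.setErrorHandler(new DefaultHandler() {
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            // fatalError() is inherited and rethrows, which aborts the parse.
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String doctype = "<!DOCTYPE GREETING SYSTEM 'http://example.invalid/any.dtd'>";
        System.out.println(validate(doctype + "<GREETING>hi</GREETING>"));
        System.out.println(validate(doctype + "<GREETING><X/></GREETING>"));
    }
}
```

Because the resolver supplies the content itself, the URL in the DOCTYPE is never fetched.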
# 12

Just wanna confirm if this is what you meant..

Here, the strings SystemId and PublicId are null initially.

I am setting it explicitly to point to a local DTD and calling the resolveEntity method of EntityResolver of my XMLReader.

Is this right?

String SystemId = this.iab.getImportSourceBean().getInputSource().getSystemId();
String PublicId = this.iab.getImportSourceBean().getInputSource().getPublicId();
SystemId = "D:\\dtd\\today.dtd";   // backslashes must be escaped in a Java string literal
try {
    InputSource is = this.iab.getParser().getEntityResolver().resolveEntity(PublicId, SystemId);
    isb.setInputSource(is);
    iab.getParser().parse(isb.getInputSource());

srinivasang87 at 2007-6-29 9:14:29
# 13
I have the same problem: I want to use my custom local DTD, whether or not a DTD is specified in a DOCTYPE in the parsed file. I cannot use EntityResolver because resolveEntity() is not called if no DOCTYPE is specified. Thanks, Cristian
cruja at 2007-6-29 9:14:29
# 14
Your post helped me a lot with a project I am working on. Just thought I'd say thanks!
JonB_Calgary at 2007-6-29 9:14:29
# 15

I am having exactly the same problem and would appreciate any help. To recap here is what I want to do:

I want to validate XML files against a DTD I have created. The XML file will not have any reference to the DTD, or I want to ignore any DTD reference on the XML file

I found a way to do this with XML Schema, but I want to use DTD to do validation. If someone has found solution, can you please post a code snippet - I would prefer to use SAX, but DOM might be ok too

codecraker at 2007-7-1 1:44:19
# 16
Parse the XML in non-validating mode (thus not using any attached DTD to validate). Feed the result into an identity transformation, modified to attach your DTD to its output XML. Now you have an XML document that specifies the correct DTD. Parse it in validating mode.
DrClap at 2007-7-1 1:44:19
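
The three steps in reply # 16 could be sketched roughly as follows. This is one reading of that recipe, not DrClap's own code: the class name ForcedDtdValidator, the system ID "fixed.dtd" and the inline DTD are placeholders. It uses OutputKeys.DOCTYPE_SYSTEM on a plain identity Transformer to attach the DOCTYPE, and an EntityResolver so that the first pass never fetches the document's own DTD and the second pass reads the fixed one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ForcedDtdValidator {

    // Hypothetical DTD and system ID, used only for this demo.
    static final String FIXED_DTD = "<!ELEMENT GREETING (#PCDATA)>";
    static final String FIXED_SYSTEM_ID = "fixed.dtd";

    /** Builder whose external entities all resolve to the given text. */
    private static DocumentBuilder newBuilder(boolean validating, final String dtdText)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(dtdText));
            }
        });
        return builder;
    }

    static List<String> validate(String xml) throws Exception {
        // Step 1: non-validating parse; any DTD the document names is
        // replaced by a single blank, so it is effectively ignored.
        Document doc = newBuilder(false, " ")
                .parse(new InputSource(new StringReader(xml)));

        // Step 2: identity transformation that attaches our DOCTYPE.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, FIXED_SYSTEM_ID);
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));

        // Step 3: validating parse of the rewritten document.
        final List<String> errors = new ArrayList<String>();
        DocumentBuilder validating = newBuilder(true, FIXED_DTD);
        validating.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        validating.parse(new InputSource(new StringReader(out.toString())));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<GREETING>hi</GREETING>"));
        System.out.println(validate("<GREETING><X/></GREETING>"));
    }
}
```

This works whether or not the incoming document has a DOCTYPE, at the cost of parsing it twice. The error line numbers reported in step 3 refer to the rewritten text, not to the original file.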