Using StAX with XSLT transformations the right way?
Hi!
What do I need in order to enable StAX support for transformations, and which transformer implementations support it? (Or is the implementation irrelevant?)
I have done the following to create a StAXSource, but is it enough?
8<
private static XMLInputFactory inputFactory = XMLInputFactory.newInstance();
...
// Open the XML document and wrap it in a StAX pull parser
InputStream xmlInputStream = xmlUrl.openStream();
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(xmlInputStream);
// Hand the pull parser to the transformer as its input Source
Source xmlSource = new StAXSource(xmlStreamReader);
...
transformer.transform(xmlSource, new StreamResult(writer));
8<
I'm using:
org.apache.xalan.processor.TransformerFactoryImpl
and everything seems to work very nicely, but I'm not sure whether I have done it the right way or whether I'm missing something.
If I understand correctly, a normal transformation parses the XML into a DOM tree, whereas with StAX it shouldn't, and so it should be more memory efficient.
Does anyone have any comments?
/Per
# 1
Hmm... After some more investigation it seems that the transformer I used didn't support StAXSource, so I forced it to use:
8<
System.setProperty("javax.xml.transform.TransformerFactory", "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");
8<
And now the StAX transformations seem to work better, but my investigation continues.
/Per
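For anyone wondering how to tell which implementation is actually picked up and whether it claims to accept a StAXSource: TrAX defines a feature string, StAXSource.FEATURE, that can be passed to TransformerFactory.getFeature(). The following is only a minimal sketch; the class name StaxSupportCheck and the method describe are made up for illustration, and a factory that reports the feature is still free to build a full tree internally.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;

public class StaxSupportCheck {

    /**
     * Hypothetical helper: returns "<factory class name> supports StAXSource: true|false".
     * getFeature(StAXSource.FEATURE) only says whether the factory accepts a
     * StAXSource as input; it says nothing about streaming.
     */
    public static String describe(TransformerFactory factory) {
        boolean supported = factory.getFeature(StAXSource.FEATURE);
        return factory.getClass().getName() + " supports StAXSource: " + supported;
    }

    public static void main(String[] args) {
        // Uses whatever factory the normal lookup (system property, classpath, JDK default) selects
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println(describe(factory));
    }
}
```

Running this before and after setting the system property shows which factory class is really in use. Since Java 6 there is also TransformerFactory.newInstance(String, ClassLoader), which selects an implementation without touching a global system property.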
# 2
> If I understand correctly, a normal transformation
> parses the xml into a DOM tree, whereas with StAX it
> shouldn't, and so it should be more memory efficient.
I don't think that is correct. To support XSLT properly, a transformer is almost always going to have to build a DOM tree. And feeding it the data via StAX versus an ordinary input stream isn't going to affect that requirement. Likewise if the transformer implements an identity transformation differently and decides not to build a DOM, that decision is still unaffected by how you feed it the data.
# 3
Indeed, as DrClap has already stated, using a StAXSource will not guarantee streaming. All of the mainstream XSLT processors build some sort of DOM structure internally because, in the general case, XSLT requires random access on the input document. The only exception to this is the identity transform, which in most processors is done in streaming fashion --i.e., without actually creating an intermediate structure.
# 4
Ok.
I haven't found any documentation about the different implementations or about the effect of using StAXSource. However, when I measured memory consumption with YourKit for the ordinary Apache implementation and the JDK 1.6 implementation above, my impression was that the implementations are totally different, which makes me suspect that there are some performance gains. I can see that both memory consumption and speed are better, but it's a "microbenchmark" with one small transformation.
But what exactly happens under the hood is more than I know or understand.
In a project I'm working on now, I have changed the transformations to use StAXSource and the JDK 1.6 transformation implementation, and all I can say is that it's not as forgiving as Apache's.
Do you have any suggestions or pointers to where I could get more information?
/Perty
# 5
By identity transformation, do you mean using <xsl:copy /> or something else? And why would that stream any better?
It is beginning to feel like I'm not getting any further on this.
/Perty
# 6
When you create a Transformer object without specifying a stylesheet Source (that is, by calling TransformerFactory.newTransformer() with no arguments), it defaults to doing an identity transformation. In other words, it copies its input to its output without altering it in any way.
When you say "I'm not getting any further on this" I am confused. Your choice of parser has no effect on what the Transformer does, so I don't know where you are trying to get to.
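To make the identity transformation concrete, here is a minimal sketch that feeds a StAXSource into a Transformer created without a stylesheet. The class name IdentityTransformDemo and the method copy are made up for illustration; whether the copy is actually streamed depends on the transformer implementation, not on this code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class IdentityTransformDemo {

    /** Hypothetical helper: copies an XML document unchanged via an identity transformer. */
    public static String copy(String xml) throws Exception {
        // A freshly created reader is positioned at START_DOCUMENT, as StAXSource requires
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            // No stylesheet Source is given, so this is an identity transformer
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            identity.transform(new StAXSource(reader), new StreamResult(out));
            return out.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<a><b>x</b></a>"));
    }
}
```

As soon as a real stylesheet is passed to newTransformer(Source), the same StAXSource is still accepted, but the processor will normally build its internal tree from the StAX events before applying the templates.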
# 7
Sorry if I have been unclear. When Googling for "StAX transformation" you end up with, for example, this article: http://javaboutique.internet.com/tutorials/staxxsl/ which gives the impression that just by using a StAX source you get a transformation based on StAX, which I have (maybe falsely) understood to use pull parsing instead of building a DOM tree of all the XML to be transformed.
My hypothesis was that using the new StAX functionality in my transformations would give me speed and memory gains. But I haven't found any good description of the inner workings of the StAX-based transformations in Java 6, and in the API doc http://java.sun.com./javase/6/docs/api/javax/xml/transform/stax/package-summary.html there is only a TODO: ...
And as I understand it, an identity transform does not help me; I have several XSLT stylesheets that transform XML in real time.
My goal was not to find the best transformation implementation out there, just to convert to StAX transformations, since some articles describe them as a performance and memory gain.
But as I see it now, I don't really get that performance gain...
Or am I wrong?
/Perty
# 8
I am not particularly impressed by that article. Here's a quote from the end of it:
"Obviously, the new java.xml.transform.stax package provides an improvement in of flexibility and new alternatives for the TrAX API. Working with the classes in this package is simple because they can be used just like other XXXSource/XXXResult classes. This facility is smooth and intuitive -- you don't need an "external" XSLT processor, you don't need to write additional methods, you don't need to "fit" a helper class, etc. Because J2SE 1.5 doesn't provide this facility, you must find a solution that is worlds apart from the pattern designed by the StreamSource/StreamResult, SAXSource/SAXResult, DOMSource/DOMResult classes. The TrAX API is just the ticket."
Smells like rubbish to me. J2SE 1.5 doesn't provide WHAT facility? And all that business about not needing an external XSLT processor is equally true for SAXSource and DOMSource. You don't need to write extra code to use a StAXSource or a SAXSource or a DOMSource. Those things are part of TrAX anyway, not "worlds apart from" it. The paragraph is just incoherent. And it rather looks like the author confused TrAX and StAX at the end.
So, sure, you can use a StAXSource to get data into your transformation. But I didn't see anywhere in that article where it even suggested any benefits from that. And saying that the transformation would be "based on" StAX is rather misleading; sure, the transformation uses a StAX parser to read the XML document, but basically transformations need to build an internal DOM because they just aren't amenable to processing a document in a single pass. So it uses StAX to read the document and it builds a DOM from the StAX events.
# 9
Reading that article once more gives me the same impression. So, if I want to speed up or optimize my transformations, I have to look at other transformation implementations.
Thanks for your clarifications.
/Perty