URLConnection Automatically applies XSL?

Hi,

I need to make a URLConnection to a remote web page and read its contents. The contents of the web page I will be reading are well-formed XML, but the page has an XSL stylesheet instruction at the top, like so:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="/layout/xxx-sheet.xsl"?>

When I read the contents of the URLConnection to that page (HttpURLConnection), what I'm actually getting is the post-transformation XHTML, not the plain XML.

Is there a way to circumvent this behavior so that I get the contents of the web page pre-transformation? Is there some attribute of the URL or URLConnection I can set, or do I have to write my own subclass of URLConnection, ContentHandler, ContentHandlerFactory or something like that?

The only other way I can think of to solve this problem is to manually make a Socket connection on port 80 to the IP of the web site, then manually read/write the HTTP request/response. I really don't want to get that far into the weeds, and I don't think I should have to, in order to solve this problem.

Any tips/suggestions are appreciated.

[1231 byte] By [astricklanda] at [2007-11-27 10:43:28]
# 1

That suggests to me that the server detects the fact that your code isn't a browser, so for your convenience does the transformation itself before sending you the data.

DrClapa at 2007-7-28 20:00:34 > top of Java-index,Java Essentials,Java Programming...
# 2

Wow, I can't believe I didn't think of that.

I guess it is entirely possible that they are checking my client type to see whether they can just insert the xml-stylesheet processing instruction in the XML, or whether they have to do the transformation ahead of time for non-standard browser clients.

Maybe they were thinking of mobile browsers, which may or may not support "in browser" XSL transformation; I don't know that much about mobile browsers.

It would make sense to take that approach...offload as much transformation workload to all browser clients that are capable of doing it in order to save themselves some processing.

Thanks for the insight. Any ideas on how I might get around that? Working with the company that is making the page is probably not an option...they're a big busy company and I don't work for them and am not a partner or consultant.

astricklanda at 2007-7-28 20:00:34
# 3

You could set the User-Agent header in your HTTP request to something that a browser would use. This is what my browser uses:

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
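For reference, here is a minimal sketch of how that header might be set on an HttpURLConnection. The class name RawXmlFetcher and its helper methods are made up for illustration, and it assumes the server really does decide based on User-Agent alone, which may not hold for every site. It also assumes the response is UTF-8, as the XML prolog in the question declares.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class RawXmlFetcher {
    // The browser User-Agent string quoted above; other browser-like values may work too.
    static final String BROWSER_USER_AGENT =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4";

    // Creates the connection object and sets the header. Nothing is sent yet;
    // the request goes out when the response is first read.
    static HttpURLConnection openAsBrowser(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_USER_AGENT);
        return conn;
    }

    // Reads the whole response body as text (assumes UTF-8).
    static String readBody(HttpURLConnection conn) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RawXmlFetcher <http-url>");
            return;
        }
        HttpURLConnection conn = openAsBrowser(URI.create(args[0]).toURL());
        System.out.println(readBody(conn));
    }
}
```

The header has to be set before the connection is actually made (that is, before calling connect() or getInputStream()); setting it afterwards throws an IllegalStateException.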

DrClapa at 2007-7-28 20:00:34
# 4

That did the trick, thanks so much! Your insight was great!

astricklanda at 2007-7-28 20:00:34