I believe that java.net.HttpURLConnection will do this for you already.
And there are probably other libraries that do this. I'd check the jakarta HttpClient library.
Apart from that...I believe the HTTP spec for this is pretty simple. A blank line separates headers from the body. Headers are key/value pairs, where multiple pairs are sometimes permitted, and the values can be lists, etc.
Right, I already know all of this. First, HttpURLConnection doesn't have all the features that I want. Second, I have already tried creating a parsing routine, and for some reason I just can't come up with one that works every time. Right now I have...
sb is a StringBuilder
in is a DataInputReader
while ((rdata = in.readLine()) != null && !doneReadingHeader) {
    sb.append(rdata);
    sb.append("\n");
    //If there is a newline?
    if(rdata.length() < 1) {
        doneReadingHeader = true;
    }
}
What features does HttpURLConnection lack? It might be easier to wrap one of those than to rewrite it from scratch.
All your code does is look for a blank line, and append to a StringBuilder. It's not clear to me why you're doing the latter; you're trying to parse the headers, aren't you? So why undo the parsing you've done so far? Keep in mind that you're not only undoing the tokenization of lines; you're also altering the data, because readLine strips the original line terminators and you put back a bare "\n" in their place. Note too that your loop condition calls readLine again before it checks doneReadingHeader, so one line after the blank line gets read and thrown away.
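To make the "keep the lines you already tokenized" point concrete, here is a minimal sketch, assuming the input is wrapped in a BufferedReader (the class and method names are made up for illustration). It stops at the first empty line without consuming anything after it and keeps each header line as its own list entry.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class HeaderLineReader {
    // Returns the lines before the first empty line. The empty line itself
    // is consumed, but nothing after it is read by this method.
    static List<String> readHeaderLines(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                break; // blank line: end of the header section
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical response text, used only to exercise the method.
        String raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
        BufferedReader in = new BufferedReader(new StringReader(raw));
        System.out.println(readHeaderLines(in)); // [HTTP/1.1 200 OK, Content-Type: text/html]
        System.out.println(in.readLine());       // <html></html>
    }
}
```

One caveat: a BufferedReader on a real socket may buffer bytes past the blank line, so this is only reasonable when the body is text read through the same reader.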
I think a link to the previous thread would be prudent http://forum.java.sun.com/thread.jspa?threadID=5172539
I also do not know why you don't want to use HttpURLConnection, you never have given a reason for that, that I can see. Can you expand on that? Because this seems like a great deal of energy to expend re-inventing the wheel....
Anyway, some other ideas...
Trim the line before you look at the length. Possibly also look for other header-related info. For example, after the initial HTTP status line (with the response code), I believe that every HTTP header line must have at least one colon. (I am fairly certain of this.) So you could look to see if the line has a colon.
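The trim-and-look-for-a-colon idea could be sketched roughly like this. The class name is hypothetical, and the sketch assumes the status line contains no colon and that there are no folded (continuation) header lines; lines without a colon are simply skipped. Names are lower-cased because header field names are case-insensitive, and repeated names are collected into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class HeaderFields {
    // Splits each "Name: value" line at the first colon.
    // Lines with no colon (such as the status line) are ignored.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no field name before a colon: not a header field
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical header lines, used only to exercise the method.
        List<String> lines = Arrays.asList(
                "HTTP/1.1 200 OK",
                "Content-Encoding: gzip",
                "Set-Cookie: a=1",
                "Set-Cookie: b=2");
        System.out.println(parse(lines));
    }
}
```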
Also...note that it's possible that the server is gzipping its output because it knows Firefox can handle it. If you use a different user agent string, maybe the server won't even send you gzipped output. (This is re: the other thread.)
(Thanks for posting that btw cotton.) Also it's possible that the server is looking at the Accept HTTP request header (or, more likely where gzip is concerned, Accept-Encoding).
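For experimenting with what the server sends back, a request built by hand might look something like the sketch below. The user agent string, host, and path are placeholders, not anything from this thread. Accept-Encoding is the request header that covers content codings such as gzip; "identity" asks for no encoding, though a server is free to ignore it.

```java
class PlainRequest {
    // Builds a GET request that asks for an unencoded (non-gzipped) response.
    // "ExampleClient/0.1" is a made-up user agent used only for illustration.
    static String build(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: ExampleClient/0.1\r\n"
                + "Accept: text/html\r\n"
                + "Accept-Encoding: identity\r\n"
                + "\r\n"; // blank line ends the request headers
    }

    public static void main(String[] args) {
        System.out.print(build("www.example.com", "/index.html"));
    }
}
```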
The main reason I don't want to use a HttpURLConnection is because:
1. It is slower and bloated for what I want to do, I just want a very simple way to do what I want to do and only what I want to do.
2. To my understanding it doesn't support persistent connections.
3. I like making something lower level because I can control more.
> The main reason I don't want to use a
> HttpURLConnection is because:
> 1. It is slower and bloated for what I want to do, I
> just want a very simple way to do what I want to do
> and only what I want to do.
Yes... simple.
As far as the performance issue goes, do you have some data to support this assumption?
I use HttpURLConnection all the time for application purposes and I certainly don't find it to be an application bottleneck.
At any rate I would give this up and go make the rest of your program work first. Then later if you decide that this is a major problem for you go back and address it.
I suspect the real issue is that you are simply trying to get massive amounts of data and that's where your bottleneck is. I could be wrong.
Ok, yes, I already have a class using HttpURLConnection that handles gzipped pages. But I want to make my own wrapper that does the same thing, just faster. And yes, it might not be all that much faster, but every millisecond counts...
Data: I just ran a test with Socket2 (the wrapper using a socket) and Socket (the wrapper using an HttpURLConnection): 10 requests to www.yahoo.com, then the average connection time for each wrapper...
Socket2: 0.23252920000000002
Socket: 0.30110169999999997
And I think that speed increase is because it doesn't have to recreate the connection every time.
> Ok yes I already have a class using the
> HttpUrlConnection that handles GZIPed pages. But I
> want to make my own wrapper that does the same thing
> just faster, and yes it might not be all that faster
> but every millisecond counts...
>
If this is the case then I would suggest to you that HTTP is not the way to go. This gets back to the point in my previous post about where your bottleneck is.
If this is so critical, I think you should be revisiting your overall concept here.
> And I think that speed increase is because it doesn't
> have to recreate the connection every time.
I seem to recall reading somewhere that HttpURLConnection reuses a single open socket for multiple HTTP requests to the same server (HTTP keep-alive). I'm not convinced the data shows what you think it does or means what you think it means.
> Well basically its a sniping program
Well, my ethical problem with this aside, I don't think this is your performance bottleneck, and it's really not worth worrying about.
Better to have well-tested, robust code provided by Sun than what you are writing.
Your bottlenecks will come in creating the content to send back. Moreover, there will be delays and hiccups due to vagaries in internet traffic and whatever is happening at the site you are "sniping" anyway, so at the end of the day I really just don't think this is worth worrying about at all.
And if you really believe otherwise, I would be inclined to tell you to give up on Java and go write this in C++ or something where you can get lower in the TCP scheme of things and extract better performance. Although, again, I believe the advantages would be negligible and largely irrelevant.
Wow.
Not to be mean, but if you're trying to tune your online auction bidding down to the millisecond, then maybe it's time that you ask yourself whether you really need that rare collectible Pez dispenser.
Anyway...back to the issue at hand...
First, I'd advise fiddling with the parameters in the request to see if you can prevent the site from sending you gzipped data to begin with. If so that would simplify the problem significantly. Although, you (the OP) are probably worried that you'll lose precious milliseconds that way. But keep in mind that you'll gain a bit of the time, since you and the server won't have to zip/unzip.
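For reference, if the body does arrive gzipped anyway, the unzip step itself is short with java.util.zip. This is only a sketch with made-up class and method names; it assumes you already have the complete compressed body as a byte array. The gzip method is there just so the example can round-trip without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBody {
    // Decompresses a complete gzip-encoded body.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Compresses bytes; stands in for the server side in this example.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(gunzip(body), StandardCharsets.UTF_8)); // <html>hello</html>
    }
}
```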
If that's not an option, you'll need to read newlines in the input stream to look for headers and the end of headers. Keep in mind that the input will be text for the headers, but binary for the body, which makes things a pain.
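One way around the text-versus-binary pain is to read the headers as bytes too, and only turn them into text once the blank line has been seen. The following is a sketch under that assumption (names are hypothetical). It reads byte by byte up to and including the first CR LF CR LF, so the stream is left positioned at the first byte of the body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeadReader {
    // Reads up to and including the first "\r\n\r\n" and returns it as text.
    // Bytes after that point are left unread in the stream.
    static String readHead(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // number of bytes of "\r\n\r\n" matched so far
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            head.write(b);
            int expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray CR may start a new match
            }
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical message: two header lines followed by two binary body bytes.
        byte[] msg = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\n\u0001\u0002"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(msg);
        System.out.print(readHead(in));
        System.out.println("first body byte: " + in.read()); // first body byte: 1
    }
}
```

On a real socket you would probably wrap the stream in a BufferedInputStream first and keep using that same wrapped stream for the body, since single-byte reads on a raw socket stream are slow.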
Etc.
Actually I still suspect that the real improvements to the program would be in making it smarter, not by shaving off milliseconds in the HTTP connections.
I've done this stuff before - i.e., taking only the body part of the HTTP message.
HTTP structure is much simpler than I had thought. The indicator that tells you that you're about to begin reading the body of the HTTP message is simply a CRLF on a line by itself [a blank line].
That's clearly specified in the HTTP 1.0 Protocol Specification. I've written an application that listens on the network and captures data packets. All of them were HTTP, and I was only interested in the body of each message for further processing.
If you have an input stream, you should be reading in bytes, and I suppose you already know the byte values of CR+LF to test against the HTTP message.
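As a rough illustration of testing for the CR+LF bytes, assuming the whole message has already been captured into one byte array (the names here are invented, and a real capture may be split across several packets): CR is byte 13 and LF is byte 10, so the body starts right after the first 13, 10, 13, 10 sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BodySlicer {
    private static final byte CR = 13;
    private static final byte LF = 10;

    // Returns the index of the first body byte, or -1 if no blank line is found.
    static int bodyOffset(byte[] msg) {
        for (int i = 0; i + 3 < msg.length; i++) {
            if (msg[i] == CR && msg[i + 1] == LF && msg[i + 2] == CR && msg[i + 3] == LF) {
                return i + 4;
            }
        }
        return -1;
    }

    // Returns only the body bytes; empty if the header terminator is missing.
    static byte[] body(byte[] msg) {
        int offset = bodyOffset(msg);
        return offset < 0 ? new byte[0] : Arrays.copyOfRange(msg, offset, msg.length);
    }

    public static void main(String[] args) {
        // Hypothetical captured message, used only to exercise the methods.
        byte[] msg = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(body(msg), StandardCharsets.ISO_8859_1)); // hello
    }
}
```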
Check this website for further info on HTTP structure: http://www.jmarshall.com/easy/http/
HTH.