Detecting non-HTML files when using HttpURLConnection/HTMLEditorKit
I have created a spider application that extracts information from web pages using a combination of HttpURLConnection and HTMLEditorKit.ParserCallback.
However, every so often it finds links to non-HTML resources,
i.e. PDFs, zip files, Excel spreadsheets, etc.
Would anybody know how I can check the type of a downloaded page without rooting through the headers (I am not sure what is going on in there) or checking the extension of the file? Checking extensions would end up as a massive list of banned URL extensions that would never be fully complete.
Thanks.
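
For reference, here is a minimal sketch of the kind of check I am after. It assumes the server reports an accurate Content-Type and that the URL is an http or https one. It relies on getContentType(), which HttpURLConnection inherits from URLConnection and which returns the Content-Type value directly, so the headers do not have to be walked by hand. The class and method names (ContentTypeCheck, isHtml, isHtmlPage) are made up for illustration, and the HEAD request is only an assumption about how to avoid downloading the body.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper class; the names here are illustrative, not from any library.
public class ContentTypeCheck {

    // Returns true when a Content-Type value such as
    // "text/html; charset=UTF-8" names an HTML (or XHTML) page.
    public static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false; // the server sent no Content-Type at all
        }
        // Drop parameters such as "; charset=UTF-8" and normalise the case.
        int semi = contentType.indexOf(';');
        String mime = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }

    // Asks the server for the type with a HEAD request, so the body is not
    // downloaded just to find out what it is.
    // Assumption: the URL is http/https, otherwise the cast below fails.
    public static boolean isHtmlPage(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return isHtml(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }
}
```

The spider would then call something like ContentTypeCheck.isHtmlPage(link) before handing a link to the parser. Some servers may not answer HEAD requests properly, in which case the same getContentType() call could be made on the normal GET connection before reading the stream.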

