Java screen scraper

Hello guys,

I am on the lookout for a screen scraper that gets specific pieces of data from web pages and formats the extracted data as XML.

I want something very simple and basic, as I am under pressure to get this done over the next few days.

Any pointers will be much appreciated.

Thanks in advance.

By Antananarivoa at 2007-11-27 10:55:15
# 1

> Hello guys,

Hello

> I am on the lookout for a screen scraper that gets specific pieces of data from web pages and formats the extracted data as XML.

A screen scraper? Or something that gets data from the web and turns it into XML? Not the same thing, by a country mile.

> I want something very simple and basic, as I am under pressure to get this done over the next few days.

You're almost certainly stuck, then. What you're asking ain't simple. Ask for more time to do it in.

By the way, I wouldn't bother urging people to help because you are under pressure. That's your own problem, and trying to get help quicker that way will have the opposite effect.

georgemca at 2007-7-29 11:55:13 > top of Java-index,Java Essentials,Java Programming...
# 2

URLConnection and your favorite XML builder? Why a screen scraper? That doesn't make sense to me.
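The URLConnection-plus-XML-builder approach can be sketched with the JDK alone: `URLConnection` fetches the page and the built-in DOM API plays the part of the XML builder. This is a minimal sketch, assuming pages are UTF-8; `FetchToXml`, `fetch` and `toXml` are hypothetical names, and the `<field name="...">` output layout is invented for illustration. Deciding which pieces of the page become fields is the page-specific part and is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FetchToXml {

    // Reads the whole response body; assumes UTF-8 instead of parsing the Content-Type charset
    static String fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Builds <root><field name="key">value</field>...</root>; the layout is purely illustrative
    static String toXml(String rootName, Map<String, String> fields)
            throws ParserConfigurationException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            Element field = doc.createElement("field");
            field.setAttribute("name", e.getKey());
            field.setTextContent(e.getValue()); // the serializer escapes < and & on output
            root.appendChild(field);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        URL url = URI.create(args.length > 0 ? args[0] : "http://example.com/").toURL();
        String html = fetch(url);
        // Picking the real data out of the page is the hard part; two trivial fields stand in for it
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", url.toString());
        fields.put("characters", Integer.toString(html.length()));
        System.out.println(toXml("page", fields));
    }
}
```

Letting the DOM do the serialization means special characters in scraped text are escaped correctly, which hand-concatenated XML strings tend to get wrong.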

CeciNEstPasUnProgrammeura at 2007-7-29 11:55:13
# 3

Thanks, georgemc,

I will take your kind suggestions into consideration next time.

Antananarivoa at 2007-7-29 11:55:13
# 4

> Thanks, georgemc,

> I will take your kind suggestions into consideration next time.

OK, lots of luck getting a screen scraper in Java, and running its output through OCR software to extract data, in the next couple of days.

georgemca at 2007-7-29 11:55:13
# 5

Thanks again, georgemc.

Antananarivoa at 2007-7-29 11:55:13
# 6

> Thanks again, georgemc.

Go boil your head. I did actually ask you some questions leading toward a possible solution, but you're more concerned about some perceived slight than you are about your problem, so lots of luck. Notice how nobody else is answering.

This is a technical forum, not Manners Central.

georgemca at 2007-7-29 11:55:13
# 7

> Notice how nobody else is answering

I did, but I don't feel like the OP cares to work on a solution with me either. I'll leave it to someone else.

CeciNEstPasUnProgrammeura at 2007-7-29 11:55:13
# 8

> > Notice how nobody else is answering

> I did, but I don't feel like the OP cares to work on a solution with me either. I'll leave it to someone else.

He doesn't want to work with anyone on a solution. He thinks someone is just going to post fully working code he can lift and pretend to his employer that he did it himself. As usual. This "don't ask further questions, just answer mine" attitude is a daily occurrence now. What a gyp.

He didn't have a problem with me probing further the last time I helped him with much the same problem, though. Odd.

georgemca at 2007-7-29 11:55:13
# 9

Sorry, but I think I worded my question wrong, which explains why I have received quite a few stinkers! I certainly wasn't expecting code, mainly pointers. Apologies if I offended anyone.

My problem is to get data from a website and format it as XML. The Jakarta taglibs provide a mechanism to do it, but mainly in JSP. What's more, the Jakarta taglib implementation isn't fine-grained enough in its data extraction.
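Fine-grained extraction of this kind can be sketched in plain Java, assuming each wanted value sits between fixed begin and end strings in the page source. `MarkerExtractor` and `extractBetween` are hypothetical names, not part of the Jakarta taglibs or any other library. Matching raw strings is brittle: any change to the page layout breaks it.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkerExtractor {

    // Returns every substring found between a begin marker and the next end marker, trimmed.
    // Plain string matching only: no HTML parsing is done.
    static List<String> extractBetween(String page, String begin, String end) {
        List<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = page.indexOf(begin, from);
            if (start < 0) {
                break; // no more begin markers
            }
            start += begin.length();
            int stop = page.indexOf(end, start);
            if (stop < 0) {
                break; // begin marker without a matching end marker
            }
            results.add(page.substring(start, stop).trim());
            from = stop + end.length();
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<td class=\"price\"> 10.50 </td><td class=\"price\">7.25</td>";
        System.out.println(extractBetween(html, "<td class=\"price\">", "</td>")); // prints [10.50, 7.25]
    }
}
```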

Any pointers would be appreciated.

Thanks.

Antananarivoa at 2007-7-29 11:55:13
# 10

> Sorry, but I think I worded my question wrong, which explains why I have received quite a few stinkers!

You didn't get any stinkers. I was asking questions to clarify what you were after. You got the attitude in return for your sarcasm.

> I certainly wasn't expecting code, mainly pointers. Apologies if I offended anyone.

Accepted.

georgemca at 2007-7-29 11:55:13
# 11

> My problem is to get data from a website and format it as XML. The Jakarta taglibs provide a mechanism to do it, but mainly in JSP. What's more, the Jakarta taglib implementation isn't fine-grained enough in its data extraction.

> Any pointers would be appreciated.

Huh? JSP has nothing to do with getting a webpage. JSPs are for building web pages (or other content such as XML).

Do you have access to the web server?

> Java screen scraper

(in a silly mood)

a) Using the JDIC Browser component, display the web page (remember to go full screen)

b) Using java.awt.Robot, take a screenshot of the web page

c) Using ImageIO, write the screenshot out to an image file

d) Using ProcessBuilder and GOCR (http://jocr.sourceforge.net/), run OCR on the image to get the content of the web page

e) Apply an XSLT transform to the OCR output
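Steps b and c can be sketched with nothing but the JDK: `Robot.createScreenCapture` grabs the screen and `ImageIO.write` encodes the result as PNG. This is a minimal sketch, assuming a desktop session (Robot cannot capture anything on a headless machine); the JDIC and GOCR steps are left out, and `ScreenGrab` and its methods are hypothetical names.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

public class ScreenGrab {

    // Step b: capture the whole primary screen (requires a real display)
    static BufferedImage captureScreen() throws AWTException {
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        return new Robot().createScreenCapture(screen);
    }

    // Step c: encode an image as PNG bytes; ImageIO.write returns false if no writer matches
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        if (GraphicsEnvironment.isHeadless()) {
            System.err.println("No display available; Robot cannot capture the screen.");
            return;
        }
        File file = new File(args.length > 0 ? args[0] : "screenshot.png");
        Files.write(file.toPath(), toPng(captureScreen()));
        System.out.println("Wrote " + file.getAbsolutePath());
    }
}
```

Run it with an output file name, for example `java ScreenGrab page.png`.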

(not in a silly mood)

URLConnection

The depressing thing is, I'm sure someone has done it the "silly way", and I think my company buys the data they produce. :yuck:


mlka at 2007-7-29 11:55:13
# 12

What about the JTidy API?

I think you can use the JTidy API to convert the HTML to XML, then use XQuery to search for the particular data you want to get out of the scraped page.

Hope it helps.
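JTidy (`org.w3c.tidy.Tidy`) handles the first half of this, turning messy HTML into well-formed markup. The query half can be sketched with the JDK alone; note that this swaps XQuery for XPath (`javax.xml.xpath`), since the JDK ships no XQuery engine. This is a minimal sketch, assuming the markup has already been tidied and carries no DOCTYPE (the parser would otherwise try to download the DTD); `XPathExtract` and `select` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathExtract {

    // Parses well-formed markup and returns the trimmed text of every node matching the XPath
    static List<String> select(String xhtml, String expression) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Namespace-unaware (the factory default) so plain expressions such as //td need no prefixes
        f.setNamespaceAware(false);
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><table>"
                + "<tr><td class=\"name\">Widget</td><td class=\"price\">10.50</td></tr>"
                + "<tr><td class=\"name\">Gadget</td><td class=\"price\">7.25</td></tr>"
                + "</table></body></html>";
        System.out.println(select(xhtml, "//td[@class='price']")); // prints [10.50, 7.25]
    }
}
```

Compared with string markers, a path like `//td[@class='price']` survives cosmetic changes such as extra whitespace or attribute reordering.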

Ponmalara at 2007-7-29 11:55:13
# 13

Thanks a lot for all the ideas. It looks like I have a starting point now!

Antananarivoa at 2007-7-29 11:55:13
# 14

> Thanks a lot for all the ideas. It looks like I have a starting point now!

You already had that starting point in reply 2.

CeciNEstPasUnProgrammeura at 2007-7-29 11:55:13