How to extract data from HTML source code retrieved via a URL

Hi everyone,

I am working on a project. I can already get the HTML source code from a URL, but I don't know how to extract data from it. For example, I want to extract certain attributes, such as name and location. How do I extract this data? I need help urgently; thanks in advance.

[323 byte] By [mcaifkx2a] at [2007-11-27 6:20:12]
# 1
Are you just going to keep posting this question again and again until someone gives you some code?
georgemca at 2007-7-12 17:35:13 > top of Java-index,Java Essentials,Java Programming...
# 2
Sorry for posting this again. I just want to know whether there is any way to extract data from HTML source code. I'm not asking for code, just a rough idea. I am a total rookie in Java.
mcaifkx2a at 2007-7-12 17:35:13
# 3
How about first finding out how you would recognize the data you are looking for in the HTML?
CeciNEstPasUnProgrammeura at 2007-7-12 17:35:13
# 4
Yes, you are right. I just need to extract some specific data so that I can analyze it. Because the source is all text, I really don't know how to recognize the specific data and then extract it.
mcaifkx2a at 2007-7-12 17:35:13
# 5
Then how are we supposed to help you? There is no public String magicallyFindWhatIWant() method.
CeciNEstPasUnProgrammeura at 2007-7-12 17:35:13
# 6
I found some software that can extract data such as telephone numbers and email addresses from a URL, but it doesn't extract the data I am looking for. How does it do this?
mcaifkx2a at 2007-7-12 17:35:13
# 7
> Then how are we supposed to help you? There is no
> public String magicallyFindWhatIWant() method.

There isn't? There goes my Google-beating search engine, then.

Though technically, any method named magicallyFindWhatIWant() ought to return a Pony.
MartinMa at 2007-7-12 17:35:13
# 8

> > Then how are we supposed to help you? There is no
> > public String magicallyFindWhatIWant() method.
>
> There isn't? There goes my Google-beating search
> engine, then.

You missed Rene's sneakiness. He didn't say such a method didn't exist, only that it wasn't public :-)

georgemca at 2007-7-12 17:35:13
# 9
> You missed Rene's sneakiness. He didn't say such a
> method didn't exist, only that it wasn't public :-)

Ah. Only the cool kids get it? Just like the sodding ponies all over again :(
MartinMa at 2007-7-12 17:35:13
# 10

> Yes, you are right. I just need to extract some specific data so that I
> can analyze it. Because the source is all text, I really don't know
> how to recognize the specific data and then extract it.

What specific data are you trying to get, and what does it look like?

MartinMa at 2007-7-12 17:35:13
# 11
What I want to extract is information about the person, such as name, age, and nationality.
mcaifkx2a at 2007-7-12 17:35:13
# 12

> What I want to extract is information about the person,
> such as name, age, and nationality.

...yeah, got that part. What we need is something more specific.

Let's try an example. If you examine the source for this thread, you'll notice that it contains the following tag: <a href="profile.jspa?userID=896647">mcaifkx2</a>

So, suppose I have your user name, and I want to get your userID. I'd have to search the source for something that looks like the opening tag, followed by your user name, followed by the closing tag. I'd then use my knowledge of the opening tag's structure to throw away the parts I don't need.

So what we need to know is how that name, age, nationality information is stored in the source. It's not enough just to know that it's there. What does it look like?

MartinMa at 2007-7-12 17:35:13
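
For illustration only, here is a minimal sketch of the search described in the post above: look for the opening tag, the user name, and the closing tag, then keep only the userID. It assumes the link always has the exact shape quoted in the post; the class name UserIdFinder and the method findUserId are made up for this example. A regular expression like this is brittle against real-world HTML, so treat it as a starting point rather than a robust solution.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UserIdFinder {

    // Returns the userID from the profile link whose link text equals userName.
    // Assumption: the link looks like <a href="profile.jspa?userID=123">name</a>,
    // as in the tag quoted in the thread.
    static Optional<String> findUserId(String html, String userName) {
        Pattern link = Pattern.compile(
                "<a\\s+href=\"profile\\.jspa\\?userID=(\\d+)\">"
                        + Pattern.quote(userName)   // match the name literally
                        + "</a>");
        Matcher m = link.matcher(html);
        // group(1) is the run of digits captured after "userID="
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<p>Posted by "
                + "<a href=\"profile.jspa?userID=896647\">mcaifkx2</a></p>";
        System.out.println(findUserId(html, "mcaifkx2").orElse("not found"));
    }
}
```

The capture group is the "throw away the parts I don't need" step: everything around the digits is matched but discarded.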
# 13

<td style="font-size: 92%; overflow: hidden;">
Nilson Munhoz %
<div><img src="http://img1.orkut.com/img/b.gif" alt="" height="4" width="1" /></div>male, 24, single
United Kingdom
</td>

This is just part of the code; the data is stored in the HTML like this.

mcaifkx2a at 2007-7-12 17:35:13
# 14
Easiest way to do this is to load it into an XML parser and let it do the hard work of breaking the HTML into tags and attributes and text and so on. If your HTML isn't well-formed XML then there are HTML parsers that can do the same sort of thing.
DrClapa at 2007-7-12 17:35:13
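
As a rough sketch of the XML-parser suggestion above, the following uses the JDK's built-in DOM parser on a cell shaped like the one posted in #13. It assumes the fragment is well-formed XML and that the text is laid out as a name line, then a "gender, age, status" line, then a country line. The names ProfileCellParser, Profile, and parse are invented for this example, and real pages that are not well-formed would need an HTML parser instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ProfileCellParser {

    // Hypothetical value holder; the field names are assumptions for this sketch.
    static final class Profile {
        final String name;
        final String gender;
        final int age;
        final String status;
        final String country;

        Profile(String name, String gender, int age, String status, String country) {
            this.name = name;
            this.gender = gender;
            this.age = age;
            this.status = status;
            this.country = country;
        }
    }

    static Profile parse(String cellXml) {
        Element td;
        try {
            td = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cellXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        // Collect the non-blank lines from text nodes directly under <td>,
        // skipping child elements such as the spacer <div>.
        List<String> lines = new ArrayList<>();
        for (Node n = td.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                for (String line : n.getNodeValue().split("\\R")) {
                    if (!line.trim().isEmpty()) {
                        lines.add(line.trim());
                    }
                }
            }
        }
        if (lines.size() < 3) {
            throw new IllegalArgumentException("Unexpected cell layout");
        }

        // Assumed layout of the second line: "gender, age, status"
        String[] parts = lines.get(1).split("\\s*,\\s*");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected details line: " + lines.get(1));
        }
        return new Profile(lines.get(0), parts[0],
                Integer.parseInt(parts[1]), parts[2], lines.get(2));
    }

    public static void main(String[] args) {
        String cell = "<td style=\"font-size: 92%; overflow: hidden;\">\n"
                + "Nilson Munhoz %\n"
                + "<div><img src=\"http://img1.orkut.com/img/b.gif\" alt=\"\""
                + " height=\"4\" width=\"1\" /></div>male, 24, single\n"
                + "United Kingdom\n"
                + "</td>";
        Profile p = parse(cell);
        System.out.println(p.name + " | " + p.gender + " | " + p.age
                + " | " + p.status + " | " + p.country);
    }
}
```

The parser handles the tags and attributes; the only hand-written logic left is the split on line breaks and commas, which is exactly the part that depends on how this particular page lays out its text.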