How to extract data from HTML source code fetched via a URL
Hi everyone,
I am working on a project. I can already get the HTML source code from a URL, but I don't know how to extract data from it. For example, I want to extract certain kinds of data, such as name, location, and similar attributes. How can I extract this data? I need help urgently. Thanks in advance.
[323 byte] By [mcaifkx2a] at [2007-11-27 6:20:12]

Are you just going to keep posting this question again and again until someone gives you some code?
Sorry for posting like that. I just want to know whether there is any way to extract data from HTML source code. Not the code, just a rough idea. I am a total rookie in Java.
How about first finding out how you would recognize the data you are looking for in the HTML?
Yes, you are right. I just need to extract some specific data so that I can analyze it. Because the source is all text, I really don't know how to recognize this specific data and then extract it.
Then how are we supposed to help you? There is no public String magicallyFindWhatIWant() method.
I found some software that can extract data such as telephone numbers, email addresses, etc. from a URL, but it doesn't extract the data I am looking for. How does it do this?
> Then how are we supposed to help you? There is no
> public String magicallyFindWhatIWant()
> method.
There isn't? There goes my Google-beating search engine, then.
Though technically, any method named magicallyFindWhatIWant() ought to return a Pony.
> > Then how are we supposed to help you? There is no
> > public String magicallyFindWhatIWant()
> > method.
>
> There isn't? There goes my Google-beating search
> engine, then.
>
You missed Rene's sneakiness. He didn't say such a method didn't exist, only that it wasn't public :-)
> You missed Rene's sneakiness. He didn't say such a
> method didn't exist, only that it wasn't public :-)
Ah. Only the cool kids get it? Just like the sodding ponies all over again :(
> yes, you are right.
> i just need to extract some specific data, so that i
> can use these data to do analysis.
> Because the codes are all text, i really dont know
> how to recognize these specific data and then extract
> them
What specific data are you trying to get, and what does it look like?
What I want to extract is data such as his name, age, nationality, and so on.
> what i want to extract are data about his name, age,
> nationality, etc. information.
...yeah, got that part. What we need is something more specific.
Let's try an example. If you examine the source for this thread, you'll notice that it contains the following tag: <a href="profile.jspa?userID=896647">mcaifkx2</a>
So, suppose I have your user name, and I want to get your userID. I'd have to search the source for something that looks like the opening tag, followed by your user name, followed by the closing tag. I'd then use my knowledge of the opening tag's structure to throw away the parts I don't need.
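To make that concrete, here is a rough Java sketch of the search just described. It assumes the link always has exactly the shape shown above (a single href attribute, a numeric userID, the user name as the link text). The class and method names are made up for illustration, and a regular expression like this will break as soon as the page markup changes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the userID in a profile link for a given user name.
class UserIdExtractor {

    // Returns the userID as a string, or null if no matching link is found.
    static String findUserId(String html, String userName) {
        // Opening tag with a captured numeric ID, then the user name, then the closing tag.
        Pattern link = Pattern.compile(
                "<a\\s+href=\"profile\\.jspa\\?userID=(\\d+)\">"
                + Pattern.quote(userName)
                + "</a>");
        Matcher m = link.matcher(html);
        // Group 1 is the part of the opening tag we want; the rest is thrown away.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample source, shaped like the tag from this thread.
        String html = "<p>By <a href=\"profile.jspa?userID=896647\">mcaifkx2</a></p>";
        System.out.println(findUserId(html, "mcaifkx2")); // prints 896647
    }
}
```

Pattern.quote keeps any special characters in the user name from being read as regex syntax.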
So what we need to know is how that name, age, nationality information is stored in the source. It's not enough just to know that it's there. What does it look like?
<td style="font-size: 92%; overflow: hidden;">
Nilson Munhoz %
<div><img src="http://img1.orkut.com/img/b.gif" alt="" height="4" width="1" /></div>male, 24, single
United Kingdom
</td>
This is just part of the code; the data is stored in the HTML like this.
Easiest way to do this is to load it into an XML parser and let it do the hard work of breaking the HTML into tags and attributes and text and so on. If your HTML isn't well-formed XML then there are HTML parsers that can do the same sort of thing.
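As a rough sketch of the XML-parser route, the snippet posted above happens to be well-formed, so the JDK's built-in DOM parser can read it. The code below assumes the cell always has this layout: the name as the text before the div, then a "gender, age, status" line and a country line after it. The class name, method name, and field order are invented for illustration. It returns the name text exactly as it appears, including the trailing "%".

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the profile fields out of one well-formed <td> cell.
class ProfileCellParser {

    // Returns {name, gender, age, status, country}; assumes the layout described above.
    static String[] parse(String xml) {
        Element td;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            td = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        td.normalize(); // merge adjacent text nodes so each run of text is a single node

        // Text before the <div> holds the name; text after it holds the other fields.
        String name = td.getFirstChild().getTextContent().trim();
        String[] lines = td.getLastChild().getTextContent().trim().split("\\s*\\n\\s*");
        String[] details = lines[0].split("\\s*,\\s*"); // "male, 24, single"
        return new String[] { name, details[0], details[1], details[2], lines[1] };
    }

    public static void main(String[] args) {
        String cell = "<td style=\"font-size: 92%; overflow: hidden;\">\n"
                + "Nilson Munhoz %\n"
                + "<div><img src=\"http://img1.orkut.com/img/b.gif\" alt=\"\" height=\"4\" width=\"1\" /></div>male, 24, single\n"
                + "United Kingdom\n"
                + "</td>";
        System.out.println(Arrays.toString(parse(cell)));
    }
}
```

A full page is rarely well-formed XML, so in practice this approach would only work as-is on a fragment like the one above; for a whole page you would swap in an HTML parser and keep the same idea of walking the element's child nodes.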