Regular Expression
Hi all, I want to write a regular expression pattern that finds any HTML opening tag inside an HTML page. What I know about regex is what I learned during my SCJP preparation, and I think it is not enough.
I'm sure all of you know the generic syntax of HTML tags.
Thanks for your help.
Ahmad Elsafty
# 1
Well, you are talking about a large set of possibilities here. However, an HTML page shouldn't be missing tags like the following, so patterns such as these are a start ...
".*<html>"
".*<html>.*<head>.*"
"<html> </html>"
or even
"<html>" will do
Try them by trial and error with "Google code search" and see which one gives you the highest result count; that should be the most suitable pattern.
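For the more general case in the original question (any opening tag, not only <html>), a rough sketch in Java could look like the one below. The class name, the method name and the pattern itself are my own assumptions for illustration, not something taken from a library. It treats self-closing tags such as <br/> as opening tags, skips closing tags and <!DOCTYPE ...>, and does not try to handle comments, CDATA or script content, so a tag written inside a comment would still be reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, only a sketch: collects the names of opening tags.
class OpeningTagFinder {

    // '<', a tag name starting with a letter, optional attributes
    // (quoted values may contain '>'), an optional '/', then '>'.
    private static final Pattern OPENING_TAG = Pattern.compile(
        "<([A-Za-z][A-Za-z0-9]*)(?:\\s+(?:\"[^\"]*\"|'[^']*'|[^'\">])*)?\\s*/?>");

    static List<String> findOpeningTagNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = OPENING_TAG.matcher(html);
        // find() scans for the next match anywhere in the input,
        // unlike matches(), which must cover the whole string.
        while (m.find()) {
            names.add(m.group(1)); // group 1 is the tag name
        }
        return names;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Hi</title></head>"
                    + "<body class=\"a>b\"><br/></body></html>";
        System.out.println(findOpeningTagNames(page));
        // prints [html, head, title, body, br]
    }
}
```

Note the use of find() in a loop. With String.matches() or Matcher.matches() the pattern has to match the entire input, which is why the patterns above need the leading and trailing ".*".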
If you are asking about parsing HTML, reading the HTML language definition should give you a better idea (e.g. HTML's Document Type Definition (DTD) kind of stuff; I never looked at it myself, but it could be interesting).
Anyway Ahmad, are you building a search engine? (just a wild guess)
Cheers,
Avatar Ng