Counting words from HTML file excluding HTML tags

Can anybody please suggest a code example for counting the words in an HTML file? I have checked the WordCount program available on this forum, but it also counts the HTML tags. Please advise. --Ezee
[215 byte] By [Ezeea] at [2007-10-1 19:44:26]
# 1

Hi Ezee,

So, from the following HTML code:

<html>
<head>
<title>This is just () () just an example, is is @@is@ </title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
...
etc.

You only want a word-count like this:

WORD-OCCURRENCE
This-1
is-4
just-2
an-1
example-1

Correct? Or did you mean something else?
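
For reference, the tally above can be reproduced with a short sketch like the one below. It assumes that a "word" is any run of ASCII letters and that counting is case-sensitive; both are guesses about what is wanted, and the class and method names (WordTally, tally) are made up for illustration. It works on text that has already had its tags removed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordTally {
    // Counts each run of ASCII letters as one word; anything else is a separator.
    static Map<String, Integer> tally(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String word : text.split("[^A-Za-z]+")) {
            if (!word.isEmpty()) { // split can yield a leading empty string
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The title text from the example, with the tags already removed.
        String title = "This is just () () just an example, is is @@is@ ";
        tally(title).forEach((word, n) -> System.out.println(word + "-" + n));
    }
}
```

A LinkedHashMap is used only so that the words print in the order they first appear; a TreeMap would print them sorted instead.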

prometheuzza at 2007-7-11 16:04:53 > top of Java-index,Other Topics,Algorithms...
# 2
Get an HTML parser. It knows what tags are.
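
One possible way to do this without a third-party library is the HTML parser that ships with Swing. The sketch below is only an illustration: the class and method names (ParserTextExtractor, extractText, countWords) are invented here, a "word" is assumed to be anything separated by whitespace, and the Swing parser is an old, lenient one, so a dedicated parsing library may cope better with modern pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ParserTextExtractor {
    // Collects the text chunks the parser reports, separated by single spaces.
    static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        try {
            // Third argument true: ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    // Counts whitespace-separated words in the extracted text.
    static int countWords(String html) {
        String text = extractText(html);
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }
}
```

Because a space is appended after every text chunk, a word that is split by an inline tag (for example wo<b>rd</b>) would be counted as two words in this sketch.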
duffymoa at 2007-7-11 16:04:53
# 3
There are plenty of HTML strippers that are just a Google search away.
RadcliffePikea at 2007-7-11 16:04:53
# 4

This is an easy problem.

You need to understand input. This can be managed easily with a file input class such as FileReader, or a little more expertly with a BufferedReader. To understand input you need a basic understanding of objects.

In addition, you will need to understand loops and selection (if/else).
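
As a rough illustration of the input step, here is a minimal sketch that reads everything from a Reader line by line with a BufferedReader and a loop. The names InputSketch and readAll and the file name page.html are placeholders, not part of any existing program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class InputSketch {
    // Reads the source one line at a time and returns the whole contents.
    static String readAll(Reader source) {
        StringBuilder contents = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) { // loop: one line per pass
                contents.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents.toString();
    }

    public static void main(String[] args) {
        // For a real file, pass new java.io.FileReader("page.html") instead
        // (hypothetical file name; that constructor throws a checked exception).
        System.out.print(readAll(new StringReader("<p>hello</p>\n")));
    }
}
```

Taking a Reader rather than a file name makes it easy to try the method on a short string first and switch to a file later.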

Clues to help you are:

Tags come in two kinds.

One kind has an opening and a closing tag, like a heading (<h1>...</h1>) or a paragraph (<p>...</p>).

The other kind is a single tag, like a horizontal rule (<hr>) or a line break (<br>).

You need to account for both kinds and ignore them, so that you pay attention only to the text in between.
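
A minimal sketch of ignoring tags, using one loop and an if/else chain, might look like this. It assumes every tag starts with '<' and ends with the next '>', so it does not handle comments, script or style blocks, a '>' inside an attribute value, or entities such as &amp;. The names TagSkipper and stripTags are made up for the example.

```java
class TagSkipper {
    // Copies the characters that lie outside <...>; each tag becomes one space
    // so that words on either side of a tag do not run together.
    static String stripTags(String html) {
        StringBuilder text = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;      // a tag starts here
                text.append(' ');
            } else if (c == '>' && insideTag) {
                insideTag = false;     // the tag ends here
            } else if (!insideTag) {
                text.append(c);        // ordinary text: keep it
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<h1>Title</h1><p>one<br>two</p><hr>").trim());
    }
}
```

In this sketch the paired tags and the single tags are skipped by the same rule, since only the text between '>' and the next '<' is kept.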

My Java lecturer would say, "First of all break down the problem into something more simple".

Create a simple HTML file that uses only tags with an opening and a closing tag. Keep it very small, say only 5 lines. Get your program to deal with this, then add the other kind of tag.

If you're not sure about input yet, first write a program to read in a simple file that contains one single word.

Take it step by step.

stanton_iana at 2007-7-11 16:04:53