Regarding conversion of HTML to strings
Hi All,
I need a little help. I have a requirement to ignore HTML tags in strings. Suppose I have a string that contains some HTML tags; I want to ignore those tags and print the resulting string. For example, given
String a = "Need<b> dog training</b>";
all the HTML tags should be ignored, and the resulting string must be
String a = "Need dog training";
Please guide me on how to achieve this. Is there any class or method that provides this facility?
I apologize for my poor English
Regards,
Rama
[559 byte] By [RamaDevia] at [2007-11-27 10:21:59]

There is no function that I know of that will remove tags from your string.
A tag can have any number of attributes, which also have to be ignored in your output (example: <a href="commands" />), so you can't identify all possible tag variations ahead of time. Instead, you have to look for the opening < and the closing >, and ignore those characters and everything between them. The code below will accomplish this.
Note, however, that if the text body itself contains a < or >, it will not work correctly.
Note: the last line of the code replaces &nbsp; and ' with spaces. You will have to chain additional replaceAll() calls onto it to handle any other special strings.
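String x1 = " Need<b> dog training</b>";
String answer;
boolean withinTag = false;
StringBuilder str = new StringBuilder();
for (int ii = 0; ii < x1.length(); ++ii) {
    char x2 = x1.charAt(ii);
    if (x2 == '<') {
        withinTag = true;   // entering a tag: stop copying
    }
    if (x2 == '>') {
        withinTag = false;  // leaving a tag: copying resumes with the next character
    }
    if (!withinTag && x2 != '>') {
        str.append(x2);     // keep only characters outside tags
    }
}
// Replace the &nbsp; entity and apostrophes with plain spaces.
answer = str.toString().replaceAll("&nbsp;", " ").replaceAll("'", " ");
System.out.println(answer);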
Additional note: When I posted to this forum, it removed & n b s p ; (written without the spaces) from the first argument of the first replaceAll(). The line should read:
answer = str.toString().replaceAll("&nbsp;", " ").replaceAll("'", " ");
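To make the limitation mentioned above concrete, here is a minimal sketch that wraps the same character-scanning idea in a method and runs it on two inputs. The class name TagScanner and the method name stripTags are hypothetical, chosen only for this illustration.

```java
// Hypothetical helper: the scanning loop from the post, wrapped in a method.
final class TagScanner {

    // Copies every character that lies outside a <...> pair.
    static String stripTags(String input) {
        StringBuilder out = new StringBuilder();
        boolean withinTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                withinTag = true;       // start skipping
            } else if (c == '>') {
                withinTag = false;      // stop skipping; '>' itself is dropped
            } else if (!withinTag) {
                out.append(c);          // ordinary text outside any tag
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Well-formed markup: the tags are removed as intended.
        System.out.println(stripTags("Need<b> dog training</b>"));
        // A bare '<' in the text is treated as the start of a tag that
        // never closes, so everything after it is lost.
        System.out.println("[" + stripTags("if a < b then stop") + "]");
    }
}
```

The second call shows why text containing a literal < needs different handling: the scanner has no way to tell it apart from the start of a tag.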
Just getting rid of HTML tags isn't that difficult: str = str.replaceAll("<[^<>]++>", "");
Those NBSP entities are a separate problem; you probably want to replace them with simple spaces so you can trim() them away if appropriate: str = str.replaceAll("&nbsp;?", " ").trim();
You may also need to normalize any remaining whitespace: str = str.replaceAll("\\s+", " ");
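The three regex steps above can be combined into one small method. This is only a sketch: the class name HtmlToText and the method name toPlainText are made up for the example, and the entity pattern "&nbsp;?" is an assumption about what the second step was meant to match, since the forum appears to strip entity text from posts. Unlike the steps as written, it trims last, after whitespace has been collapsed.

```java
// Hypothetical helper combining the regex steps from the reply above.
final class HtmlToText {

    static String toPlainText(String html) {
        // 1. Remove anything that looks like a tag: '<', no angle brackets, '>'.
        String s = html.replaceAll("<[^<>]++>", "");
        // 2. Turn the NBSP entity (with or without its semicolon) into a space.
        //    Assumed pattern; other entities are left untouched.
        s = s.replaceAll("&nbsp;?", " ");
        // 3. Collapse runs of whitespace into one space, then trim the ends.
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText(" Need<b> dog&nbsp;training</b> "));
    }
}
```

Like the loop-based version, this does not parse HTML; a stray < or > in the text, or other entities such as &amp;, would still need separate handling.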