Converting HTML to text

Hello, I have a class that contains a string. The string has some HTML tags in it, and I would like to convert this HTML into readable text. How can I do this? Thanks.
[232 byte] By [JusteUneQuestiona] at [2007-11-27 0:15:34]
# 1

You have a couple of choices. One is regular expressions. The other is to create a parser that recognizes and assigns meaning to tokens in the html text.

I would suggest doing a Google search on Java + Regular Expression.

If you do that, the second link in the list is this one: http://java.sun.com/docs/books/tutorial/essential/regex/

Start reading.
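
For the parser route mentioned above, here is a minimal sketch that leans on the HTML parser shipped with the JDK (javax.swing.text.html) instead of a hand-written one. The class name HtmlTextExtractor and the choice to turn <br> into a newline are my own assumptions for illustration; treat it as a starting point, not a complete converter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text the parser reports and ignores the markup.
final class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);
    }

    // Called for tags without a closing counterpart, such as <br>.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            text.append('\n'); // assumption: a line break should become a newline
        }
    }

    static String toPlainText(String html) {
        HtmlTextExtractor extractor = new HtmlTextExtractor();
        try {
            new ParserDelegator().parse(new StringReader(html), extractor, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return extractor.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("Line one<br>Line <b>two</b>"));
    }
}
```

Block-level tags such as <p> are not handled here; they would need extra cases in handleStartTag or handleEndTag.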

PS.

puckstopper31a at 2007-7-11 22:02:35 > top of Java-index,Java Essentials,Java Programming...
# 2
Hi,

Remove the < > from the tag. String has a lot of methods. Use the split and startsWith methods to remove the <> or use a regular expression.

Bye for now,
sat
AnanSmritia at 2007-7-11 22:02:35
# 3
OK, thanks for this. I'm starting to read.
JusteUneQuestiona at 2007-7-11 22:02:35
# 4
If this is for a GUI, then many Swing components do this for you.
CaptainMorgan08a at 2007-7-11 22:02:35
# 5

> Hi,
>
> Remove the < > from the tag. String has a lot of methods. Use the split and startsWith methods to remove the <> or use a regular expression.
>
> Bye for now,
> sat

Hello, do you mean that if I remove only the <> from <br>, it will work like a carriage return?

JusteUneQuestiona at 2007-7-11 22:02:35
# 6
> If this is for a GUI, then many Swing components do this for you.

It's text to display in a JSP.
JusteUneQuestiona at 2007-7-11 22:02:35
# 7
> Hello, do you mean that if I remove only the <> from <br>, it will work like a carriage return?

No, it will not.
KathyMcDonnella at 2007-7-11 22:02:35
# 8
> > Hello, do you mean that if I remove only the <> from <br>, it will work like a carriage return?
>
> No, it will not.

And can I get the carriage return using a regexp?
JusteUneQuestiona at 2007-7-11 22:02:35
# 9

Don't bother, unless you know the ONLY HTML tag is <br>.

With regular expressions, you have to list each TAG you want to replace, and list what you replace it with.

Regular expressions do not magically turn HTML into plain text.
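
To make "list each tag and what you replace it with" concrete, here is a rough sketch. The class name HtmlTagStripper and the set of rules (only <br> gets a replacement, every other tag is simply dropped, three entities are decoded) are assumptions for illustration. It will not cope with comments, scripts, or a > inside an attribute value.

```java
import java.util.regex.Pattern;

// Hypothetical helper: one rule per tag we care about, then a catch-all.
final class HtmlTagStripper {
    // <br>, <br/> and <br /> in any letter case become a newline.
    private static final Pattern LINE_BREAK = Pattern.compile("(?i)<br\\s*/?>");
    // Anything else that looks like a tag is removed.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String toPlainText(String html) {
        String text = LINE_BREAK.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Decode the few entities this sketch knows about; &amp; goes last so that
        // "&amp;lt;" ends up as the literal text "&lt;" rather than "<".
        return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("Line one<br>Line <b>two</b>"));
    }
}
```

Each extra tag that should map to something (say, </p> to a blank line) needs its own pattern before the catch-all, which is exactly why this approach gets tedious.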

Note, as already suggested by others, if the string LITERALLY starts with the 6 characters "<html>", then many Swing components (JLabel, JButton...) will render it as HTML (you know, with bold, italic, hyperlink, line break, etc.).

Java 1.5 only supports up to about HTML 3.x, and has very poor support for CSS. I don't know if the situation improves in Java 1.6 or not.

KathyMcDonnella at 2007-7-11 22:02:35
# 10

> Don't bother, unless you know the ONLY HTML tag is <br>.
>
> With regular expressions, you have to list each TAG you want to replace, and list what you replace it with.
>
> Regular expressions do not magically turn HTML into plain text.
>
> Note, as already suggested by others, if the string LITERALLY starts with the 6 characters "<html>", then many Swing components (JLabel, JButton...) will render it as HTML (you know, with bold, italic, hyperlink, line break, etc.).
>
> Java 1.5 only supports up to about HTML 3.x, and has very poor support for CSS. I don't know if the situation improves in Java 1.6 or not.

The string doesn't start with "<html>" and it's not for use with Swing.

JusteUneQuestiona at 2007-7-11 22:02:35
# 11

> It's text to display in jsp

Yikes! I didn't notice you wrote this.

Oh, okay. So it is text that will be printed by the JSP into a user's browser.

In this case... then... what's the problem?

<br> will SHOW UP AS a line break when the user visits your JSP page.

<b>abc</b> will SHOW UP bold when the user visits your JSP page.

etc.

KathyMcDonnella at 2007-7-11 22:02:35
# 12

> Oh, okay. So it is text that will be printed by the JSP into a user's browser.
> In this case... then... what's the problem?
>
> <br> will SHOW UP AS a line break when the user visits your JSP page.
> <b>abc</b> will SHOW UP bold when the user visits your JSP page.
> etc.

I wish it worked like this, but it doesn't; that is why I'm posting.

I have a class that holds a string. I get the string and output it in my JSP, but the browser shows it exactly as it is written in the class, not with a carriage return.

JusteUneQuestiona at 2007-7-11 22:02:35
# 13

You need to provide more info.

If it is emitted in a normal context, then <br> will show up as a line break in the browser.

But if it is emitted inside a <textarea>..</textarea>, then <br> will show up as the literal text <br>. Etc.

Also, you can DEBUG this step-by-step.

First, use your browser, click "View Source", and see what your JSP is spitting out.

If it is spitting out <br>, then it SHOULD show up as a line break.

I suspect it is spitting out &lt;br&gt; instead (due to how your code prints it...).

If that is the case, then you go 1 step in, and look at the printing code, etc.

KathyMcDonnella at 2007-7-11 22:02:35
# 14

In my class:

objectif.setIntroductionVisuelGeneral("Le prix des obsèques varie en fonction de la nature des prestations, le coût moyen est d'environ 2 700 €. <br>A cela, s'ajoute le prix de la concession qui varie en fonction de l'emplacement du caveau, de son nombre de places, de sa surface et de la durée de la concession (entre 700 et 7 000 €).");

In my JSP, with the Struts tag:

<tr>
<td>
<bean:write name="objectifSelectionne" property="introductionVisuelGeneral"/>
</td>
</tr>

I'm having problems with the <br>, which shows up in the browser as <br>, and with the euro sign, which shows up as ?

I tried to replace the <br> and the euro sign in my string with some ASCII hexadecimal characters, but it's not working, so I don't know what I can do to convert it.

JusteUneQuestiona at 2007-7-11 22:02:35
# 15

I asked you already: what does it output in the browser?

Use your browser, and click View Source, then paste the HTML text corresponding to "Le prix des obsèques varie en fonction de la nature des prestations, le coût moyen est d'environ 2 700 €. <br>A cela, s'ajoute le prix de la concession qui varie en fonction de l'emplacement du caveau, de son nombre de places, de sa surface et de la durée de la concession (entre 700 et 7 000 €)"?

KathyMcDonnella at 2007-7-21 19:41:15
# 16
Here is the output: La disparition d'un membre de la famille peut déstabiliser complètement son équilibre financier. Savez vous quels seraient les revenus de votre foyer si vous étiez amené à décéder ? <br>
JusteUneQuestiona at 2007-7-21 19:41:15
# 17

You're still misunderstanding me.

Let me be much more explicit:

1) Load your web browser.

2) Use the browser to visit your JSP page

3) You see that the browser is showing "<br>". Bad!

4) Right click on the page

5) Click View Source

6) You should see a LARGE page of text, like

<html>

<head> ... </head>

...

...

TheMessage with <br> in it

...

</html>

7) Now, for SOME reason, your browser REFUSES to show the line break.

The reason is IN the HTML code.

Look at it.

Remember, the browser DOES NOT care where the HTML came from.

It could have come from a static page, or from a JSP/ASP/PHP, it doesn't matter.

Something in the HTML is INCORRECT, and it causes <br> to show up as <br>.

8) So, figure it out. Once you figure out what's wrong with the HTML, you can go back, and see which line corresponds to it, which function and which setting caused it. Then you can start your real debugging.

KathyMcDonnella at 2007-7-21 19:41:15
# 18
My <br> is showing like this: <br>
JusteUneQuestiona at 2007-7-21 19:41:15
# 19
Ah!!!!!!!!!!!

Do as I say, please!!!

Edit: Something IS wrong with the HTML. View its source! Figure it out!
KathyMcDonnella at 2007-7-21 19:41:15
# 20

By default, <bean:write> encodes characters that are considered "special" in HTML. Add the attribute filter="false":

<bean:write name="objectifSelectionne" property="introductionVisuelGeneral" filter="false"/>
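
To see why the <br> arrives in the browser as visible text, here is a rough sketch of the kind of escaping such a filter performs. This is not the Struts source; escapeHtml and HtmlFilterSketch are hypothetical names, and the exact set of characters Struts escapes may differ.

```java
// Hypothetical stand-in for the escaping a tag like <bean:write> applies by default.
final class HtmlFilterSketch {
    static String escapeHtml(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;   // the browser shows a literal <
                case '>': out.append("&gt;"); break;   // the browser shows a literal >
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // With escaping on, the tag reaches the page as text instead of markup.
        System.out.println(escapeHtml("<br>A cela, s'ajoute le prix de la concession"));
    }
}
```

With filter="false" the string is written untouched, so only use it for text you control.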

dnathansona at 2007-7-21 19:41:15
# 21
I'm looking at it, but I don't see anything wrong in the HTML.
JusteUneQuestiona at 2007-7-21 19:41:15
# 22

> By default, <bean:write> encodes characters that are considered "special" in HTML.

Ah, that sounds like it's exactly the source of the problem.

It will replace < with &lt;

And replace > with &gt;

And it is something that the OP could have confirmed by View Source of the web page, seeing &lt;br&gt; instead of <br> (just like I said!!!)

This OP is driving me crazy.

KathyMcDonnella at 2007-7-21 19:41:15
# 23
> I'm looking at it but I don't see anything wrong in the HTML

Really? Does it show &lt;br&gt; or does it show <br>?
KathyMcDonnella at 2007-7-21 19:41:15
# 24
Yes, it shows &lt;br&gt;; that is what I said in my previous post. So now it's working with filter="false" for my <br>, but not for the euro sign.
JusteUneQuestiona at 2007-7-21 19:41:15
# 25
> euro

Are you setting the right charset encoding for the output page?
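
A "?" in place of the euro sign is what you would get if the page is written in a charset that has no euro sign, such as ISO-8859-1 (which, as far as I know, is what a JSP falls back to when no charset is declared). That is an assumption about your setup, not something the thread confirms. The small sketch below, with the hypothetical names EuroEncodingDemo and roundTrip, shows the effect in plain Java:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode a string to bytes and decode it again with the same charset.
final class EuroEncodingDemo {
    static String roundTrip(String text, Charset charset) {
        // getBytes replaces characters the charset cannot represent with '?'.
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String price = "2 700 \u20AC"; // \u20AC is the euro sign
        System.out.println(roundTrip(price, StandardCharsets.ISO_8859_1)); // euro is lost
        System.out.println(roundTrip(price, StandardCharsets.UTF_8));      // euro survives
    }
}
```

If that is the cause, declaring UTF-8 for the page, for instance with a page directive carrying contentType="text/html; charset=UTF-8", would be the first thing to try.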
KathyMcDonnella at 2007-7-21 19:41:15
# 26
OK, it must be this. Thanks for your help and patience, Kathy.
JusteUneQuestiona at 2007-7-21 19:41:15