Display XML file contents in an HTML TextArea
Hi all.
I have a Word document that was saved as an HTML file. The conversion kept Word's own character encoding for certain special characters such as quotation marks and apostrophes, so in the HTML file they get printed out as \222, \223, etc. For example, "An old book" in the Word document comes out as \222An old book\223.
Reading the HTML file and creating an XML file from it works without problems. But when I use an XSL file to extract information from the XML file and display it in an HTML form (textarea), the document gets truncated at the point where it encounters these special characters.
Question: Does anyone know how to overcome this problem and show the whole document as it is?
Additional information: \222 is a single character, not "\" + "2" + "2" + "2".
Thanks for your help.
Arun
[824 byte] By [arun78] at [2007-9-26 1:53:55]

There are a couple of issues here.
First, at some point you are going to need to represent the character codes in a way that XSL and a browser will interpret correctly. If this is a one-time deal, I would suggest doing a replace-all on the codes, replacing them with their XML escape values (numeric character references).
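A rough sketch of that replace-all in Java, in case it helps. It assumes Word wrote the file in Windows-1252, where byte 0x92 (octal \222) is a right single quote and 0x93 (octal \223) is a left double quote; the thread does not confirm this. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not from the thread: decodes bytes as Windows-1252
// and rewrites every non-ASCII character as an XML numeric character reference.
class Cp1252ToXmlEscaper {

    // Assumption: the Word-generated HTML is Windows-1252 encoded.
    static final Charset WORD_CHARSET = Charset.forName("windows-1252");

    static String escape(byte[] raw) {
        String text = new String(raw, WORD_CHARSET);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                // Plain ASCII passes through untouched (markup included).
                out.appendCodePoint(cp);
            } else {
                // Everything else becomes a decimal reference such as &#8217;
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The example from the question: \222An old book\223
        byte[] sample = {(byte) 0x92, 'A', 'n', ' ', 'o', 'l', 'd', ' ',
                         'b', 'o', 'o', 'k', (byte) 0x93};
        System.out.println(escape(sample)); // &#8217;An old book&#8220;
    }
}
```

Because the result is pure ASCII, it should survive whatever encoding the XML declaration, the XSL processor, and the browser each assume.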
The other issue is the truncation. This is a little confusing without actually being able to see what's going on. The character codes that you have listed are legal characters in XML, so I would think that you could just transfer these uninterpreted codes to the text box. However, the backslash starts an escape sequence in JavaScript, so maybe your problem is really there, at the point where you are trying to put the string in a TextArea.
I know that this doesn't give you a definitive answer to your dilemma, but hopefully it will point you in the right direction.
-Mike
Hi Mike,
Thanks for your suggestion. :-) I tried "encoding=iso-8859-2-1" instead of "encoding=UTF-8". This time I took a trial-and-error approach with a simple XSL file and another XML file. What happens is that when I use an InputStreamReader to read the HTML file contents and create an XML file, it replaces \222, \223 with something like 0 , - etc., and these pass through the XSL processing.
When the output is finally written into the textarea of the HTML form, they are replaced with the same or some other character, like ? , " etc. But that's OK.
So I believe that the way we read the HTML file when creating the XML from it, and the encoding we use for processing the XML file, make the difference. :-)
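For anyone landing here later, this is roughly what reading with an explicit encoding looks like. It is only a sketch: the windows-1252 source charset is an assumption about what Word produced, and the class and method names are invented for the example. The point is that InputStreamReader is given the charset explicitly instead of falling back to the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: copies a stream while converting between two charsets.
class ExplicitEncodingCopy {

    static void transcode(InputStream in, Charset from,
                          OutputStream out, Charset to) throws IOException {
        // Decode with the charset the file was really written in...
        Reader reader = new InputStreamReader(in, from);
        // ...and encode with the charset the XML declaration will claim.
        Writer writer = new OutputStreamWriter(out, to);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Word HTML file: \222OK\223 as raw bytes.
        byte[] wordBytes = {(byte) 0x92, 'O', 'K', (byte) 0x93};
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(wordBytes),
                  Charset.forName("windows-1252"), // assumed source encoding
                  utf8, StandardCharsets.UTF_8);
        // 2 three-byte quote characters + 2 ASCII letters = 8 bytes
        System.out.println(utf8.size() + " bytes written as UTF-8");
    }
}
```

If the bytes are decoded with the wrong charset, the quotes turn into other characters or question marks, which matches the substitutions described above.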
Thanks again.
Arun