InputTextArea - Converter

Hi,

Users are pasting text from Word documents into inputTextArea components on my pages. Included in that text are single and double curly quotes.

After saving the field, they come back as question marks.

How can I intercept those curly quotes and change them into something I can save, like straight single and double quotes?

I tried a custom converter, but I don't know how to replace them. I have no clue what I'm supposed to look for in the string and what to replace it with. I tried string.replaceAll(), but what string am I supposed to look for?

Thx in advance for your help.

[609 byte] By [Javaaaaaaa] at [2007-11-26 14:59:37]
# 1

You'd better double-check and adjust the locale and charset settings of the JSF pages, the appserver, the database, etc.

If you really want to use a converter, well, develop a small JSF webapp with a converter, experiment a bit with the converter code, find out the Unicode code points of the curly quotes, and use them in string.replace(char, char).

I guess you mean those curly quotes:

“ and ”

Their Unicode code points are \u201c and \u201d respectively, by the way. Also see http://en.wikibooks.org/wiki/Unicode/Character_reference/2000-2FFF

With this knowledge you can use for example:

string = string.replace('\u201c', '"');

BalusCa at 2007-7-8 8:48:27 > top of Java-index,Enterprise & Remote Computing,Web Tier APIs...
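For reference, here is a minimal sketch of the replacement approach described in post #1. It assumes the pasted text actually reaches Java as the Unicode curly quotes U+2018, U+2019, U+201C and U+201D, which only happens when the page and request encodings are set up correctly. The class name `CurlyQuotes` and method `straighten` are hypothetical. Because `String` is immutable, the result of every `replace` call has to be assigned.

```java
public class CurlyQuotes {

    // Replaces the typographic ("curly") quotes U+2018, U+2019, U+201C and
    // U+201D with straight ASCII quotes. String is immutable, so each
    // replace() returns a new String that must be assigned back.
    static String straighten(String s) {
        s = s.replace('\u2018', '\'');  // left single quote
        s = s.replace('\u2019', '\'');  // right single quote
        s = s.replace('\u201c', '"');   // left double quote
        s = s.replace('\u201d', '"');   // right double quote
        return s;
    }

    public static void main(String[] args) {
        String pasted = "\u201cIt\u2019s done,\u201d she said.";
        System.out.println(straighten(pasted)); // prints: "It's done," she said.
    }
}
```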
# 2

Thx for your help.

I can't really change anything on the DB side. The DB's been there for a long time and they won't do a conversion.

No real appserver; we're using Tomcat. All our pages use charset iso-8859-1. I wrote my own converter and I'm trying to figure out how to convert the curly quotes into something listed in the iso-8859-1 table. The thing is, when I look at the incoming String (in the debugger), the curly quotes show up as little squares (I'm guessing graphical characters).

So, how do I figure out what to use in the String.replaceAll(char,char)? By the way, the single quotes in string.replace('\u201c', '"'); are not allowed. They have to be double quotes string.replace("\u201c", "\"");.

If the incoming String is encoded in iso-8859-1, how do I replace an unrecognized character? Which method should I use?

Thx in advance.


Javaaaaaaa at 2007-7-8 8:48:27
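The little squares in the debugger are consistent with a common encoding mismatch: Word's smart quotes are the bytes 0x91 to 0x94 in windows-1252, and when those bytes are decoded as ISO-8859-1 they become the invisible control characters U+0091 to U+0094 instead of U+2018 to U+201D. The sketch below demonstrates this and shows one way to recover the intended characters. It assumes the String really was decoded from ISO-8859-1, and that the JVM ships the `windows-1252` charset (OpenJDK does, but it is not on Java's list of guaranteed charsets). It uses `StandardCharsets`, so it needs Java 7 or later; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {

    // Re-interprets a String that was decoded as ISO-8859-1 but actually
    // carried windows-1252 bytes. ISO-8859-1 maps bytes 0x00-0xFF to
    // U+0000-U+00FF one-to-one, so getBytes(ISO_8859_1) recovers the
    // original bytes as long as every char is at most U+00FF.
    static String redecodeAsWindows1252(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // In windows-1252, 0x91 and 0x92 are the left and right single curly quotes.
        byte[] fromBrowser = { (byte) 0x91, 'h', 'i', (byte) 0x92 };

        String asLatin1 = new String(fromBrowser, StandardCharsets.ISO_8859_1);
        System.out.println(Integer.toHexString(asLatin1.charAt(0))); // prints 91

        String fixed = redecodeAsWindows1252(asLatin1);
        System.out.println(Integer.toHexString(fixed.charAt(0)));    // prints 2018
    }
}
```

If the String contains characters above U+00FF, `getBytes` silently turns them into `?`, so this trick is only safe right after an ISO-8859-1 decode. Fixing the encoding settings so the text is decoded correctly in the first place remains the cleaner option.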
# 3

> So, how do I figure out what to use in the
> String.replaceAll(char,char)? By the way, the single
> quotes in string.replace('\u201c', '"'); are not
> allowed. They have to be double quotes
> string.replace("\u201c", "\"");.

I've never seen String.replace(String, String) in the API documentation.

It is really String.replace(char, char): http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#replace(char,%20char)

Well, try to capture this one graphical character and get its Unicode code. You can do this with Integer.toHexString(char). Then use this Unicode code in the String.replace(char, char) method.

BalusCa at 2007-7-8 8:48:27
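The `Integer.toHexString(char)` trick from post #3 can be applied to a whole string at once. This hypothetical helper lists the index and hex code of every non-ASCII character, which makes invisible characters, such as the ones shown as squares in the debugger, easy to identify.

```java
public class CharInspector {

    // Returns "index: hexcode" for every character above 0x7F (non-ASCII),
    // separated by ", ". Returns an empty string if the input is pure ASCII.
    static String describeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                if (out.length() > 0) {
                    out.append(", ");
                }
                out.append(i).append(": ").append(Integer.toHexString(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\u0091hi\u0092";
        System.out.println(describeNonAscii(input)); // prints 0: 91, 3: 92
    }
}
```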
# 4
Thx BalusC,
Integer.toHexString(char) returned: 91
Now how do I replace that with a single quote?
Thx
Javaaaaaaa at 2007-7-8 8:48:27
# 5
string = string.replace('\u0091', '\'');
BalusCa at 2007-7-8 8:48:27
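If the other smart quotes arrive the same way as the \u0091 found in post #4, a single pass that maps all four control characters to ASCII quotes avoids calling `replace` four times. This is a sketch with hypothetical names; the mapping (U+0091 and U+0092 to `'`, U+0093 and U+0094 to `"`) follows what bytes 0x91 to 0x94 mean in windows-1252.

```java
public class SmartQuoteCleaner {

    // Maps U+0091..U+0094 (windows-1252 smart-quote bytes that were decoded
    // as ISO-8859-1) to straight ASCII quotes in one pass over the string.
    static String clean(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u0091': // left single quote (windows-1252 byte 0x91)
                case '\u0092': // right single quote (byte 0x92)
                    sb.append('\'');
                    break;
                case '\u0093': // left double quote (byte 0x93)
                case '\u0094': // right double quote (byte 0x94)
                    sb.append('"');
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("\u0093It\u0092s\u0094")); // prints "It's"
    }
}
```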
# 6

Hmmmm.

This one, returnedString = string.replace('\u0091', 'A');, doesn't replace anything, but this one, returnedString = string.replace("\u0094", "D");, does. Weird.

\u0094 is the double quote "

\u0091 is the single quote '

P.S: I used letters just to test that replacement occurred.

Any suggestions, BalusC?

Javaaaaaaa at 2007-7-8 8:48:27
# 7
Are you running Java 5.0? I now see that it supports String.replace(CharSequence, CharSequence): http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence). But does it work anyway?
BalusCa at 2007-7-8 8:48:27
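On the Java 5 question: both `String.replace` overloads exist from Java 5 on, and both replace every occurrence. They differ from `replaceAll`, which treats its first argument as a regular expression. A small comparison (class and method names are hypothetical):

```java
public class ReplaceOverloads {

    // replace(char, char): replaces every occurrence of one char with another.
    static String withChar(String s) {
        return s.replace('\u0091', '\'');
    }

    // replace(CharSequence, CharSequence), added in Java 5: replaces every
    // occurrence of a literal sequence (no regex, unlike replaceAll).
    static String withSequence(String s) {
        return s.replace("\u0091", "'");
    }

    public static void main(String[] args) {
        String s = "\u0091quoted'";
        System.out.println(withChar(s));                         // prints 'quoted'
        System.out.println(withChar(s).equals(withSequence(s))); // prints true
    }
}
```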
# 8

Here's the text I'm trying to clean up:

慳lt=攃ontent敀

In the debugger, the first single quote is \u0091 (which is the annoying MS Word smart quote). When I try to convert it to a standard single quote, it doesn't get replaced.

returnedString = string.replace("\u0091", "A");

That line doesn't replace the smart quote with an 'A'. Nor does this one:

returnedString = string.replace("‘", "A");

Don't give up on me now. :)

Javaaaaaaa at 2007-7-8 8:48:27
# 9
I am using Java 5.
Javaaaaaaa at 2007-7-8 8:48:27
# 10

I'm a dumb ***. I was replacing the text in the original String every time, like this:

returnedString = string.replace("\u0091", "A");
returnedString = string.replace("\u0092", "B");
returnedString = string.replace("\u0093", "C");
returnedString = string.replace("\u0094", "D");

And then I was looking at returnedString. Each call started again from the original string, so only the last replacement survived.

That's why.

I really need one more coffee.

Thx for your help.

Javaaaaaaa at 2007-7-8 8:48:27
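To make the mistake in post #10 concrete: `String.replace` returns a new string and never modifies `string`, so each line starts again from the original input and only the last assignment survives. That matches the observation in post #6 that only the \u0094 replacement seemed to work. A hypothetical side-by-side sketch of the buggy and chained versions:

```java
public class ChainedReplace {

    // Buggy pattern from the thread: every call starts again from the
    // unmodified input, so only the last assignment's replacement survives.
    static String buggy(String string) {
        String returnedString;
        returnedString = string.replace("\u0091", "A");
        returnedString = string.replace("\u0092", "B");
        returnedString = string.replace("\u0093", "C");
        returnedString = string.replace("\u0094", "D");
        return returnedString;
    }

    // Fixed version: each replace() is applied to the result of the previous one.
    static String fixed(String string) {
        return string.replace("\u0091", "A")
                     .replace("\u0092", "B")
                     .replace("\u0093", "C")
                     .replace("\u0094", "D");
    }

    public static void main(String[] args) {
        System.out.println(fixed("\u0091x\u0094")); // prints AxD
    }
}
```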
# 11
yw :)
BalusCa at 2007-7-8 8:48:27