"Bad" markup for non-English language in JSF

Hello, I'm trying to solve the following problem with JSF. So far I haven't found any answers, only questions about it in the forums, though the issue seems extremely simple.

The problem is as follows.

My OS locale is Russian (ru) and the code page is windows-1251. When I create a JSF page, windows-1251 is suggested as the encoding, which is fine. Then I place

<h:outputText value="text_in_russian"/> on it and run the page.

The resulting page is windows-1251 encoded, which is fine. But if you look at the source, even though the code page is windows-1251, all Russian characters are escaped as numeric character references like & # 1058;& # 1077;& # 1082;& # 1089;& # 1090; (I inserted spaces, as otherwise you'd see the Russian letters: Текст). As far as I understand, this inflates traffic (for example, a page could be 200K instead of 50K), can cause problems with JavaScript, and so on.
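The size difference is easy to check. The sketch below is only an illustration, not JSF's own code: the class EscapedSizeDemo and the method toNumericReferences are hypothetical names, and it assumes the JVM provides the windows-1251 charset. It encodes the five-letter word from the example once as raw windows-1251 bytes and once as decimal numeric character references.

```java
import java.nio.charset.Charset;

final class EscapedSizeDemo {

    /** Replaces every non-ASCII char with a decimal numeric character reference. */
    static String toNumericReferences(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the runtime ships the windows-1251 charset.
        Charset cp1251 = Charset.forName("windows-1251");
        // The Russian word from the post, written with Unicode escapes.
        String text = "\u0422\u0435\u043a\u0441\u0442";
        int raw = text.getBytes(cp1251).length;
        int escaped = toNumericReferences(text).getBytes(cp1251).length;
        System.out.println(raw + " bytes raw, " + escaped + " bytes escaped");
    }
}
```

Each Cyrillic letter takes one byte in windows-1251 but seven bytes as a reference such as the one for code point 1058, so the escaped word is seven times larger before any compression.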

I specified the default locale in the faces-config.xml file, changed the encoding, and so on, but the result was the same: everything looks fine in the browser, but the HTML source contains escape sequences instead of letters.

By the way, if I put Russian text where it is not rendered by a JSF component, it appears normally. But if the text is rendered by a component, no matter which one (outputText, inputText or, for example, af:document or af:panelPage), the source contains escape sequences instead of letters. Using the JSTL c:out tag also results in "normal" text; the problem is only with JSF.

Is this a JSF problem? Can it be corrected? If it is a problem (as far as I can see, it's wrong behaviour), where can I report it to someone who can fix it?

Imagine the situation I described with English letters: you place, say, <h:outputText value="outputText1"/> in your code and expect to receive something like outputText1, but receive & # xxxx; & # xxxx; & # xxxx;1, even though one byte per letter would be enough!

The point is that Russian should not be a special case for the windows-1251 encoding!

Waiting for your response,

Valeriy

[2054 byte] By [Valeriy.Yaldygina] at [2007-11-27 3:18:09]
# 1
Try putting escape="false" on your outputText tag.

CowKing
IamCowKinga at 2007-7-12 8:20:46 > top of Java-index,Enterprise & Remote Computing,Web Tier APIs...
# 2

Thank you, it really worked!

But now I've got new questions:

1. What can I do if I want to escape text?

2. What can I do with tags that don't have an escape attribute, for example, h:inputText?

3. There are more complex tags in other libraries that don't have such an attribute, for example, tr:document in the Trinidad project, which emits <html> and other tags. What can I do with them?

4. Isn't it strange that these characters are escaped?

Thanks in advance,

Valeriy

Valeriy.Yaldygina at 2007-7-12 8:20:46
# 3

1) I assume you mean that you want to escape any HTML and script tags in the output, but not the Russian characters. Well, there are a couple of ways. You can write a custom component and renderer that extend outputText. The other way is to manually scrub all user input that you are going to output.

First, I would try to scrub the user data myself. If that isn't sufficient, then you'll have to write a custom component.

You can use my HTML scrubber if you'd like. I use this whenever I output non-escaped user input:

    /**
     * Scrubs any HTML syntax from the given string and replaces it with
     * an HTML-safe version. For example, a '<' character will be replaced with
     * '&lt;'. Quote characters are left untouched.
     *
     * @param string the text to scrub
     * @return the text with '<', '>' and '&' replaced by character entities
     */
    public static String scrubHTML(String string) {
        //Get chars from given string
        char[] chars = string.toCharArray();
        //Create a string buffer to write to
        StringBuffer sb = new StringBuffer(chars.length);
        //Loop over each char in the given string
        for (int i = 0; i < chars.length; i++) {
            //Check for HTML syntax: '&', '<' and '>' all lie between '%' and '?'
            if ((chars[i] > '%') && (chars[i] < '?')) {
                if (chars[i] == '<') sb.append("&lt;");
                else if (chars[i] == '>') sb.append("&gt;");
                else if (chars[i] == '&') sb.append("&amp;");
                else sb.append(chars[i]);
            } else {
                sb.append(chars[i]);
            }//end check for HTML syntax
        }//end loop over each char in given string
        return sb.toString();
    }

Please note that I accept no legal responsibility for the use of the above code. It's up to you to make your code secure.

2) h:inputText shouldn't need escaping. It passes the String the user entered as is. It's only the output components that need the ability to escape.

3) I am not familiar with this project. You may need to make a feature enhancement request to the owners. Or if it is OSS, you can make the changes yourself.

4) JSF escapes all special characters. I don't know why the JSF developers felt that they needed to escape the Russian characters. But from a security standpoint, better to be too safe than not quite safe enough. If you really want to dig into it, get the JSF source code and look at javax.faces.context.ResponseWriter.writeText. That's what JSF uses when outputting data to the response stream. API (for JSF 1.1_01):

http://java.sun.com/javaee/javaserverfaces/1.1_01/docs/api/index.html
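For comparison, here is a rough sketch of what charset-aware escaping could look like. This is not how ResponseWriter is implemented; the class CharsetAwareEscaper and its escape method are hypothetical names used only for illustration. Markup-significant characters are always escaped, while any other character is written as is when the page encoding can represent it and falls back to a numeric character reference only when it cannot.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class CharsetAwareEscaper {

    /**
     * Escapes markup characters and replaces only those characters that
     * the given page charset cannot encode with numeric references.
     */
    static String escape(String text, Charset pageCharset) {
        CharsetEncoder encoder = pageCharset.newEncoder();
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            switch (cp) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (encoder.canEncode(text.subSequence(i, i + len))) {
                        // Representable in the page encoding: write it as is.
                        sb.appendCodePoint(cp);
                    } else {
                        // Not representable: fall back to a numeric reference.
                        sb.append("&#").append(cp).append(';');
                    }
            }
            i += len;
        }
        return sb.toString();
    }
}
```

With a windows-1251 page, Cyrillic letters would pass through unchanged and only the markup characters would be replaced; with a US-ASCII page the same letters would still come out as references.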

CowKing

IamCowKinga at 2007-7-12 8:20:46
# 4

Thank you for your reply!

1. I meant that h:inputText also writes escaped characters, like the other components, but it has no "escape" attribute that I can set to false. So escaped characters will appear in any case? Are there other ways to un-escape them, maybe filters, some post-processing, or something else? Besides, your solution means I give up some functionality of h:outputText and write my own instead; overall, it seems something is wrong with JSF if I have to do that.

2. JSTL (and JSP) does not escape Russian characters. I think escaping them is wrong behaviour. Maybe it's not intentional, and it makes sense to raise the question with the JSF developers (though I don't know how yet).

3. If there is no other way... how can I, for example, zip/gzip the content before sending it to the user?
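On the gzip question, the compression step itself needs only the standard java.util.zip classes. The sketch below shows just that step under stated assumptions: GzipSketch and its methods are hypothetical names, and in a real web application the compression would normally live in a servlet filter (or be handled by the web server) that sets the Content-Encoding: gzip header only when the browser's Accept-Encoding header allows it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipSketch {

    /** Compresses the given bytes into gzip format. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer is written.
        return buffer.toByteArray();
    }

    /** Decompresses gzip data, as the browser would on its side. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Markup full of repeated escape sequences is highly repetitive and should compress well, which would reduce the traffic cost of the escaping, though not the look of the page source.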

Thanks in advance,

Valeriy

Valeriy.Yaldygina at 2007-7-12 8:20:46
# 5

Try this project: http://forum.exadel.com/temp/download/russian.zip

It contains a war file and source code. The application supports two languages, English and Russian. The language is switched depending on the current locale set in the browser.

Remove the en locale from faces-config.xml to always get the Russian interface.

To see and edit Russian text in the bundle file you can use the bundle editor of Exadel Studio Pro, for example.

Sergey.Smirnova at 2007-7-12 8:20:46
# 6

Thanks, Sergey.

I've tried this project.

It's about internationalization and bundles, not about the quality of the emitted html-markup. The resulting source HTML code looks like

<input id="helloForm:submit" type="submit" name="helloForm:submit" value="& # 1055;& # 1088; ... > and so on.

I think such a JSF implementation is wrong, and I don't want my source code to look like this, even though it's OK for users. After all, it's just text, so why encode it? How can I reach you, JSF authors? :-)

Valeriy.Yaldygina at 2007-7-12 8:20:46
# 7
Yes, I can confirm that your input field in the browser's source should look like <input id="helloForm:submit" type="submit" name="helloForm:submit" value="& # 1055;& # 1088; ... >. Why do you care how the page looks in the source?
Sergey.Smirnova at 2007-7-12 8:20:46
# 8

Well, there are several reasons.

1. I want a graceful solution, not a clumsy one. Specifying the appropriate code page and using the necessary symbols in that code page is a common, proven technique; all (!) Russian sites are built this way and, as far as I know, no security questions arise, so why complicate everything with escape sequences? Besides, I don't like the idea of answering other programmers' questions like "I looked at your source code... What's with it?"

2. Representing Russian symbols with escape sequences (& # 1055;& # 1088;) causes more traffic.

3. Some slight, though solvable, problems with JavaScript are possible (for example, alert('<h:outputText value="#{msgs.ErrorMessage}"/>'); can go wrong with Russian text).
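The JavaScript case goes wrong because HTML character references are not decoded inside a script block, so the alert would show the raw escape sequences. One possible workaround, sketched here with hypothetical names (JsStringEscaper, toJsLiteral), is to build the JavaScript string literal on the server with JavaScript's own Unicode escapes and write it out unescaped. This is an assumption about how one might solve it, not something JSF provides.

```java
final class JsStringEscaper {

    /** Returns the text as a single-quoted JavaScript string literal. */
    static String toJsLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 2);
        sb.append('\'');
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\'': sb.append("\\'"); break;
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    // Non-ASCII, control and markup characters become
                    // four-digit hexadecimal JavaScript escapes, so the
                    // literal survives any page encoding and cannot
                    // close the surrounding script element.
                    if (c < 0x20 || c > 0x7e || c == '<' || c == '>' || c == '&') {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('\'');
        return sb.toString();
    }
}
```

A backing bean could expose the result of toJsLiteral for the message, and the page would then emit it inside alert(...) with escape="false".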

I'm not alone in my trouble; the same question has been discussed on other forums, without result, however...

Valeriy.Yaldygina at 2007-7-12 8:20:46
# 9

Valeriy,

Sorry you are still having problems. I think you will have to develop custom components to generate source code exactly how you want it.

If you want to bring your concerns before the JSF developers, read the following page. There are links and suggestions on how to make contact with some of them.

https://javaserverfaces.dev.java.net/users.html

I couldn't find a bug reporting tool or anything like that.

Good luck on finding a better solution!

CowKing

IamCowKinga at 2007-7-12 8:20:46