How to detect whether a string contains any non-Western letters

Hi; What is a quick function that can check whether a string contains any non-Western characters? It seems to me that it's part of J2SE, but I can't seem to find it. I need to use it in a tight loop, so efficiency counts as well. Thank you much; -nat
[279 byte] By [nat0a] at [2007-10-2 5:04:23]
# 1
You could maybe do this by doing:
String s = ...;
char[] chars = s.toCharArray();
for (char c : chars) if (c > 255) ... // then it's outside Latin-1 (use c > 127 to test for non-ASCII)
tjacobs01a at 2007-7-16 1:07:54 > top of Java-index,Desktop,I18N...
# 2
That's good, thanks. But I was hoping that there was a built-in function "hasUnicode" or "isAscii". I recall reading about something like this, but maybe it was a 3rd-party API. -nat
nat0a at 2007-7-16 1:07:54
# 3

The definition of your question is somewhat lacking. How do you define "western" in that context? Do you mean only characters contained in ASCII? Then checking whether each char is < 128 would suffice, as ASCII contains only 128 chars and is identical to the lower 128 chars of Unicode. But if accented characters that are not part of ASCII (like the German umlauts ä, ö, ü, or accented characters from French, which I don't know how to type on my keyboard) are part of your definition of "western", then it'll get much harder.

JoachimSauera at 2007-7-16 1:07:54
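The "< 128" check described in reply #3 could be sketched as below. This is only a minimal illustration, not something posted in the thread: the class name AsciiCheck and the method name isAscii are made up for the example, and it assumes "western" means strict 7-bit ASCII.

```java
// Hypothetical helper (names are not from the thread): strict 7-bit ASCII test.
final class AsciiCheck {
    private AsciiCheck() {}

    /** Returns true if every char in s is in the ASCII range 0..127. */
    static boolean isAscii(String s) {
        for (int i = 0, n = s.length(); i < n; i++) {
            if (s.charAt(i) > 127) {
                return false; // stop at the first non-ASCII char
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("plain text"));   // true
        System.out.println(isAscii("M\u00fcnchen")); // false (contains u-umlaut)
    }
}
```

Using charAt avoids the array copy that toCharArray makes, and the early return means a string with a non-ASCII character near the front is rejected quickly.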
# 4
It's the 128-character ASCII set, yes, but I would like to do it without looping and checking each character for its ASCII value. I was hoping that there was a single boolean function call [against the string] that would return true or false. (I think I once saw this in *some* API.) -nat
nat0a at 2007-7-16 1:07:54
# 5
You could use the "matches" method in String (assuming Java 1.4 or 1.5). This checks whether your String matches a given regular expression.
MLRona at 2007-7-16 1:07:54
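A regular-expression version of the check suggested in reply #5 might look like the sketch below. The reply does not give a pattern, so the use of java.util.regex's \p{ASCII} class is an assumption made here, as are the class and method names. The pattern is compiled once because String.matches recompiles it on every call, which matters in a tight loop.

```java
import java.util.regex.Pattern;

// Hypothetical helper (names are not from the thread): ASCII test via regex.
final class AsciiRegex {
    // \p{ASCII} is the java.util.regex POSIX class for [\x00-\x7F].
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    private AsciiRegex() {}

    /** Returns true if the whole string consists of ASCII chars only. */
    static boolean isAscii(String s) {
        return ASCII_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        // One-off form using String.matches directly:
        System.out.println("plain text".matches("\\p{ASCII}*")); // true
        // Precompiled form for repeated use:
        System.out.println(isAscii("M\u00fcnchen"));             // false
    }
}
```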
# 6

> You could use the "matches" method in String (assuming Java 1.4 or 1.5). This checks whether your String matches a given regular expression.

That's a good idea. But likely slower than looping chars. Speed, in this case, is the most important.

Thanks;

-nat

nat0a at 2007-7-16 1:07:54
# 7
You've got to check every character. There's no getting around this
tjacobs01a at 2007-7-16 1:07:54
# 8
> You've got to check every character. There's no getting around this

To amplify on that: regardless of whether you use a loop that someone else wrote and packaged in a method, or you write it yourself.
ChuckBinga at 2007-7-16 1:07:54
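As an illustration of "a loop that someone else wrote", one candidate in the standard library is CharsetEncoder.canEncode(CharSequence) from java.nio.charset (available since Java 1.4). Nobody in the thread mentions it, so treat this as a sketch under that assumption; the class and method names below are hypothetical. It still examines every character internally, and it is likely slower than a hand-written char loop.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper (names are not from the thread).
final class EncoderCheck {
    private EncoderCheck() {}

    /** Returns true if every character of s can be encoded in the named charset. */
    static boolean fitsIn(String s, String charsetName) {
        // A new encoder per call keeps this thread-safe; encoders are not
        // safe for concurrent use, so share one only within a single thread.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(fitsIn("plain text", "US-ASCII"));     // true
        System.out.println(fitsIn("M\u00fcnchen", "US-ASCII"));   // false
        System.out.println(fitsIn("M\u00fcnchen", "ISO-8859-1")); // true
    }
}
```

Passing "ISO-8859-1" instead of "US-ASCII" gives a looser reading of "western" that admits the accented Latin-1 letters discussed in reply #3.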
# 9
http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html
http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/CharsetMatch.html
PaulJHa at 2007-7-16 1:07:54
# 10
ICU4J is a great API, but for this I think that a char loop would be quicker. I was hoping that J2SE sets a flag while building the internal array, but I guess not. Thank you all. -nat
nat0a at 2007-7-16 1:07:54
# 11
An internal "ContainsNonWesternCharacters" flag in the String class? I think somebody is taking their requirements a little too seriously.
DrClapa at 2007-7-16 1:07:54