Re: XML Chunk Equality from Elliotte Harold on 2004-10-27 (www-tag@w3.org from October 2004)

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Wed, 27 Oct 2004 06:49:15 -0400
To: Chris Lilley <chris@w3.org>
CC: Norman Walsh <Norman.Walsh@Sun.COM>, www-tag@w3.org
Message-ID: <417F7D2B.6000109@metalab.unc.edu>

On a side note, I'm not sure what programming language you prefer, but 
there's a very real issue in Java to be aware of when writing this 
finding; otherwise 90%+ of Java implementations will get the algorithm 
exactly backwards.

The apparently locale-insensitive methods, such as the no-args versions 
of toUpperCase and toLowerCase, behave according to the current default 
locale rather than according to the Unicode case folding tables or any 
other fixed mapping. This means a lot of Java code breaks in Turkey when 
it uses these methods: in a Turkish locale, "i" upper-cases to the 
dotted capital I (U+0130) rather than to "I".
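
For anyone who wants to see the failure, here is a minimal sketch. The 
class and method names (TurkishCaseDemo, upperIn) are made up for 
illustration; passing the Turkish locale explicitly just simulates what 
the no-args toUpperCase() would do on a machine whose default locale is 
Turkish.

```java
import java.util.Locale;

// Hypothetical demo class; not part of any library.
final class TurkishCaseDemo {
    // Upper-cases s under the given locale. The no-args toUpperCase()
    // behaves like this with Locale.getDefault() as the locale.
    static String upperIn(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String upper = upperIn("title", turkish);

        // Under Turkish rules 'i' maps to U+0130, so this is not "TITLE".
        System.out.println(upper.equals("TITLE"));       // prints "false"
        System.out.println(upper.equals("T\u0130TLE"));  // prints "true"
    }
}
```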

To get true locale-insensitivity when comparing syntactic strings, in 
Java it is necessary to specify a locale. Counter-intuitive but true. 
The way to get the behavior one wants is to pass a fixed locale, e.g. 
toUpperCase(Locale.ENGLISH) instead of just toUpperCase().
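
A rough sketch of what that looks like in practice, using the JDK 
constant Locale.ENGLISH. The class and method names here 
(FixedLocaleCompare, equalsIgnoreCaseFixed) are hypothetical, and this 
is only one way to write it.

```java
import java.util.Locale;

// Hypothetical helper; names are illustrative only.
final class FixedLocaleCompare {
    // Compares two strings after upper-casing both under a fixed locale,
    // so the result does not depend on the machine's default locale.
    static boolean equalsIgnoreCaseFixed(String a, String b) {
        return a.toUpperCase(Locale.ENGLISH)
                .equals(b.toUpperCase(Locale.ENGLISH));
    }
}
```

Note that this still applies the full Unicode case mappings to 
characters outside a-z; it only removes the dependence on the default 
locale.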

This is why I think adopting an explicit algorithm, such as "match a-z 
with A-Z, and don't change anything else", is more likely to be 
implemented correctly than a mere statement that "Languages are compared 
case insensitively."
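
To make the explicit algorithm concrete, here is one possible sketch. 
The names (AsciiCase, foldAscii, equalsIgnoreAsciiCase) are my own 
invention for illustration, not anything proposed for the finding. It 
folds A-Z to a-z, leaves every other character alone, and never 
consults a locale.

```java
// Hypothetical helper; names are illustrative only.
final class AsciiCase {
    // Maps 'A'..'Z' to 'a'..'z'; every other char is returned unchanged.
    static char foldAscii(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    // True if a and b are equal once A-Z is matched with a-z.
    static boolean equalsIgnoreAsciiCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (foldAscii(a.charAt(i)) != foldAscii(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With this, "en-US" matches "EN-us" on any machine, while a string 
containing the dotted capital I (U+0130) does not match one containing 
a plain "i".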

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Received on Wednesday, 27 October 2004 10:49:18 UTC