Re: XML Chunk Equality from Elliotte Harold on 2004-10-26 (www-tag@w3.org from October 2004)

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Tue, 26 Oct 2004 17:36:28 -0400
To: Chris Lilley <chris@w3.org>
CC: Norman Walsh <Norman.Walsh@Sun.COM>, www-tag@w3.org
Message-ID: <417EC35C.7070807@metalab.unc.edu>

Chris Lilley wrote:

> ERH> What's probably intended here is that languages are compared case 
> ERH> insensitively within the ASCII range using English case mappings.
> 
> No; what is intended here is that *language tags* are compared case
> insensitively. xml:lang="en" and xml:lang="EN" denote the same language.
> Since the intent has clearly been misunderstood, the finding should be
> clarified to say 'language tags are ...'

I'm sorry, but this is relevant. First of all, language tags should, but 
do not have to, be ISO 639 language tags. Although some early parsers 
were confused about this, xml:lang="Français" is well-formed.

Secondly, even if we stick to ASCII this is an issue. Consider 
xml:lang="it". This is the same as xml:lang="IT" when compared in an 
English locale but not when compared in a Turkish locale. In Java, 
"it".toUpperCase().equals("IT") is *false* when the default locale is 
Turkish, because a lowercase "i" uppercases to a dotted capital I 
(U+0130) there.
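
A minimal sketch of the Turkish-locale problem in Java. It assumes the 
comparison is done through locale-sensitive case conversion 
(String.toUpperCase(Locale)), which is where the Turkish dotted and 
dotless "i" rules apply; String.equalsIgnoreCase itself compares 
character by character without consulting the locale. The class and 
method names are hypothetical, and the ASCII-only comparison is one 
possible reading of "case insensitively within the ASCII range", not 
something the finding specifies.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class LangTagCompare {

    // Locale-sensitive check: uppercases the tag using the given locale's rules
    // and compares the result to an already-uppercase expected value.
    static boolean upperCaseMatches(String tag, String expectedUpper, Locale locale) {
        return tag.toUpperCase(locale).equals(expectedUpper);
    }

    // Folds only the ASCII letters A-Z to lowercase; all other chars are untouched.
    static char asciiLower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    // Locale-independent comparison that ignores case only within ASCII.
    static boolean asciiEqualsIgnoreCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (asciiLower(a.charAt(i)) != asciiLower(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Under Turkish rules "it" uppercases to U+0130 followed by "T", not "IT".
        System.out.println(upperCaseMatches("it", "IT", turkish));     // false
        System.out.println(upperCaseMatches("it", "IT", Locale.ROOT)); // true
        System.out.println(asciiEqualsIgnoreCase("it", "IT"));         // true
    }
}
```

Passing an explicit locale, or avoiding locale-aware case mapping 
altogether as in asciiEqualsIgnoreCase, keeps the result independent of 
the machine's default locale.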

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Received on Tuesday, 26 October 2004 21:36:31 UTC