
Re: Unicode Normalization

From: Anne van Kesteren <annevk@opera.com>
Date: Thu, 05 Feb 2009 10:14:10 +0100
To: "Robert J Burns" <rob@robburns.com>
Cc: "Aryeh Gregor" <Simetrical+w3c@gmail.com>, public-i18n-core@w3.org, jonathan@jfkew.plus.com, "W3C Style List" <www-style@w3.org>
Message-ID: <op.uovfxwo964w2qv@annevk-t60.oslo.opera.com>

On Wed, 04 Feb 2009 22:07:59 +0100, Robert J Burns <rob@robburns.com>  
wrote:
> [...] If you meant that XML is Unicode normalization agnostic in that it  
> doesn't care (or know?) whether two canonically equivalent strings are a  
> match then there I disagree with that. Unicode is fairly clear that two  
> canonically equivalent strings are equivalent even if their code points  
> differ.

That's what I mean. There are many different comparison algorithms.  
Unicode definitely does not make it non-conforming to compare two strings  
codepoint for codepoint. I'm not sure why you think it does.
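
To make the distinction concrete, here is a minimal sketch in Python of the two comparison algorithms being discussed. The function names codepoint_equal and canonical_equal are hypothetical, chosen only for illustration; the choice of NFC rather than NFD is also an assumption, since either form gives the same answer for canonical equivalence.

```python
import unicodedata

# Two canonically equivalent spellings of the same text.
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def codepoint_equal(a: str, b: str) -> bool:
    """Compare code point for code point (hypothetical helper)."""
    # Python's str equality already compares sequences of code points.
    return a == b


def canonical_equal(a: str, b: str) -> bool:
    """Compare after normalizing both sides to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(codepoint_equal(composed, decomposed))   # False
print(canonical_equal(composed, decomposed))   # True
```

Both functions are legitimate comparison algorithms; they simply answer different questions about the same pair of strings.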


>> The XML grammar is expressed in Unicode codepoints so comparison also  
>> happens on that level.
>
> However Unicode has a SHOULD requirement that two canonically equivalent  
> but codepoint differing strings match. Unicode's Chapter 3 (C6 norm)  
> says:
>
>> A process shall not assume that the interpretations of two
>> canonical-equivalent character sequences are distinct.

I suggest reading all of C6. Martin Dürst pointed out long ago that this
does not always apply:

   http://lists.w3.org/Archives/Public/www-style/2009Feb/0020.html


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
Received on Thursday, 5 February 2009 09:15:02 GMT
