Re: Unicode Normalization

From: Anne van Kesteren <annevk@opera.com>
Date: Thu, 05 Feb 2009 10:14:10 +0100
To: "Robert J Burns" <rob@robburns.com>
Cc: "Aryeh Gregor" <Simetrical+w3c@gmail.com>, public-i18n-core@w3.org, jonathan@jfkew.plus.com, "W3C Style List" <www-style@w3.org>
Message-ID: <op.uovfxwo964w2qv@annevk-t60.oslo.opera.com>

On Wed, 04 Feb 2009 22:07:59 +0100, Robert J Burns <rob@robburns.com> wrote:
> [...] If you meant that XML is Unicode normalization agnostic in that it  
> doesn't care (or know?) whether two canonically equivalent strings are a  
> match then there I disagree with that. Unicode is fairly clear that two  
> canonically equivalent strings are equivalent even if their code points  
> differ.

That's what I mean. There are many different comparison algorithms.  
Unicode definitely does not make it non-conforming to compare two strings  
codepoint for codepoint. I'm not sure why you think it does.
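Below is a minimal Python sketch of two such comparison algorithms. It assumes NFC normalization as the canonical-equivalence check, which is one option among several. The two strings are canonically equivalent spellings of "café": one uses precomposed U+00E9, the other uses "e" followed by U+0301 COMBINING ACUTE ACCENT. A code-point-for-code-point comparison treats them as different. Normalizing both strings first makes them compare equal. The helper names are hypothetical.

```python
import unicodedata

# Canonically equivalent spellings of "café"
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" + U+0301 COMBINING ACUTE ACCENT

def codepoint_equal(a: str, b: str) -> bool:
    # Plain comparison: code point for code point
    return a == b

def canonical_equal(a: str, b: str) -> bool:
    # Normalize both strings to NFC first, so canonically
    # equivalent sequences compare equal
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(codepoint_equal(precomposed, decomposed))  # False
print(canonical_equal(precomposed, decomposed))  # True
```

Both functions are legitimate comparisons. They differ only in whether canonical equivalence is taken into account.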

>> The XML grammar is expressed in Unicode codepoints so comparison also  
>> happens on that level.
> However Unicode has a SHOULD requirement that two canonically equivalent  
> but codepoint differing strings match. Unicode's Chapter 3 (C6 norm)  
> says:
>> A process shall not assume that the interpretations of two canonical- 
>> equivalent character sequences are distinct.

I suggest reading all of C6. Martin Dürst already pointed out long ago  
that this does not always apply:


Anne van Kesteren
Received on Thursday, 5 February 2009 09:15:10 UTC