Re: Unicode Normalization

On Wed, 04 Feb 2009 22:07:59 +0100, Robert J Burns <rob@robburns.com>  
wrote:
> [...] If you meant that XML is Unicode normalization agnostic in that it  
> doesn't care (or know?) whether two canonically equivalent strings are a  
> match then I disagree with that. Unicode is fairly clear that two  
> canonically equivalent strings are equivalent even if their code points  
> differ.

That's what I mean. There are many different comparison algorithms.  
Unicode definitely does not make it non-conforming to compare two strings  
codepoint for codepoint. I'm not sure why you think it does.
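
To make the distinction concrete, here is a minimal sketch, assuming Python's standard unicodedata module as a stand-in for any Unicode library. The helper names codepoint_equal and canonical_equal are hypothetical, chosen only for illustration. It shows that the same pair of strings can fail a codepoint-for-codepoint comparison and still match once both are normalized:

```python
import unicodedata

# Two canonically equivalent spellings of the same character.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def codepoint_equal(a: str, b: str) -> bool:
    """Compare code point for code point, with no normalization."""
    return a == b


def canonical_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(codepoint_equal(precomposed, decomposed))  # False
print(canonical_equal(precomposed, decomposed))  # True
```

Both functions are legitimate comparison algorithms; which one applies depends on what the consuming specification asks for.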


>> The XML grammar is expressed in Unicode codepoints so comparison also  
>> happens on that level.
>
> However Unicode has a SHOULD requirement that two canonically equivalent  
> but codepoint differing strings match. Unicode's Chapter 3 (C6 norm)  
> says:
>
>> A process shall not assume that the interpretations of two canonical- 
>> equivalent character sequences are distinct.

I suggest reading all of C6. Martin Dürst already pointed out long ago  
that this does not always apply:

   http://lists.w3.org/Archives/Public/www-style/2009Feb/0020.html


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Thursday, 5 February 2009 09:15:10 UTC