- From: Robert J Burns <rob@robburns.com>
- Date: Thu, 5 Feb 2009 04:11:03 -0600
- To: "Anne van Kesteren" <annevk@opera.com>
- Cc: "Aryeh Gregor" <Simetrical+w3c@gmail.com>, public-i18n-core@w3.org, jonathan@jfkew.plus.com, "W3C Style List" <www-style@w3.org>
Hi Anne,

On Feb 5, 2009, at 3:14 AM, Anne van Kesteren wrote:

> On Wed, 04 Feb 2009 22:07:59 +0100, Robert J Burns
> <rob@robburns.com> wrote:
>> [...] If you meant that XML is Unicode normalization agnostic in
>> that it doesn't care (or know?) whether two canonically equivalent
>> strings are a match then there I disagree with that. Unicode is
>> fairly clear that two canonically equivalent strings are equivalent
>> even if their code points differ.
>
> That's what I mean. There are many different comparison algorithms.
> Unicode definitely does not make it non-conforming to compare two
> strings codepoint for codepoint. I'm not sure why you think it does.

I've elaborated extensively on why I think Unicode doesn't expect
Unicode UAs to treat canonically equivalent strings as distinct. I'd be
curious what your interpretation is of these Unicode criteria, or even
of canonical equivalence itself. Why do you think the Unicode Standard
even speaks of canonical equivalence and canonical decompositions? As
I've said, my interpretation is that the process of assigning code
points to the UCS has, for one reason or another, brought in redundant
characters/graphemes. To cope with this, Unicode has created the concept
of canonical equivalence, where characters with canonical decompositions
should be considered equivalent (as in string comparisons) to their
decomposed character sequences. If you think these criteria and concepts
are about something entirely different, I wish you would share your
ideas with the list rather than simply expressing contrarian viewpoints.

>>> The XML grammar is expressed in Unicode codepoints so comparison
>>> also happens on that level.
>>
>> However Unicode has a SHOULD requirement that two canonically
>> equivalent but codepoint differing strings match. Unicode's Chapter
>> 3 (C6 norm) says:
>>
>>> A process shall not assume that the interpretations of two
>>> canonical-equivalent character sequences are distinct.
>
> I suggest to read all of C6. Martin Dürst already pointed out long
> ago that this does not always apply:
>
> http://lists.w3.org/Archives/Public/www-style/2009Feb/0020.html

I have read all of C6, and I've provided my interpretation of that
criterion in depth (more depth than some probably cared to hear). If
your interpretation of these concepts is different, it would be helpful
for you to lay out those differences rather than just suggesting I go
read the Unicode Standard. I have said that code point to code point
comparisons are allowed in Unicode, but I think the Unicode Standard
implies that a process should have a reason for doing so, rather than
deviating without cause from the ideal situation where an
"implementation would always interpret two canonical equivalent
character sequences identically". If your interpretation is different
from mine, then why don't you explain it here on this list?

Take care,
Rob
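
For readers following the thread, the two comparison strategies under discussion can be illustrated with a minimal sketch in Python. This is only an illustration, not something proposed in the message: the function names `codepoint_equal` and `canonically_equal` are hypothetical, and the choice of NFD as the normalization form is an assumption (NFC would serve equally well for testing canonical equivalence).

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    """Compare two strings code point for code point."""
    return a == b


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD).

    Two strings are canonically equivalent when their full canonical
    decompositions are identical.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+00E9 (precomposed e with acute) versus "e" followed by
# U+0301 (combining acute accent).
precomposed = "\u00e9"
decomposed = "e\u0301"

print(codepoint_equal(precomposed, decomposed))    # False
print(canonically_equal(precomposed, decomposed))  # True
```

A code point comparison treats the two spellings as distinct, while the normalizing comparison treats them as a match. Which behavior a given process ought to use is exactly the point in dispute above.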
Received on Thursday, 5 February 2009 10:11:45 UTC