Re: Unicode Normalization

From: Robert J Burns <rob@robburns.com>
Date: Wed, 4 Feb 2009 16:10:16 -0600
Cc: "Aryeh Gregor" <Simetrical+w3c@gmail.com>, public-i18n-core@w3.org, jonathan@jfkew.plus.com, "W3C Style List" <www-style@w3.org>
Message-Id: <606DB044-C759-43F8-A4D7-671372713B3B@robburns.com>
To: "Anne van Kesteren" <annevk@opera.com>

Hi Brad,

Brad Kemper <brad.kemper@gmail.com> wrote:
> Sent from my iPhone
> On Feb 4, 2009, at 1:07 PM, Robert J Burns <rob@robburns.com> wrote:
>
> > However, Unicode has a SHOULD requirement that two canonically
> > equivalent but codepoint differing strings match. Unicode's Chapter
> > 3 (C6 norm) says:
> >
> >> A process shall not assume that the interpretations of two
> >> canonical-equivalent character sequences are distinct.
>
> Your interpretation adds something that your quoted text does not
> include. The quoted text does not include "but code point differing".
> It seems quite clear (at least when read in isolation from the rest of
> the spec) that it's simply saying that two canonical-equivalent
> character sequences MAY not be distinct. If they are not code
> point differing, then they wouldn't be distinct. Otherwise they would
> be.

Certainly, there is something missing from the criterion there.
However, your interpretation doesn't fill that gap (I understand you're
on an iPhone, but I still need to point that out). In other words,
without the "adds something" in my interpretation, something else
needs to be added to make sense of the Unicode C6 conformance norm. I
don't think we should read this norm as saying only that two
canonically equivalent character sequences that are also code point
equivalent character sequences are not distinct. If that's all the
criterion says, then why even mention canonical equivalence? The
Unicode standard would simply say that "UAs must treat the equivalence
of any character sequences the same as the code point equivalence of
the underlying code point sequences". There would be no need to
mention canonical equivalence. In fact, there would be no reason to
even introduce the concept of canonical equivalence in the Unicode
standard. An interpretation like the one you're proposing strains
credibility for me. Granted, there may be other interpretations than
the ones either of us has offered, and I welcome hearing those, but
yours is not really a credible one.
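
To make "canonically equivalent but code point differing" concrete, here is a minimal sketch in Python (an illustration only, not something proposed in this thread). It assumes canonical equivalence can be tested by comparing NFD forms, as described in UAX #15; the helper names `codepoint_equal` and `canonically_equal` are hypothetical, and only `unicodedata.normalize` comes from the standard library.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    # Hypothetical helper: plain comparison, True only if the
    # code point sequences are identical.
    return a == b


def canonically_equal(a: str, b: str) -> bool:
    # Hypothetical helper: treat strings as equal when their NFD
    # (canonical decomposition) forms are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Three spellings of "Å": ANGSTROM SIGN, precomposed A WITH RING ABOVE,
# and LATIN CAPITAL LETTER A followed by COMBINING RING ABOVE.
variants = ["\u212b", "\u00c5", "A\u030a"]

for s in variants:
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    print(code_points, codepoint_equal(s, "\u00c5"), canonically_equal(s, "\u00c5"))

# Prints:
# U+212B False True
# U+00C5 True True
# U+0041 U+030A False True
```

Only the precomposed U+00C5 passes the plain code point comparison, while all three spellings pass the NFD comparison; the two code point differing spellings are the cases whose treatment is in dispute here.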

Take care,
Rob
Received on Wednesday, 4 February 2009 22:10:57 GMT
