Re: [charmod-norm] 2.2.1 Canonical vs. Compatibility Equivalence vs Canonical non-equivalence

This is a good point. In the discussion of identifiers (that is, 
accessing resources) it should be noted that normalization by itself 
is not sufficient to guarantee that two distinct character sequences 
will also look distinct. 

In fact, canonical normalization is not primarily about appearance: it
is about folding multiple ways of encoding "the same thing". Two 
graphemes can look the same without representing "the same thing". In 
that case, normalization does not fold them.
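
To make the distinction concrete, here is a minimal sketch using 
Python's standard unicodedata module. The specific characters are my 
own choice for illustration (they are not taken from the thread): two 
encodings of "é" stand in for "the same thing", and Latin versus 
Cyrillic capital A stand in for graphemes that merely look alike.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the canonical composed form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


# Two encodings of "the same thing": precomposed e-acute versus
# "e" followed by COMBINING ACUTE ACCENT. (Illustrative choice.)
precomposed = "\u00e9"
decomposed = "e\u0301"

# Two letters that usually render identically but are different
# letters: Latin A and Cyrillic A. (Illustrative choice.)
latin_a = "\u0041"
cyrillic_a = "\u0410"

# Canonical normalization folds the first pair ...
same_thing_folded = nfc(precomposed) == nfc(decomposed)

# ... but leaves the look-alike pair untouched.
lookalikes_folded = nfc(latin_a) == nfc(cyrillic_a)

print(same_thing_folded, lookalikes_folded)
```

The first comparison holds and the second does not, which is the 
point: the fold is defined by equivalence of encoding, not by 
similarity of glyphs.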

Whether it is useful to go into any detail on this, and if so, which 
details, is a matter of debate. Certainly, there are the instances of 
"apparent composition" that John mentions, but there is also the case
of digraphs (not involving any combining marks). And finally, there 
are a few examples of different letters having exactly the same shape 
(the three instances of capital D with the left stroke barred are a 
clear example of the phenomenon, even if IDNA2008 happens to avoid the
problem, because it is lowercase only). (There's at least one 
lowercase example; I'm leaving that as an exercise for the reader.)
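
For anyone who wants to check these cases, the following is a rough 
sketch, again using Python's unicodedata. I am assuming the three 
barred capitals are U+00D0 (ETH), U+0110 (D WITH STROKE) and U+0189 
(AFRICAN D), and I picked U+01F3 (the "dz" digraph) as one 
representative digraph; the variable names are hypothetical. Note that 
str.lower() is only a stand-in for lowercasing and is not the IDNA2008 
processing itself.

```python
import unicodedata

# Assumed code points for the three capital D's with a barred stroke.
BARRED_D = ["\u00d0", "\u0110", "\u0189"]


def normalized_forms(chars, form):
    """Return the set of distinct results after normalizing each char."""
    return {unicodedata.normalize(form, c) for c in chars}


# Neither canonical nor compatibility normalization merges the three.
distinct_after_nfc = len(normalized_forms(BARRED_D, "NFC"))
distinct_after_nfkc = len(normalized_forms(BARRED_D, "NFKC"))

# Their lowercase counterparts are three different letters, which is
# why a lowercase-only scheme sidesteps this particular collision.
distinct_lowercase = len({c.lower() for c in BARRED_D})

# A digraph without combining marks: U+01F3 versus the sequence "dz".
# Canonical normalization keeps them apart; only the compatibility
# form (NFKC) folds the digraph into the two-letter sequence.
DZ_DIGRAPH = "\u01f3"
dz_nfc_folds = unicodedata.normalize("NFC", DZ_DIGRAPH) == "dz"
dz_nfkc_folds = unicodedata.normalize("NFKC", DZ_DIGRAPH) == "dz"

for c in BARRED_D:
    print(f"U+{ord(c):04X}", unicodedata.name(c))
```

All three counts stay at three, so no normalization form helps with 
the uppercase letters, while the digraph is folded only under 
compatibility normalization.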

-- 
GitHub Notification of comment by asmusf
Please view or discuss this issue at 
https://github.com/w3c/charmod-norm/issues/69#issuecomment-179982077 
using your GitHub account

Received on Thursday, 4 February 2016 18:27:11 UTC