- From: asmusf via GitHub <sysbot+gh@w3.org>
- Date: Thu, 04 Feb 2016 18:27:06 +0000
- To: public-i18n-archive@w3.org
This is a good point. In the discussion of identifiers (that is, accessing resources) it should be noted that normalization by itself is not sufficient to guarantee a unique appearance for each member of any pair of character sequences. In fact, canonical normalization is not primarily about appearance: it is about folding multiple ways of encoding "the same thing". Two graphemes can look the same but not represent "the same thing". In that case, normalization would not fold them.

Whether it is useful to go into any details on this, and if so, which ones, is a matter of debate. Certainly, there are the instances of "apparent composition" that John mentions, but there is also the case of digraphs (not involving any combining marks). And finally, there are a few examples of different letters having exactly the same shape (the three instances of capital D with the left stroke barred are a clear example of the phenomenon, even if IDNA2008 happens to avoid the problem, because it is lowercase only). (There's at least one lowercase example; I'm leaving that as an exercise to the reader.)

-- 
GitHub Notification of comment by asmusf
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/69#issuecomment-179982077 using your GitHub account
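As a rough illustration of the point about normalization folding encodings rather than appearances, here is a minimal Python sketch using the standard `unicodedata` module. The specific code points are assumptions made for the example: U+00D0, U+0110 and U+0189 are taken to be the three barred capital D letters referred to, and U+01C6 versus "d" followed by U+017E stands in for the digraph case. The comment itself names no code points.

```python
import unicodedata


def nfc(s: str) -> str:
    """Apply canonical composition (NFC) to a string."""
    return unicodedata.normalize("NFC", s)


# Two encodings of "the same thing": precomposed e-acute versus
# "e" followed by COMBINING ACUTE ACCENT. NFC folds these together.
precomposed = "\u00E9"
decomposed = "e\u0301"
same_thing_folds = nfc(precomposed) == nfc(decomposed)

# Assumed examples of look-alike capitals: ETH, D WITH STROKE, AFRICAN D.
# None has a canonical decomposition, so NFC keeps all three distinct.
barred_d = ["\u00D0", "\u0110", "\u0189"]
distinct_after_nfc = len({nfc(c) for c in barred_d})

# Assumed digraph example: the single character U+01C6 versus the
# two-letter sequence "d" + U+017E. No combining marks are involved,
# and NFC does not fold one into the other.
digraph = "\u01C6"
two_letters = "d\u017E"
digraph_folds = nfc(digraph) == nfc(two_letters)

# Lowercasing maps the three capitals to three different code points.
lowered = [c.lower() for c in barred_d]

if __name__ == "__main__":
    print("e-acute forms fold under NFC:", same_thing_folds)
    print("distinct barred D letters after NFC:", distinct_after_nfc)
    print("digraph folds under NFC:", digraph_folds)
    print("lowercase code points:", [f"U+{ord(c):04X}" for c in lowered])
```

The sketch only checks code point identity after normalization; it says nothing about how a given font renders the characters.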
Received on Thursday, 4 February 2016 18:27:11 UTC