- From: klensin via GitHub <sysbot+gh@w3.org>
- Date: Wed, 06 Apr 2016 14:47:46 +0000
- To: www-international@w3.org
klensin has just created a new issue for https://github.com/w3c/charmod-norm:

== Limitations of Normalization - Confusion ==

In addition to other issues covered by Richard's notes, ...

(1) "Homoglyph" is not a generally-recognized term. Perhaps "called a homoglyph in Unicode documents" would be better.

(2) The so-called non-decomposable problem probably deserves mention. These are characters that can be formed by combining sequences in which all of the script-specific code points belong to the same script as each other and as the composite character, but that do not have decompositions to at least one such combining sequence. AFAICT, UTS39 does not cover that set of cases. They can set a trap when users try to input characters of a script without quite the right keyboard, and they interact with language preferences in ways that I, at least, still don't fully understand.

(3) The paragraph starting "Similar examples of identical appearance..." at least needs an example or two, whether the above is incorporated or not. As it is, it reads like hand-waving, especially for the cases UTS39 does not address.

(4) In the last paragraph, starting "Finally, note that Unicode Normalization, even...", you might want to note that some systems do equate these characters by add-on steps to Normalization. IDNA2003 definitely did so. I haven't checked whether UTR46 still does, but doing so would be consistent with its apparent principle of preserving everything that "worked in" IDNA2003.

See https://github.com/w3c/charmod-norm/issues/88

Further comments on this issue will NOT be notified to this list. If you'd like to follow the discussion, please do so by subscribing to the issue via the above link. Do not reply to this email.
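
Editorial illustration of points (2) and (4): the sketch below is not part of the original message. It assumes that U+08A1 (ARABIC LETTER BEH WITH HAMZA ABOVE) is a representative case of the non-decomposable problem; that choice, the contrast case, and the helper names nfc_equal and nfkc_equal are assumptions made for this example. It uses only Python's standard unicodedata module to show which pairs normalization does and does not equate.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def nfkc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFKC."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# Assumed example of a non-decomposable character: U+08A1 has no canonical
# decomposition, so it is never equated with BEH (U+0628) + HAMZA ABOVE (U+0654).
BEH_HAMZA_PRECOMPOSED = "\u08A1"
BEH_HAMZA_SEQUENCE = "\u0628\u0654"

# Contrast case in the same script: U+0623 does decompose canonically to
# ALEF (U+0627) + HAMZA ABOVE (U+0654), so NFC treats the two as equal.
ALEF_HAMZA_PRECOMPOSED = "\u0623"
ALEF_HAMZA_SEQUENCE = "\u0627\u0654"

# Cross-script look-alikes: Latin "a" and Cyrillic U+0430.
LATIN_A = "a"
CYRILLIC_A = "\u0430"

if __name__ == "__main__":
    print(nfc_equal(BEH_HAMZA_PRECOMPOSED, BEH_HAMZA_SEQUENCE))
    print(nfc_equal(ALEF_HAMZA_PRECOMPOSED, ALEF_HAMZA_SEQUENCE))
    print(nfkc_equal(LATIN_A, CYRILLIC_A))
```

Any system that wants to equate the first or third pair has to do so with a step layered on top of normalization, which is the kind of add-on mapping point (4) refers to.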
Received on Wednesday, 6 April 2016 14:47:59 UTC