Re: [charmod-norm] Limitations of Normalization - Confusion

I like it, modulo one correction and one comment.

Correction: Please remove the reference to the DNS at the end. First, "internationalized domain name system" is new terminology and could lead to arguments about just what it means. More importantly, while IDNA2003 attempted to make all small, roundish sentence terminators match the "dot" character that separates labels in domain names, one of the conclusions that led to IDNA2008 was that the attempt led to madness and could not be defined in a way that would be stable over time. IDNA2008 is consequently written exclusively in terms of labels and makes no such association.
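
To make the IDNA2003 point concrete, here is a minimal sketch, assuming the four label separators listed in RFC 3490, Section 3.1 (U+002E, U+3002, U+FF0E, U+FF61). The function names are hypothetical and exist only for illustration. The second function is not an IDNA2008 algorithm, since IDNA2008 defines no splitting rule at all; it is only there to show what happens when nothing maps the other dots.

```python
import re

# Label separators per RFC 3490 (IDNA2003), Section 3.1:
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels_idna2003(name):
    """Hypothetical helper: split a name on any of the four IDNA2003 dots."""
    return IDNA2003_DOTS.split(name)


def split_labels_ascii_dot_only(name):
    """Hypothetical helper: split on U+002E only.

    Any other dot-like character is left inside the label untouched.
    """
    return name.split("\u002E")


if __name__ == "__main__":
    name = "example\u3002com"  # uses IDEOGRAPHIC FULL STOP, not U+002E
    print(split_labels_idna2003(name))        # two labels
    print(split_labels_ascii_dot_only(name))  # one label containing U+3002
```

The list of four characters is exactly the part that could not be kept stable: any other "small, roundish" character added later would need the same treatment, and so on indefinitely.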

Comment: At the risk of digging up the bodies of dead wildebeest, I wish we could leave the notion of "confusable" as an apparent category of relationships out of this discussion. "Identical" or "apparently identical" are fine, as are comments that people are easily confused, and so is most of the discussion of similarity or apparent similarity among logical characters. But we've seen claims that some rather extreme cases are confusable as well, e.g., "O" and "Q" (for someone unfamiliar with the script, the hook might be just decorative), "p" and "q" (perhaps it doesn't make a difference where the descender is located, and perhaps U+006F U+0327 and/or U+006F U+0321 might be "the same as" at least one of those logical characters), and U+0602, which, absent context or in an unfamiliar context, might easily be mistaken for U+0055 in a sufficiently imaginative type style.
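
For what it is worth, none of the pairs above is related by Unicode normalization, which is part of why "confusable" is such an open-ended category. A rough sketch using Python's standard unicodedata module follows; the helper name and the choice of NFKC as the most aggressive form to test are my assumptions, not anything from the draft text.

```python
import unicodedata


def same_after_normalization(a, b, form="NFKC"):
    """Hypothetical helper: True if a and b are equal after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The "extreme" pairs mentioned above, written as code point sequences.
PAIRS = [
    ("O", "Q"),
    ("p", "q"),
    ("o\u0327", "q"),   # o + COMBINING CEDILLA
    ("o\u0321", "q"),   # o + COMBINING PALATALIZED HOOK BELOW
    ("\u0602", "U"),    # ARABIC FOOTNOTE MARKER vs. LATIN CAPITAL LETTER U
]

if __name__ == "__main__":
    for a, b in PAIRS:
        print([hex(ord(c)) for c in a], [hex(ord(c)) for c in b],
              same_after_normalization(a, b))
    # For contrast, a pair that normalization does unify:
    # precomposed e-acute vs. e + COMBINING ACUTE ACCENT.
    print(same_after_normalization("\u00e9", "e\u0301"))
```

Normalization only addresses sequences that Unicode defines as canonically or compatibility equivalent; visual similarity of the kind claimed for these pairs is outside its scope.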

-- 
GitHub Notification of comment by klensin
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/88#issuecomment-299662998 using your GitHub account

Received on Saturday, 6 May 2017 20:06:58 UTC