- From: asmusf via GitHub <sysbot+gh@w3.org>
- Date: Wed, 06 Apr 2016 15:48:15 +0000
- To: public-i18n-archive@w3.org
(1) You can use the term "homograph", which covers cases where it's not a single "glyph". It is a generic term simply meaning "written alike". It may not need the bold face, but it's a useful term to introduce.

(2) The more I look at it, the more I find that the non-decomposable problem may be something of a red herring. The latest example I have come across: in Khmer, the two sequences U+17D2 U+178F and U+17D2 U+178A display absolutely identically (while standalone, the characters differ significantly in appearance). This is not covered by the non-decomposable issue, because these are not composed vs. decomposed sequences. And it occurs in the same language, using the same keyboard.

(Aside: my best guess is that some constraint in the language rules out an actual minimal pair, that is, two words identical except for that sequence, so the writing system can get away with reusing a form. But when typing, people want to type the letter (DA or TA) that corresponds to the actual sound. For identifiers, this opens the door to spoofing unless some steps are taken to prevent the use of such a minimal pair.)

(3) I agree. Examples of a non-decomposable character (for example 0781), the example I just gave, a digraph, and Latin turned e (I think that's the name) would be good to map out the nature of the problem.

(4) Besides additional transformations, there are other steps that can be taken, depending on the protocols involved. Where identifiers are registered, registering one can be made to block the other (its homograph) from registration.

--
GitHub Notification of comment by asmusf
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/88#issuecomment-206437324 using your GitHub account
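A minimal Python sketch of points (2) and (4), under stated assumptions. It uses only the standard `unicodedata` module to show that none of the four normalization forms unifies the Khmer sequences U+17D2 U+178F and U+17D2 U+178A. It then illustrates registration blocking with a hypothetical homograph table (`HOMOGRAPH_MAP`), a hypothetical `skeleton()` function and a toy `Registry` class. None of these come from any standard or registry protocol; they are illustration only, and the example labels are arbitrary code point sequences, not claimed to be real Khmer words.

```python
import unicodedata

# Sequences from point (2): KHMER SIGN COENG followed by TA, and COENG followed by DA.
COENG_TA = "\u17D2\u178F"
COENG_DA = "\u17D2\u178A"
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Hypothetical homograph table (an assumption for illustration only):
# each visually identical sequence is folded to one representative.
HOMOGRAPH_MAP = {COENG_DA: COENG_TA}


def skeleton(label: str) -> str:
    """Hypothetical key: NFC-normalize, then fold known homograph sequences."""
    s = unicodedata.normalize("NFC", label)
    for variant, representative in HOMOGRAPH_MAP.items():
        s = s.replace(variant, representative)
    return s


class Registry:
    """Toy registry: a label is blocked if its skeleton is already taken."""

    def __init__(self) -> None:
        self._by_skeleton: dict[str, str] = {}

    def register(self, label: str) -> bool:
        key = skeleton(label)
        if key in self._by_skeleton:
            return False  # blocked: homograph of an already-registered label
        self._by_skeleton[key] = label
        return True


if __name__ == "__main__":
    # Normalization alone never makes the two sequences equal.
    for form in FORMS:
        same = unicodedata.normalize(form, COENG_TA) == unicodedata.normalize(form, COENG_DA)
        print(form, same)  # False for every form

    registry = Registry()
    print(registry.register("\u1780" + COENG_TA))  # True: first registration succeeds
    print(registry.register("\u1780" + COENG_DA))  # False: blocked as a homograph
```

The design choice mirrors (4): rather than transforming what users type, the sketch keeps both spellings valid as input and uses the folded key only to decide whether a second registration collides with an existing one.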
Received on Wednesday, 6 April 2016 15:48:17 UTC