Re: [charmod-norm] 2.2.1 Canonical vs. Compatibility Equivalence vs Canonical non-equivalence

On 2/6/2016 5:45 PM, aphillips wrote:
>
> So... I'm aware of this. I suppose we probably should mention, for
> example, something like the "Paypal" bug, e.g., U+03A1, U+0420, and
> U+0050 (P) all look absolutely identical but are unrelated.
>
> I have changed the Section 2.2 introduction to be slightly more 
> technically accurate. I also added a Note Well box.
>
> @asmusf <https://github.com/asmusf> I partially copied your reply 
> above to form part of the note. (I also added you to the 
> acknowledgements list).
>
> Please consider the changes and see if these address the problem.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/w3c/charmod-norm/issues/69#issuecomment-180912036>.
>
You write:

    Obviously, "confusable" characters like this can present spoofing and
    other security risks. For more information, see [[UTR39]].


First, the example in this case is not merely about characters that are
"confusable" - a term that encompasses a wide spectrum of similarity
under a wide variation of circumstances and involving assumptions about
human perception - but it is more precisely about characters being
"homoglyphs" (in fact, strict homoglyphs, with an appearance that is
identical in all practical scenarios).

The distinction matters, because those who feel that normalization
"should" have addressed certain issues are fine with "mere" similarities
being handled differently.
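
To make the point concrete, here is a minimal Python sketch using the
standard `unicodedata` module. The helper name
`distinct_after_normalization` is made up for this illustration. It
checks that the three code points from the quoted "Paypal" example stay
distinct under all four normalization forms. For contrast, it also
checks a pair that is canonically equivalent (U+212B ANGSTROM SIGN and
U+00C5), which is my own addition and not part of the discussion above.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Code points from the "Paypal" example: Greek Rho, Cyrillic Er, Latin P.
HOMOGLYPHS = ["\u03A1", "\u0420", "\u0050"]

# Contrast pair (an assumption added for illustration):
# ANGSTROM SIGN is canonically equivalent to A WITH RING ABOVE.
EQUIVALENT_PAIR = ["\u212B", "\u00C5"]


def distinct_after_normalization(chars):
    """Return True if the characters remain pairwise distinct
    under every Unicode normalization form."""
    for form in FORMS:
        normalized = {unicodedata.normalize(form, ch) for ch in chars}
        if len(normalized) != len(set(chars)):
            return False
    return True


if __name__ == "__main__":
    for ch in HOMOGLYPHS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print("homoglyphs stay distinct:", distinct_after_normalization(HOMOGLYPHS))
    print("equivalent pair stays distinct:",
          distinct_after_normalization(EQUIVALENT_PAIR))
```

Normalization folds the canonically equivalent pair together but leaves
the three homoglyphs untouched, which is why their identical appearance
has to be handled by other means.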

Second, the example may be "famous" but involves only a subset of
homoglyphs. There are some other examples of homoglyphs that are in the
same script. I don't think you need to give examples of both; but it
would not be amiss to add that this effect does not require separate
scripts.

My suggested replacement:

==> Similar examples of identical appearance also exist within a single
script. Because these characters have an appearance that is identical
for all practical purposes, they are an extreme manifestation of
"confusable" characters, which can represent...

(Or you can find a way to break that sentence in two).

A./




-- 
GitHub Notification of comment by asmusf
Please view or discuss this issue at 
https://github.com/w3c/charmod-norm/issues/69#issuecomment-180928288 
using your GitHub account

Received on Sunday, 7 February 2016 03:20:02 UTC