Re: Unicode Normalization thread should slow down; summary needed from Ambrose Li on 2009-02-11 (public-i18n-core@w3.org from January to March 2009)

From: Ambrose Li <ambrose.li@gmail.com>
Date: Wed, 11 Feb 2009 03:12:55 -0500
To: Henri Sivonen <hsivonen@iki.fi>
Cc: Robert J Burns <rob@robburns.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Message-ID: <af2cae770902110012o5d0c2460q95b271752d280977@mail.gmail.com>

2009/2/11 Henri Sivonen <hsivonen@iki.fi>:
>
> On Feb 10, 2009, at 19:00, Robert J Burns wrote:
>
>> Having the example back helps dramatically. However, you've taken the
>> issue and boiled it down to the solved portion, ignoring what the thrust of
>> the thread was about.
>
> What was the thrust of the i18n core comments then? Except for your remarks,
> as far as I can tell, the thread has revolved around keyboard input order or
> differences in input methods between operating systems causing different
> code point sequences for the same visual appearances.

Pardon my ignorance too, but this is complete news to me. As far as I
can tell, the discussion did not "revolve around" input methods at
all. Input methods were part of the discussion, but in no way the focus.

[...]
>> While most keyboards might be able to be designed to limit the input of
>> identifiers to canonically ordered character sequences, the problem is that
>> characters might be input by all sorts of means (not just keyboards):
>> including pasting, character palette, and keyboard input. An identifier
>> might begin its life from an innocent copy and paste from the document
>> content by the initial author of the identifier. Other subsequent authors
>> may try to match the identifier through keyboard input or character palette
>> input (perhaps unsuccessfully due to differing compositions and orderings).
>> So this is in particular a canonical normalization problem (though Henri has
>> attempted, but I'm afraid unsuccessfully, to restate in some terms of only
>> keyboard input).
>
> Has i18n core (or anyone else) identified copying and pasting as something
> that in workflows occurring in practice doesn't preserve identifier identity
> under the kinds of comparisons that are currently performed in the Open Web
> platform in general and in Selector implementations in particular?
>
> If the problem can be cornered to having to construct it with Character
> Palette to experience it, I'd be happy to invoke Solve Real Problems and
> declare such construction as not enough of a Real Problem to need Solving.
>
> (Seriously, if you get an HTML file from someone else and it has a class
> name with characters that are foreign to your usual input method and
> you are tasked with writing a selector for the class names, do you copy and
> paste the string from the file you got or do you open the character palette
> and try to locate those characters visually there one by one?)

But that's irrelevant. It has been shown that even if the characters
are NOT foreign to you, you will still not be able to tell the
difference.

For short strings (whether in Chinese or accented Latin like French or
German) I often retype them instead of doing a copy-and-paste. After
all, if you can see what it is (and this happens ONLY when the
characters are NOT foreign to you) and you can retype it easily, why
go to the trouble of moving the mouse to copy and paste (which,
oftentimes, takes more time than retyping)?
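
To make the retyping case concrete, here is a minimal sketch, assuming
Python's standard unicodedata module. The class name "café" and the
helper same_identifier are made up for illustration; they do not come
from any spec or implementation. It shows two strings that look
identical, compare unequal code point by code point, and compare equal
once both are put into a canonical normalization form (NFC is used
here):

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical class name "café", entered two different ways.
pasted = "caf\u00e9"     # precomposed U+00E9
retyped = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(pasted == retyped)                  # raw code point comparison
print(same_identifier(pasted, retyped))   # comparison after NFC

# Same base letter and same two combining marks, typed in a different order.
dot_then_acute = "a\u0323\u0301"   # dot below, then acute
acute_then_dot = "a\u0301\u0323"   # acute, then dot below

print(dot_then_acute == acute_then_dot)
print(same_identifier(dot_then_acute, acute_then_dot))
```

The second pair covers the "differing orderings" case: the two marks
have different combining classes, so canonical reordering brings both
sequences to the same form before comparison.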

-- 
cheers,
-ambrose

The 'net used to be run by smart people; now many sites are run by
idiots. So SAD... (Sites that do spam filtering on mails sent to the
abuse contact need to be cut off the net...)

Received on Wednesday, 11 February 2009 08:13:34 UTC