Re: Changes to DOM3 Events Key Identifiers from Giovanni Campagna on 2009-10-31 (www-dom@w3.org from October to December 2009)

From: Giovanni Campagna <scampa.giovanni@gmail.com>
Date: Sat, 31 Oct 2009 14:54:01 +0100
To: Doug Schepers <schepers@w3.org>
Cc: www-dom@w3.org
Message-ID: <65307430910310654q3e3326c4iee2f663bcb254471@mail.gmail.com>
2009/10/30 Doug Schepers <schepers@w3.org>:
> Hi, Laurens-
>
> Laurens Holst wrote (on 10/30/09 7:02 AM):
>>
>> Op 30-10-2009 10:32, Maciej Stachowiak schreef:
>>>
>>> "\uxxxx" is not a syntax, it is a Unicode string of the actual
>>> character. \u introduces the escape sequence for a unicode code point.
>>> So you can compare it directly to a character.
>>
>> Now I’m confused. The way Doug phrased it, \uxxxx *will* be syntax, i.e.
>> the string "U+xxxx" will be replaced by "\\uxxxx" (a 6-character string
>> containing an identifier). Not "\uxxxx" (a 1-character string containing
>> the actual character) which could be compared directly to a character.
>>
>> Otherwise, I would suggest not to talk in terms of "\uxxxx" strings at
>> all, after all the DOM specification does not need to concern itself
>> with serialisation, but instead to just talk about characters and code
>> points.
>
>
> Just to clarify, are you objecting to the loose way I phrased it in my
> email, or did you review the spec and find problems there?  I may have used
> the wrong terminology in the email, but the spec is the definitive source
> that needs to get it right.
>
> So, please clarify if you object to the change described in the spec.

Section 6.2.6, point 1.1.1 says to use the "character value", which is
defined as the Unicode character escape: the string consisting of
U+005C REVERSE SOLIDUS, U+0075 LATIN SMALL LETTER U and four characters
drawn from the ranges U+0061-0066 (LATIN SMALL LETTER A to LATIN SMALL
LETTER F) and U+0030-0039 (DIGIT ZERO to DIGIT NINE). Interpreted as
hexadecimal, those four characters give a Unicode scalar value /
Unicode code point (in the sense of the Unicode standard, not of the
DOM3Events spec).
Point 1.1.2 instead says to use the "Unicode code point" and encode it
with "\u" (again U+005C REVERSE SOLIDUS, U+0075 LATIN SMALL LETTER U)
followed by 4 or more hexadecimal digits.
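
The two encodings can be made concrete with a minimal JavaScript
sketch. The helper names escapePerCodeUnit and escapePerCodePoint are
hypothetical, uppercase hex digits are assumed so the output matches
the examples in this thread, and reading point 1.1.1 as "one four-digit
escape per UTF-16 code unit" is an assumption, not something the spec
states.

```javascript
// Hypothetical helpers for illustration; the names are not from the spec.

// One "\uXXXX" group (exactly four hex digits) per UTF-16 code unit.
// Assumed reading of point 1.1.1.
function escapePerCodeUnit(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    out += "\\u" + hex;
  }
  return out;
}

// "\u" followed by four or more hex digits per code point (point 1.1.2).
function escapePerCodePoint(str) {
  let out = "";
  for (const ch of str) { // for...of iterates by code point, not code unit
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    out += "\\u" + hex;
  }
  return out;
}

// For a BMP character both forms are the same six-character string.
console.log(escapePerCodeUnit("A"), escapePerCodePoint("A"));

// Outside the BMP they differ: a surrogate pair versus five hex digits.
console.log(escapePerCodeUnit("\u{10000}"), escapePerCodePoint("\u{10000}"));
```

Under these assumptions the two points only diverge for characters
above U+FFFF.
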
Section 6.2.7 finally says to use the more author-friendly of the
"character value" and the "key name".

So:
1) What is the difference between 6.2.6, point 1.1.1 and the point
that follows it? Both result in a key value like "\u0041" for A U+0041
LATIN CAPITAL LETTER A (since that is the "character value" for that
key in 6.2.7, and it is also the encoded Unicode code point in the
sense of the Unicode standard).
2) How do you choose the most author-friendly value? I think that "A"
is a better name for U+0041 LATIN CAPITAL LETTER A, but the spec says
to always prefer the character value, that is "\u0041".
3) Why do both the examples and an earlier poster state that the key
value is instead a single-character string containing the character
itself?

That is, given an event instance of KeyboardEvent, what does event.key
return in JavaScript?

For a key with label "+ =", in the US layout, with Shift pressed:
- "\u002B", the Unicode code point / character value, represented as
  an escaped string.
  This means that key.length === 6 and key.charAt(0) === "\\".
  Also, String(key).length === 6 and JSON.parse(key) throws; that is,
  there is no good method to get a "+" from it (Unicode character
  escapes are resolved at the parser layer, they don't leak inside
  strings).
- "Plus", the key name from the database.
  This means that you need convertKeyValue to get the plus, but it is
  probably more recognizable or easier to remember than "\u002B".
- "+", the Unicode character.
  This means that key.length === 1 and key.charAt(0) === "+".
  This is probably the best option for an author, although it is not
  what the spec currently says.
For a key with label "% 5", in the US-international layout, with AltGr pressed:
- "\u20AC" (key.length === 6 and key.charAt(0) === "\\")
- "Euro"
- "€"
And for an imaginary key corresponding to U+10000 LINEAR B SYLLABLE B008 A:
- "\uD800\uDC00" (key.length === 12 and key.charAt(0) === "\\")
- "\u10000" (key.length === 7 and key.charAt(0) === "\\")
- "LinearBSyllableB008A"
- "𐀀" (key.length === 2 and key.charAt(0) === String.fromCharCode(55296))
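
The difference between the escaped form and the literal character can
be checked with a short JavaScript sketch. The variable names and the
decodeEscapedKey helper are hypothetical; the helper only handles a
single "\u" followed by 4 to 6 hex digits, so the surrogate-pair form
is deliberately left undecoded.

```javascript
// The escaped form written as a JavaScript literal: the backslash is doubled
// so the string really holds the six characters \ u 0 0 2 B.
const escaped = "\\u002B";
const literal = "+";

console.log(escaped.length, escaped.charAt(0) === "\\"); // 6 true
console.log(literal.length, literal.charAt(0) === "+");  // 1 true

// JSON.parse rejects the bare escape because it is not a JSON value.
let threw = false;
try {
  JSON.parse(escaped);
} catch (e) {
  threw = true;
}
console.log(threw); // true

// Hypothetical decoder: an author would need something like this to
// recover the character from the escaped form.
function decodeEscapedKey(key) {
  const match = /^\\u([0-9A-Fa-f]{4,6})$/.exec(key);
  if (!match) return null; // key names such as "Plus" end up here
  const codePoint = parseInt(match[1], 16);
  if (codePoint > 0x10FFFF) return null; // outside the Unicode range
  return String.fromCodePoint(codePoint);
}

console.log(decodeEscapedKey(escaped) === literal); // true
```

With the single-character form none of this decoding is needed, since
the key value can be compared to "+" directly.
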

Moreover, given the same events in C++, and assuming that WebIDL for
C++ said "use char* for DOMString", what do you get?
- "\u002B", "\u20AC" and "\uD800\uDC00", meaning that strlen(key)==6
  (==12 for U+10000) and *key == 92
- "\u002B", "\u20AC" and "\u10000", meaning that strlen(key) is either
  6 or 7 and *key==92
- "Plus", "Euro", "LinearBSyllableB008A"
- "+" ( strlen(key)==1 and *key == 43 ), "€" ( strlen(key)==3 and *key
  == 226 ) and "𐀀" ( strlen(key)==4 and *key==240 ), with the string
  encoded as UTF-8 and the bytes read as unsigned
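
The byte counts for the char* case can be reproduced from JavaScript
with the standard TextEncoder, which always encodes to UTF-8. This is
only a sketch for checking the numbers: utf8Info is a hypothetical
helper, and the result stands in for strlen(key) and the first byte of
key under the assumption that the C++ binding would use UTF-8.

```javascript
// TextEncoder is available in browsers and in current Node.js releases.
const encoder = new TextEncoder();

// Hypothetical helper: UTF-8 byte length and first byte of a key value.
function utf8Info(str) {
  const bytes = encoder.encode(str);
  return { length: bytes.length, firstByte: bytes[0] };
}

const candidates = [
  "\\u002B",        // escaped form, six ASCII bytes
  "\\uD800\\uDC00", // escaped surrogate pair, twelve ASCII bytes
  "\\u10000",       // escaped code point, seven ASCII bytes
  "+",
  "€",
  "\u{10000}",      // the actual character U+10000
];

for (const key of candidates) {
  console.log(JSON.stringify(key), utf8Info(key));
}
```
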

> Regards-
> -Doug Schepers
> W3C Team Contact, SVG and WebApps WGs
>
>

Giovanni
Received on Saturday, 31 October 2009 13:54:46 UTC