Re: convertKeyIdentifier

Hi, Mark-

Here are the facts as I see them. Others should feel free to jump in 
with corrections or clarifications, or even other use cases or requirements.

The context is the DOM3 Events spec [1].  We have constructed a level of 
abstraction above scan codes, keyCode/charCode, key mappings, dead keys, 
and all that, called "key identifiers".  (Note that a "key identifier" is 
the current modified and mapped value for the key, nothing like a keyCode 
or scan code.)  We are discussing what form that value should take, in 
the context of an API for key events.

For historical context, key identifiers were specified originally as a 
set of key names (for control keys, etc., like "Enter", "Tab", or "F1") 
and a set of Unicode strings ("U+xxxx").  This is how it stood when work 
on the spec stopped in 2003 and it was published as a W3C Note.  (It was 
a bit muddled in some details, but a good start.)


We have recently taken up the spec again and are revisiting the key 
identifier issue.  In looking at the model, we are discussing what the 
best return values are for a particular keystroke.  In the current 
model, each key identifier has one or more of the following "value 
types": a key name (like "Shift"), a character value (like "w"), or a 
Unicode string (like "U+0308").
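
To make the three value types concrete, here is a minimal TypeScript 
sketch that sorts an identifier string into one of them.  The function 
name classifyKeyIdentifier and the matching rules are assumptions made 
for illustration; nothing like this is defined in the spec.

```typescript
// Hypothetical helper: the name and the matching rules are illustrative
// assumptions, not part of the DOM3 Events draft.
type ValueType = "key name" | "character" | "Unicode string";

function classifyKeyIdentifier(id: string): ValueType {
  // "U+" followed by 4 to 6 hex digits, e.g. "U+0308".
  if (/^U\+[0-9A-Fa-f]{4,6}$/.test(id)) {
    return "Unicode string";
  }
  // One UTF-16 code unit, or one surrogate pair, counts as one character.
  const first = id.codePointAt(0);
  const units = first !== undefined && first > 0xffff ? 2 : 1;
  if (id.length === units) {
    return "character";
  }
  // Everything else is treated as a multi-character name such as "Shift".
  return "key name";
}
```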

For the key names (those that don't represent a character or dead key 
that has a Unicode code point), we seem to have settled on the full 
multi-character string.

However, for the Unicode value, there is some contention about the best 
way to expose it.  We know we want to make it very easy for 
script authors to get the character representation of the code point, 
but we also want them to be able to get the code point, for things like 
"\u0308" (diaeresis dead key).  This is obviously important for 
internationalization.
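
As a rough illustration of moving between the two forms, the sketch 
below parses a "U+xxxx" string into a number and turns a code point back 
into a character.  parseUnicodeString and formatUnicodeString are 
hypothetical names; String.fromCodePoint and codePointAt are standard 
ECMAScript (ES2015) methods.

```typescript
// Hypothetical helpers; neither name comes from the spec.
function parseUnicodeString(s: string): number | null {
  const m = /^U\+([0-9A-Fa-f]{4,6})$/.exec(s);
  if (m === null) return null;
  const value = parseInt(m[1], 16);
  // Reject values beyond the last Unicode code point.
  return value <= 0x10ffff ? value : null;
}

function formatUnicodeString(codePoint: number): string {
  const hex = codePoint.toString(16).toUpperCase();
  // Pad to at least four hex digits, as in "U+0077".
  return "U+" + (hex.length < 4 ? ("0000" + hex).slice(-4) : hex);
}

// From the string form to the character (the combining diaeresis).
const diaeresisCodePoint = parseUnicodeString("U+0308");
const diaeresisChar =
  diaeresisCodePoint === null ? "" : String.fromCodePoint(diaeresisCodePoint);

// From a character back to the string form.
const wAsUnicodeString = formatUnicodeString("w".codePointAt(0) ?? 0);
```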

We also know that we want people to easily be able to represent the key 
identifiers in markup (such as the value for @accesskey, perhaps), so it 
should be representable there as well.  (My assumption there is that 
they should be able to use any of the character value, the code point 
string, or the key name, as appropriate and convenient.)

We also want to allow people to test the Unicode range of the key 
identifier, to see if it falls into a certain orthography, general 
category, etc.
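
For the range test, here is a small sketch of what an author might write 
once a numeric code point is available.  The table is deliberately tiny 
and the name blockOf is hypothetical.  Note that a Unicode block is not 
the same thing as a script or general category; those need the Unicode 
property data, which this sketch does not include.

```typescript
// Illustrative subset of Unicode blocks as [name, first, last].
const BLOCKS: Array<[string, number, number]> = [
  ["Basic Latin", 0x0000, 0x007f],
  ["Combining Diacritical Marks", 0x0300, 0x036f],
  ["Greek and Coptic", 0x0370, 0x03ff],
  ["Cyrillic", 0x0400, 0x04ff],
];

// Hypothetical helper: returns the block name for the first code point
// of `s`, or null if `s` is empty or outside the small table above.
function blockOf(s: string): string | null {
  const codePoint = s.codePointAt(0);
  if (codePoint === undefined) return null;
  for (const [name, first, last] of BLOCKS) {
    if (codePoint >= first && codePoint <= last) return name;
  }
  return null;
}
```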

Finally, we want this to be as simple as possible to understand, 
implement, and use, while not sacrificing any potential functionality.

As a secondary goal, we may want to allow people to convert the 
resulting key identifier (or, indeed, any string) into different 
characterizations (entity, code point, character, key name), not as part 
of an event, but in the general case.
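
A sketch of what such a general-purpose conversion might return, purely 
to make the secondary goal tangible.  The function name characterize, 
the shape of the result, and the one-entry name table are all 
assumptions; "DeadUmlaut" is the key name mentioned for the diaeresis 
dead key in the quoted thread below.

```typescript
// Hypothetical result shape; not proposed anywhere in the thread.
interface Characterization {
  character: string;      // the character itself
  codePoint: number;      // numeric code point
  unicodeString: string;  // "U+xxxx" form
  entity: string;         // numeric character reference for markup
  keyName: string | null; // key name, if the table knows one
}

// One-entry illustrative table keyed by code point.
const KEY_NAMES: { [codePoint: number]: string } = { 0x0308: "DeadUmlaut" };

function characterize(s: string): Characterization | null {
  const codePoint = s.codePointAt(0);
  if (codePoint === undefined) return null;
  const hex = codePoint.toString(16).toUpperCase();
  const padded = hex.length < 4 ? ("0000" + hex).slice(-4) : hex;
  return {
    character: String.fromCodePoint(codePoint),
    codePoint,
    unicodeString: "U+" + padded,
    entity: "&#x" + hex + ";",
    keyName: KEY_NAMES[codePoint] ?? null,
  };
}
```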

A few mechanisms have been proposed.  One centers around a single 
attribute value on the keyboard event (".key"), which may resolve to the 
most author-friendly value, which can then perhaps be manipulated (there 
are various permutations of this idea).  The other involves an 
additional attribute (".codePoint"), that gives the Unicode code point, 
if one exists.
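
To compare the two proposals from a script author's side, here is a 
minimal sketch.  The interfaces model only the attributes named above 
and are assumptions, not spec text; in particular, what ".key" would 
hold for a dead key is not settled, and the sketch assumes it holds the 
character itself.

```typescript
// Proposal 1 (assumed shape): a single author-friendly attribute.
interface KeyOnlyEvent {
  key: string;
}

// Proposal 2 (assumed shape): an additional code point attribute,
// null when the key has no Unicode code point (e.g. "Shift").
interface KeyAndCodePointEvent {
  key: string;
  codePoint: number | null;
}

// With only ".key", the author derives the code point, and has to
// decide what counts as a single character versus a key name.
function codePointFromKey(e: KeyOnlyEvent): number | null {
  const cp = e.key.codePointAt(0);
  if (cp === undefined) return null;
  const units = cp > 0xffff ? 2 : 1;
  return e.key.length === units ? cp : null;
}

// With ".codePoint", the value is read directly.
function codePointFromAttribute(e: KeyAndCodePointEvent): number | null {
  return e.codePoint;
}
```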

Any light the i18n folks could shed here would be appreciated.


[1] http://dev.w3.org/2006/webapi/DOM-Level-3-Events/html/DOM3-Events.html

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs



Mark Davis ☕ wrote (on 9/23/09 1:27 AM):
> I don't know enough about the context here, but there appear to be a
> number of misperceptions, among them that U+0308 (or "\u0308" or
> whatever the syntax is) alone does not constitute a valid Unicode
> string. It is absolutely a valid string.
>
> Perhaps someone can point me to some background here.
>
> Mark
>
>
> On Tue, Sep 22, 2009 at 22:19, Doug Schepers <schepers@w3.org> wrote:
>
>     Hi, Maciej-
>
>     Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>
>
>         On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>
>             On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
>
>                 On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>
>                 I agree with Anne. I think we should remove the U+XXXX
>                 format entirely.
>                 If you have a string like Q, you can convert it to a
>                 unicode numeric
>                 value for range checking like this: [...]
>
>                 I don't think the U+XXXX string format adds any
>                 value.
>
>
>             Are dead keys represented in some way? The string "\x0308"
>             is not a valid
>             Unicode string (it has a combining character with no base),
>             but I don't
>             see how else we would represent the diaeresis dead key.
>
>
>         I hadn't thought of dead keys.
>
>
>     I did mention that as one of the use cases at the beginning of this
>     thread [1], but I probably could have expressed it more clearly.
>
>     Another case I mentioned is making sure that a character is in a
>     certain range (such as in a certain code block or language group).
>       This is possible with the Unicode code point (and some regex), but
>     not with the character (I think), because a given character
>     representation can actually appear in multiple ranges, so you can't
>     say for certain that some particular character belongs
>     unequivocally to a certain range. That might not be correct... I'll
>     look into it and report back (unless someone already knows for sure).
>
>
>         According to the spec, the key identifier
>         for the diaeresis dead key is the string "DeadUmlaut". I can see
>         a few
>         possible ways to deal with this:
>
>
>     We could remove the "key name" from the Unicode values, or replace
>     it with something more appropriate, perhaps.
>
>
>         1) Have a way to get the unicode code point for a dead key. But
>         I think
>         a numeric value would be more useful than the U+XXXX format string.
>         1.a) This could be a global method that takes strings like
>         "DeadUmlaut"
>         and returns code points as numeric values; OR
>         1.b) There could be an attribute on key events that gives the code
>         point, if any, separate from the key identifier. long
>         unicodeCodePoint
>         for instance.
>
>
>     When we discussed this in the telcons, we decided that a utility
>     function was better than an event attribute, because you could use it
>     at any time, not just when a keyboard event had occurred... (there
>     was some other reason that Travis brought up that escapes me at the
>     moment).
>
>     However, that was my first thought as well, so I'm amenable to that
>     (maybe just ".codepoint"?).
>
>
>         2) Alternately - even though "\x0308" is not a valid Unicode
>         string, it
>         can still be represented as a DOM string and as a JavaScript string,
>         since both the DOM and JavaScript define strings as sequences of
>         16-bit
>         UTF-16 code units, and may represent invalid strings (including even
>         such things as containing only one code unit of the two that
>         comprise a
>         surrogate pair). Thus, identifiers like "DeadUmlaut" could be
>         replaced
>         with ones like "\x0308".
>
>
>     What's the advantage of this over the U+XXXX format string?  I don't
>     get it.
>
>     My own thought in putting this together was that we don't know all
>     the uses it will be put to, so enabling the most general and generic
>     approach is probably the safest bet.  Cutting corners now might
>     inadvertently exclude some use case down the line, and enabling
>     access to all the key identifier value types doesn't seem to be much
>     more overhead (if any).  Please correct me if I'm wrong.
>
>
>     [1] http://lists.w3.org/Archives/Public/www-dom/2009JulSep/0406.html
>
>     Regards-
>     -Doug Schepers
>     W3C Team Contact, SVG and WebApps WGs

Received on Wednesday, 23 September 2009 06:27:13 UTC