Re: convertKeyIdentifier from Maciej Stachowiak on 2009-09-23 (www-dom@w3.org from July to September 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 22 Sep 2009 22:34:41 -0700
To: Doug Schepers <schepers@w3.org>
Cc: "www-dom@w3.org" <www-dom@w3.org>
Message-id: <800DE4E1-CDA9-4FBF-9DF0-94EFB5A5938C@apple.com>
On Sep 22, 2009, at 10:19 PM, Doug Schepers wrote:

> Hi, Maciej-
>
> Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>>
>> On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>>
>>> On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
>>>> On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>>>>
>>>> I agree with Anne. I think we should remove the U+XXXX format  
>>>> entirely.
>>>> If you have a string like Q, you can convert it to a unicode  
>>>> numeric
>>>> value for range checking like this: [...]
>>>>
>>>> I don't think the U+XXXX string format adds any value.
>>>
>>> Are dead keys represented in some way? The string "\u0308" is not a
>>> valid Unicode string (it has a combining character with no base), but
>>> I don't see how else we would represent the diaeresis dead key.
>>
>> I hadn't thought of dead keys.
>
> I did mention that as one of the use cases at the beginning of this  
> thread [1], but I probably could have expressed it more clearly.
>
> Another case I mentioned is making sure that a character is in a  
> certain range (such as in a certain code block or language group).   
> This is possible with the Unicode code point (and some regex), but  
> not with the character (I think), because a given character  
> representation can actually appear in multiple ranges, so you can't  
> say for certain that some particular character belongs  
> unequivocally to a certain range. That might not be correct... I'll  
> look into it and report back (unless someone already knows for sure).

That's not correct. See the email that Ian was replying to. It's easy
to get the Unicode code point given a string containing the character;
in fact, it's easier to do that than to use the convertKeyIdentifier()
API. Furthermore, in JavaScript, you can do range comparisons using
strings directly. Every character has exactly one Unicode code point,
and you can unequivocally say whether it is in a particular range or not.
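
As a rough sketch of both approaches (the helper names codePointOf and
inRange are made up for illustration, and this assumes a character in the
Basic Multilingual Plane, i.e. a single UTF-16 code unit):

```javascript
// Sketch only: codePointOf and inRange are hypothetical helper names.
// Assumes the character fits in a single UTF-16 code unit (BMP).
function codePointOf(ch) {
  return ch.charCodeAt(0);
}

// Numeric range check against an inclusive [low, high] range.
function inRange(ch, low, high) {
  var cp = codePointOf(ch);
  return cp >= low && cp <= high;
}

var code = codePointOf("Q");               // 0x51
var isUpper = inRange("Q", 0x41, 0x5A);    // true: within A..Z

// Strings also compare directly, code unit by code unit,
// so no conversion is needed for a simple range test.
var isUpperByString = "Q" >= "A" && "Q" <= "Z";   // true
```

Either form answers the range question without going through a U+XXXX
string at all.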

>
>
>> According to the spec, the key identifier
>> for the diaeresis dead key is the string "DeadUmlaut". I can see a  
>> few
>> possible ways to deal with this:
>
> We could remove the "key name" from the Unicode values, or replace  
> it with something more appropriate, perhaps.
>
>
>> 1) Have a way to get the Unicode code point for a dead key. But I think
>> a numeric value would be more useful than the U+XXXX format string.
>> 1.a) This could be a global method that takes strings like "DeadUmlaut"
>> and returns code points as numeric values; OR
>> 1.b) There could be an attribute on key events that gives the code
>> point, if any, separate from the key identifier: long unicodeCodePoint,
>> for instance.
>
> When we discussed this in the telcons, we decided that a utility  
> function was better than an event attribute, because you could use it  
> at any time, not just when a keyboard event had occurred... (there  
> was some other reason that Travis brought up that escapes me at the  
> moment).
>
> However, that was my first thought as well, so I'm amenable to that  
> (maybe just ".codepoint"?).

It should be .codePoint since "code point" is two words. Which is  
better depends on whether there is any use case for getting the code  
point for a non-character key identifier at any time other than during  
event dispatch.

>
>
>> 2) Alternatively, even though "\u0308" is not a valid Unicode string, it
>> can still be represented as a DOM string and as a JavaScript string,
>> since both the DOM and JavaScript define strings as sequences of 16-bit
>> UTF-16 code units, and may represent invalid strings (including even
>> such things as containing only one code unit of the two that comprise a
>> surrogate pair). Thus, identifiers like "DeadUmlaut" could be replaced
>> with ones like "\u0308".
>
> What's the advantage of this over the U+XXXX format string?  I don't  
> get it.

You can get the Unicode value more easily from "\u0308" than from
"U+0308". Note that this is an escape sequence that makes a string which
actually contains the Unicode character represented by code point hex
308. You can get the numeric code point from such a string directly
using the charCodeAt() method that's built in to JavaScript.
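
To make the comparison concrete, here is a minimal sketch. The identifier
values are only illustrative, and codePointFromUPlus is a hypothetical
helper written for this example, not a proposed API:

```javascript
// Sketch only: illustrative identifier values, hypothetical helper name.
var deadKey = "\u0308";               // string containing U+0308 itself
var direct = deadKey.charCodeAt(0);   // 776, i.e. 0x308

// The U+XXXX form has to be validated and parsed before it is usable.
function codePointFromUPlus(id) {
  var match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(id);
  return match ? parseInt(match[1], 16) : NaN;
}

var parsed = codePointFromUPlus("U+0308");   // 776
```

The first form is a single built-in call; the second needs a pattern match
and a hex parse, plus a decision about what to do with strings that are
not in U+XXXX form.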

>
> My own thought in putting this together was that we don't know all  
> the uses it will be put to, so enabling the most general and generic  
> approach is probably the safest bet.  Cutting corners now might  
> inadvertently exclude some use case down the line, and enabling  
> access to all the key identifier value types doesn't seem to be much  
> more overhead (if any).  Please correct me if I'm wrong.

I think it makes things simpler for every key to have exactly one  
identifier. If there's a need to get a code point from an identifier,   
we can just provide that as a numeric value. Providing it as a U+XXXX  
string, and considering that string to be somehow an alternate  
identifier, does not add any value.

Regards,
Maciej
Received on Wednesday, 23 September 2009 05:35:24 UTC