Re: convertKeyIdentifier from Doug Schepers on 2009-09-23 (public-i18n-core@w3.org from July to September 2009)

From: Doug Schepers <schepers@w3.org>
Date: Wed, 23 Sep 2009 01:19:44 -0400
To: "www-dom@w3.org" <www-dom@w3.org>
Message-ID: <4AB9AFF0.5030305@w3.org>
Hi, Maciej-

Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>
> On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>
>> On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
>>> On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>>>
>>> I agree with Anne. I think we should remove the U+XXXX format entirely.
>>> If you have a string like Q, you can convert it to a unicode numeric
>>> value for range checking like this: [...]
>>>
>>> I don't think the U+XXXX string format adds any value.
>>
>> Are dead keys represented in some way? The string "\x0308" is not a valid
>> Unicode string (it has a combining character with no base), but I don't
>> see how else we would represent the diaeresis dead key.
>
> I hadn't thought of dead keys.

I did mention that as one of the use cases at the beginning of this 
thread [1], but I probably could have expressed it more clearly.

Another case I mentioned is making sure that a character is in a certain 
range (such as in a certain code block or language group).  This is 
possible with the Unicode code point (and some regex), but not with the 
character (I think), because a given character representation can 
actually appear in multiple ranges, so you can't say for certain that 
some particular character belongs unequivocally to a certain range. 
That might not be correct... I'll look into it and report back (unless 
someone already knows for sure).
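
For illustration, here is a minimal sketch of the range check discussed above, done both from the character and from the U+XXXX string. The helper names parseUPlus and inRange are hypothetical, not part of any spec, and String.prototype.codePointAt is a later JavaScript addition than this thread; charCodeAt gives the same result for characters in the Basic Multilingual Plane.

```javascript
// Hypothetical helper: parse a "U+XXXX" key identifier into a numeric
// code point, or return null if the string is not in that format.
function parseUPlus(identifier) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(identifier);
  return match ? parseInt(match[1], 16) : null;
}

// Hypothetical helper: true if codePoint lies in [start, end], inclusive.
function inRange(codePoint, start, end) {
  return codePoint >= start && codePoint <= end;
}

// Starting from the character itself.
const fromChar = "Q".codePointAt(0);

// Starting from the U+XXXX format string.
const fromString = parseUPlus("U+0051");

// The Basic Latin block is U+0000..U+007F.
console.log(inRange(fromChar, 0x0000, 0x007f));   // true
console.log(inRange(fromString, 0x0000, 0x007f)); // true
```

Both routes end in the same numeric comparison; the only difference is whether the code point is read from the character or parsed out of the string.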


> According to the spec, the key identifier
> for the diaeresis dead key is the string "DeadUmlaut". I can see a few
> possible ways to deal with this:

We could remove the "key name" from the Unicode values, or replace it 
with something more appropriate, perhaps.


> 1) Have a way to get the unicode code point for a dead key. But I think
> a numeric value would be more useful than the U+XXXX format string.
> 1.a) This could be a global method that takes strings like "DeadUmlaut"
> and returns code points as numeric values; OR
> 1.b) There could be an attribute on key events that gives the code
> point, if any, separate from the key identifier. long unicodeCodePoint
> for instance.

When we discussed this in the telcons, we decided that a utility 
function was better than an event attribute, because you could use it at 
any time, not just when a keyboard event had occurred... (there was some 
other reason that Travis brought up that escapes me at the moment).

However, that was my first thought as well, so I'm amenable to that 
(maybe just ".codepoint"?).
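
To make the utility-function option (1.a) concrete, here is a rough sketch under stated assumptions. The function name keyIdentifierToCodePoint and the lookup table are hypothetical; only the "DeadUmlaut" identifier and its association with U+0308 come from this thread.

```javascript
// Hypothetical table of named dead-key identifiers. Only "DeadUmlaut"
// is taken from the thread; a real table would have more entries.
const DEAD_KEY_CODE_POINTS = new Map([
  ["DeadUmlaut", 0x0308], // COMBINING DIAERESIS
]);

// Hypothetical utility: returns a numeric code point for a key
// identifier, or null if the identifier has no code point.
function keyIdentifierToCodePoint(keyIdentifier) {
  if (DEAD_KEY_CODE_POINTS.has(keyIdentifier)) {
    return DEAD_KEY_CODE_POINTS.get(keyIdentifier);
  }
  // Also accept the U+XXXX format string.
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(keyIdentifier);
  return match ? parseInt(match[1], 16) : null;
}

console.log(keyIdentifierToCodePoint("DeadUmlaut")); // 776 (0x0308)
console.log(keyIdentifierToCodePoint("U+0051"));     // 81 (0x51)
console.log(keyIdentifierToCodePoint("Enter"));      // null
```

Because it takes a plain string, a function like this can be called at any time, which is the advantage over an event attribute noted above.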


> 2) Alternately - even though "\x0308" is not a valid Unicode string, it
> can still be represented as a DOM string and as a JavaScript string,
> since both the DOM and JavaScript define strings as sequences of 16-bit
> UTF-16 code units, and may represent invalid strings (including even
> such things as containing only one code unit of the two that comprise a
> surrogate pair). Thus, identifiers like "DeadUmlaut" could be replaced
> with ones like "\x0308".

What's the advantage of this over the U+XXXX format string?  I don't get it.
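
For reference, a small sketch of the claim in option 2 that a JavaScript string can hold a lone combining character or half of a surrogate pair. The variable names are illustrative only. Note that in JavaScript source the escape for U+0308 is "\u0308"; the "\x" escape takes exactly two hex digits, so "\x0308" as literally written would be a three-character string.

```javascript
// A combining diaeresis with no base character.
const loneCombining = "\u0308";

// One half of a surrogate pair, with no partner.
const loneSurrogate = "\uD83D";

// Both are stored as single 16-bit code units.
console.log(loneCombining.length);                     // 1
console.log(loneCombining.charCodeAt(0).toString(16)); // "308"
console.log(loneSurrogate.length);                     // 1

// "\x0308" parses as "\x03" followed by the characters "0" and "8".
console.log("\x0308".length);                          // 3

// Appending the combining character to a base letter and normalizing
// yields the precomposed character U+00E4.
console.log(("a" + loneCombining).normalize("NFC") === "\u00E4"); // true
```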

My own thought in putting this together was that we don't know all the 
uses it will be put to, so enabling the most general and generic 
approach is probably the safest bet.  Cutting corners now might 
inadvertently exclude some use case down the line, and enabling access 
to all the key identifier value types doesn't seem to be much more 
overhead (if any).  Please correct me if I'm wrong.


[1] http://lists.w3.org/Archives/Public/www-dom/2009JulSep/0406.html

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
Received on Wednesday, 23 September 2009 05:19:53 UTC