Re: convertKeyIdentifier from Maciej Stachowiak on 2009-09-23 (www-dom@w3.org from July to September 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 22 Sep 2009 22:35:48 -0700
To: Mark Davis ☕ <mark@macchiato.com>
Cc: Doug Schepers <schepers@w3.org>, "www-dom@w3.org" <www-dom@w3.org>
Message-id: <491DC175-2BC3-4D86-8B4B-831491284FD3@apple.com>
On Sep 22, 2009, at 10:27 PM, Mark Davis ☕ wrote:

> I don't know enough about the context here, but there appear to be a  
> number of misperceptions, among them that U+0308 (or "\u0308" or  
> whatever the syntax is) alone does not constitute a valid Unicode  
> string. It is absolutely a valid string.

If it is a valid string, then there is no problem. I had simply taken
Ian's word that it wasn't, since I don't think it matters whether it is
technically invalid in some sense.
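To illustrate Mark's point in today's JavaScript, here is a minimal sketch. Note that codePointAt, normalize and the \p{...} regex escapes all postdate this 2009 thread. It shows that U+0308 on its own is an ordinary one-code-unit string, which the engine classifies as a combining mark. Note also that in JavaScript \x takes exactly two hex digits, so U+0308 has to be written "\u0308". By contrast, "\x0308" is the three-character string U+0003, "0", "8".

```javascript
// U+0308 COMBINING DIAERESIS on its own, with no base character before it.
const lone = "\u0308";

console.log(lone.length);                       // 1 (a single UTF-16 code unit)
console.log(lone.codePointAt(0).toString(16));  // 308
console.log(/^\p{M}$/u.test(lone));             // true (General_Category Mark)

// Composition only happens once a base character precedes the mark.
console.log(("u" + lone).normalize("NFC") === "\u00FC"); // true (U+00FC, "ü")

// \x takes exactly two hex digits: this is U+0003 followed by "0" and "8".
console.log("\x0308".length);                   // 3
```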

Regards,
Maciej

>
> Perhaps someone can point me to some background here.
>
> Mark
>
>
> On Tue, Sep 22, 2009 at 22:19, Doug Schepers <schepers@w3.org> wrote:
> Hi, Maciej-
>
> Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>
> On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>
> On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
> On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>
> I agree with Anne. I think we should remove the U+XXXX format
> entirely. If you have a string like Q, you can convert it to a Unicode
> numeric value for range checking like this: [...]
>
> I don't think the U+XXXX string format adds any value.
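The conversion code in the original message is elided above. What follows is an independent sketch in modern JavaScript, not that snippet, of converting between a character, its numeric code point and a U+XXXX identifier. The helper names codePointOf, toUPlus, fromUPlus and inRange are hypothetical. The four-digit zero padding follows the usual U+XXXX convention.

```javascript
// Hypothetical helpers for illustration; not part of any DOM specification.

// Returns the code point of the first character, e.g. "Q" -> 81 (0x51).
// codePointAt reads a full surrogate pair when one is present.
function codePointOf(str) {
  const cp = str.codePointAt(0);
  if (cp === undefined) throw new RangeError("empty string");
  return cp;
}

// 81 -> "U+0051": uppercase hex, zero-padded to at least four digits.
function toUPlus(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// "U+0051" -> 81. Accepts four to six hex digits, up to U+10FFFF.
function fromUPlus(id) {
  const m = /^U\+([0-9A-Fa-f]{4,6})$/.exec(id);
  if (m === null) throw new SyntaxError("not a U+XXXX identifier: " + id);
  const cp = parseInt(m[1], 16);
  if (cp > 0x10ffff) throw new RangeError("beyond U+10FFFF: " + id);
  return cp;
}

// Once the value is numeric, a range check is just two comparisons.
function inRange(cp, lo, hi) {
  return cp >= lo && cp <= hi;
}

const q = codePointOf("Q");
console.log(q, toUPlus(q));                    // 81 U+0051
console.log(fromUPlus("U+0051") === q);        // true
console.log(inRange(q, 0x41, 0x5a));           // true (within A-Z)
console.log(String.fromCodePoint(fromUPlus("U+1F600")).length); // 2
```

With a numeric value in hand, the range check needs no string format at all. This is essentially the argument made in this thread for a numeric value over the U+XXXX string.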
>
> Are dead keys represented in some way? The string "\u0308" is not a
> valid Unicode string (it has a combining character with no base), but
> I don't see how else we would represent the diaeresis dead key.
>
> I hadn't thought of dead keys.
>
> I did mention that as one of the use cases at the beginning of this  
> thread [1], but I probably could have expressed it more clearly.
>
> Another case I mentioned is making sure that a character is in a
> certain range (such as in a certain code block or language group).
> This is possible with the Unicode code point (and some regex), but
> not with the character (I think), because a given character
> representation can actually appear in multiple ranges, so you can't
> say for certain that some particular character belongs unequivocally
> to a certain range. That might not be correct... I'll look into it
> and report back (unless someone already knows for sure).
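For the range-checking use case, a check on the code point can be sketched as follows. The BLOCKS table is a small, hand-copied subset of the Unicode Character Database's Blocks.txt, and blockOf is a hypothetical helper. Unicode blocks do not overlap, so each code point lies in at most one block. Script membership is looser, because the Script_Extensions property can assign one character to several scripts. Visually identical characters such as Latin "A" (U+0041) and Cyrillic "А" (U+0410) have different code points and therefore fall in different blocks.

```javascript
// Hand-copied subset of Unicode Blocks.txt: [name, first, last] (inclusive).
const BLOCKS = [
  ["Basic Latin", 0x0000, 0x007f],
  ["Latin-1 Supplement", 0x0080, 0x00ff],
  ["Combining Diacritical Marks", 0x0300, 0x036f],
  ["Cyrillic", 0x0400, 0x04ff],
];

// Hypothetical helper: name of the block containing cp, or null if the
// code point is outside this small table. Blocks never overlap, so the
// first match is the only match.
function blockOf(cp) {
  for (const [name, first, last] of BLOCKS) {
    if (cp >= first && cp <= last) return name;
  }
  return null;
}

const latinA = "A";         // U+0041 LATIN CAPITAL LETTER A
const cyrillicA = "\u0410"; // U+0410 CYRILLIC CAPITAL LETTER A (looks the same)

console.log(blockOf(latinA.codePointAt(0)));    // Basic Latin
console.log(blockOf(cyrillicA.codePointAt(0))); // Cyrillic
console.log(blockOf(0x0308));                   // Combining Diacritical Marks

// Script is a separate property; ES2018 regexes can test it directly.
console.log(/^\p{Script=Cyrillic}$/u.test(cyrillicA)); // true
console.log(/^\p{Script=Latin}$/u.test(latinA));       // true
```

A real implementation would generate the table from the full UCD data rather than hard-coding a few rows.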
>
>
> According to the spec, the key identifier
> for the diaeresis dead key is the string "DeadUmlaut". I can see a few
> possible ways to deal with this:
>
> We could remove the "key name" from the Unicode values, or replace  
> it with something more appropriate, perhaps.
>
>
> 1) Have a way to get the Unicode code point for a dead key. But I
> think a numeric value would be more useful than the U+XXXX format
> string.
> 1.a) This could be a global method that takes strings like
> "DeadUmlaut" and returns code points as numeric values; OR
> 1.b) There could be an attribute on key events that gives the code
> point, if any, separate from the key identifier (long
> unicodeCodePoint, for instance).
>
> When we discussed this in the telcons, we decided that a utility
> function was better than an event attribute, because you could use it
> at any time, not just when a keyboard event had occurred... (there
> was some other reason that Travis brought up that escapes me at the
> moment).
>
> However, that was my first thought as well, so I'm amenable to that  
> (maybe just ".codepoint"?).
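Options 1.a and 1.b can be sketched side by side. Everything here is hypothetical and is not spec API: deadKeyCodePoint, the DEAD_KEY_CODE_POINTS table, and the codepoint property, which borrows Doug's suggested name. Only "DeadUmlaut" comes from this thread. The other identifiers are assumed to follow the same naming pattern and are mapped to the corresponding combining marks. The -1 sentinel is used because the attribute floated in 1.b is a long.

```javascript
// Hypothetical option 1.a: a global utility usable at any time, not only
// inside a key event handler. "DeadUmlaut" is the only identifier taken
// from the thread; the others assume the same naming pattern.
const DEAD_KEY_CODE_POINTS = new Map([
  ["DeadGrave", 0x0300],      // COMBINING GRAVE ACCENT
  ["DeadAcute", 0x0301],      // COMBINING ACUTE ACCENT
  ["DeadCircumflex", 0x0302], // COMBINING CIRCUMFLEX ACCENT
  ["DeadTilde", 0x0303],      // COMBINING TILDE
  ["DeadUmlaut", 0x0308],     // COMBINING DIAERESIS
]);

// Returns the numeric code point, or -1 when the identifier has no mapping.
function deadKeyCodePoint(keyIdentifier) {
  const cp = DEAD_KEY_CODE_POINTS.get(keyIdentifier);
  return cp === undefined ? -1 : cp;
}

// Hypothetical option 1.b: the same value exposed on the event itself.
// A plain object stands in for a real keyboard event here.
function withCodePoint(event) {
  return { ...event, codepoint: deadKeyCodePoint(event.keyIdentifier) };
}

console.log(deadKeyCodePoint("DeadUmlaut").toString(16)); // 308
console.log(deadKeyCodePoint("Enter"));                   // -1
const ev = withCodePoint({ keyIdentifier: "DeadUmlaut" });
// Applying the dead key to a following "u" and composing yields U+00FC.
console.log(("u" + String.fromCodePoint(ev.codepoint)).normalize("NFC")); // ü
```

The utility in 1.a works without any event, which matches the telcon reasoning. The attribute in 1.b is just the same lookup attached to the event.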
>
>
> 2) Alternatively, even though "\u0308" is not a valid Unicode string,
> it can still be represented as a DOM string and as a JavaScript
> string, since both the DOM and JavaScript define strings as sequences
> of 16-bit UTF-16 code units, and so can represent invalid strings
> (including even such things as containing only one code unit of the
> two that comprise a surrogate pair). Thus, identifiers like
> "DeadUmlaut" could be replaced with ones like "\u0308".
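Option 2 relies on JavaScript strings being arbitrary sequences of 16-bit code units. The minimal sketch below is in modern JavaScript; isDiaeresisDeadKey and isEncodable are hypothetical helpers. It shows that both a lone combining mark and an unpaired surrogate are representable values. It also shows that at least one standard function, encodeURIComponent, refuses the unpaired surrogate.

```javascript
// JavaScript strings are sequences of 16-bit code units, so none of
// these need to be well-formed Unicode to exist as values.
const deadUmlaut = "\u0308"; // lone combining mark
const loneHigh = "\uD83D";   // first half of a surrogate pair only
const pair = "\uD83D\uDE00"; // U+1F600 as a complete surrogate pair

console.log(deadUmlaut.length, loneHigh.length, pair.length); // 1 1 2
console.log(pair.codePointAt(0).toString(16));                // 1f600
console.log(loneHigh.codePointAt(0).toString(16));            // d83d

// Hypothetical usage: compare a key identifier against the mark itself.
function isDiaeresisDeadKey(keyIdentifier) {
  return keyIdentifier === "\u0308";
}
console.log(isDiaeresisDeadKey(deadUmlaut)); // true

// Some standard functions do reject ill-formed strings:
// encodeURIComponent throws URIError on an unpaired surrogate.
function isEncodable(s) {
  try {
    encodeURIComponent(s);
    return true;
  } catch (e) {
    if (e instanceof URIError) return false;
    throw e;
  }
}
console.log(isEncodable(deadUmlaut), isEncodable(loneHigh)); // true false
```

One practical difference from the U+XXXX format is that such identifiers can be compared with, or appended to, typed text directly. The drawback is that a lone combining mark is hard to read when printed or logged.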
>
> What's the advantage of this over the U+XXXX format string?  I don't  
> get it.
>
> My own thought in putting this together was that we don't know all
> the uses it will be put to, so enabling the most general approach is
> probably the safest bet. Cutting corners now might inadvertently
> exclude some use case down the line, and enabling access to all the
> key identifier value types doesn't seem to add much overhead (if
> any). Please correct me if I'm wrong.
>
>
> [1] http://lists.w3.org/Archives/Public/www-dom/2009JulSep/0406.html
>
> Regards-
> -Doug Schepers
> W3C Team Contact, SVG and WebApps WGs
>
>
Received on Wednesday, 23 September 2009 05:36:31 UTC