- From: Doug Schepers <schepers@w3.org>
- Date: Wed, 23 Sep 2009 01:19:44 -0400
- To: "www-dom@w3.org" <www-dom@w3.org>
Hi, Maciej-

Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>
> On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>
>> On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
>>> On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>>>
>>> I agree with Anne. I think we should remove the U+XXXX format entirely.
>>> If you have a string like Q, you can convert it to a unicode numeric
>>> value for range checking like this: [...]
>>>
>>> I don't think the U+XXXX string format does not add any value.
>>
>> Are dead keys represented in some way? The string "\x0308" is not a valid
>> Unicode string (it has a combining character with no base), but I don't
>> see how else we would represent the diaeresis dead key.
>
> I hadn't thought of dead keys.

I did mention that as one of the use cases at the beginning of this thread [1], but I probably could have expressed it more clearly.

Another case I mentioned is making sure that a character is in a certain range (such as in a certain code block or language group). This is possible with the Unicode code point (and some regex), but not with the character (I think), because a given character representation can actually appear in multiple ranges, so you can't say for certain that some particular character belongs unequivocally to a certain range. That might not be correct... I'll look into it and report back (unless someone already knows for sure).

> According to the spec, the key identifier
> for the diaeresis dead key is the string "DeadUmlaut". I can see a few
> possible ways to deal with this:

We could remove the "key name" from the Unicode values, or replace it with something more appropriate, perhaps.

> 1) Have a way to get the unicode code point for a dead key. But I think
> a numeric value would be more useful than the U+XXXX format string.
> 1.a) This could be a global method that takes strings like "DeadUmlaut"
> and returns code points as numeric values ; OR
> 1.b) There could be an attribute on key events that gives the code
> point, if any, separate from the key identifier. long unicodeCodePoint
> for instance.

When we discussed this in the telcons, we decided that a utility function was better than an event attribute, because you could use it at any time, not just when a keyboard event had occurred... (there was some other reason that Travis brought up that escapes me at the moment). However, that was my first thought as well, so I'm amenable to that (maybe just ".codepoint"?).

> 2) Alternately - even though "\x0308" is not a valid Unicode string, it
> can still be represented as a DOM string and as a JavaScript string,
> since both the DOM and JavaScript define strings as sequences of 16-bit
> UTf-16 code units, and may represent invalid strings (including even
> such things as containing only one code unit of the two that comprise a
> surrogate pair). Thus, identifiers like "DeadUmlaut" could be replaced
> with ones like "\x0308".

What's the advantage of this over the U+XXXX format string? I don't get it.

My own thought in putting this together was that we don't know all the uses it will be put to, so enabling the most general and generic approach is probably the safest bet. Cutting corners now might inadvertently exclude some use case down the line, and enabling access to all the key identifier value types doesn't seem to be much more overhead (if any).

Please correct me if I'm wrong.

[1] http://lists.w3.org/Archives/Public/www-dom/2009JulSep/0406.html

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
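
For illustration only, here is a rough sketch of the options discussed in this thread: parsing a "U+XXXX" identifier, taking the code point of a character string, range-checking the result, and a utility lookup in the spirit of option 1.a. Every name in it (`parseUPlus`, `firstCodePoint`, `inRange`, `codePointForKey`, and the lookup table) is hypothetical and not taken from the spec. It assumes a modern JavaScript engine (`String.prototype.codePointAt` postdates this discussion), and it writes the combining diaeresis as "\u0308", since a JavaScript "\x" escape takes only two hex digits.

```javascript
// Hypothetical helpers for illustration; none of these names come from the spec.

// Parse a "U+XXXX" style key identifier into a numeric code point.
// Returns null for anything that is not in that format (e.g. "DeadUmlaut").
function parseUPlus(identifier) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(identifier);
  if (!match) return null;
  const codePoint = parseInt(match[1], 16);
  return codePoint <= 0x10FFFF ? codePoint : null;
}

// Numeric code point of the first character of a string, or null if empty.
// Works even for strings that are not valid Unicode text, such as a lone
// combining mark or a lone surrogate, because JavaScript strings are just
// sequences of 16-bit code units.
function firstCodePoint(str) {
  return str.length > 0 ? str.codePointAt(0) : null;
}

// Inclusive range check on a numeric code point.
function inRange(codePoint, low, high) {
  return codePoint !== null && codePoint >= low && codePoint <= high;
}

// Sketch of option 1.a: a utility that maps a key name to a code point.
// The table holds a single illustrative entry; a real one would be larger.
const deadKeyCodePoints = { DeadUmlaut: 0x0308 };

function codePointForKey(identifier) {
  if (Object.prototype.hasOwnProperty.call(deadKeyCodePoints, identifier)) {
    return deadKeyCodePoints[identifier];
  }
  return parseUPlus(identifier);
}

// Usage: both routes reach the same number for "Q", and the dead key
// falls inside the Combining Diacritical Marks block (U+0300 to U+036F).
const fromString = firstCodePoint("Q");
const fromIdentifier = parseUPlus("U+0051");
const deadKey = codePointForKey("DeadUmlaut");
console.log(fromString === fromIdentifier);
console.log(inRange(deadKey, 0x0300, 0x036F));
```

Once either form has been reduced to a number, the range check is identical, so the open question in the thread is mostly about which form is exposed and how dead keys are named.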
Received on Wednesday, 23 September 2009 05:19:53 UTC