- From: Maciej Stachowiak <mjs@apple.com>
- Date: Tue, 22 Sep 2009 22:35:48 -0700
- To: Mark Davis ☕ <mark@macchiato.com>
- Cc: Doug Schepers <schepers@w3.org>, "www-dom@w3.org" <www-dom@w3.org>
- Message-id: <491DC175-2BC3-4D86-8B4B-831491284FD3@apple.com>
On Sep 22, 2009, at 10:27 PM, Mark Davis ☕ wrote:

> I don't know enough about the context here, but there appear to be a
> number of misperceptions, among them that U+0308 (or "\u0308" or
> whatever the syntax is) alone does not constitute a valid Unicode
> string. It is absolutely a valid string.

If it is a valid string, then there is no problem. I just assumed Ian
was right in saying that it wasn't, because I don't think it's
important if it is technically invalid in some sense.

Regards,
Maciej

>
> Perhaps someone can point me to some background here.
>
> Mark
>
>
> On Tue, Sep 22, 2009 at 22:19, Doug Schepers <schepers@w3.org> wrote:
> Hi, Maciej-
>
> Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>
> On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>
> On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
> On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>
> I agree with Anne. I think we should remove the U+XXXX format
> entirely.
> If you have a string like Q, you can convert it to a unicode numeric
> value for range checking like this: [...]
>
> I don't think the U+XXXX string format adds any value.
>
> Are dead keys represented in some way? The string "\x0308" is not a
> valid Unicode string (it has a combining character with no base), but
> I don't see how else we would represent the diaeresis dead key.
>
> I hadn't thought of dead keys.
>
> I did mention that as one of the use cases at the beginning of this
> thread [1], but I probably could have expressed it more clearly.
>
> Another case I mentioned is making sure that a character is in a
> certain range (such as in a certain code block or language group).
> This is possible with the Unicode code point (and some regex), but
> not with the character (I think), because a given character
> representation can actually appear in multiple ranges, so you can't
> say for certain that some particular character belongs
> unequivocally to a certain range. That might not be correct... I'll
> look into it and report back (unless someone already knows for sure).
>
>
> According to the spec, the key identifier for the diaeresis dead key
> is the string "DeadUmlaut". I can see a few possible ways to deal
> with this:
>
> We could remove the "key name" from the Unicode values, or replace
> it with something more appropriate, perhaps.
>
>
> 1) Have a way to get the unicode code point for a dead key. But I
> think a numeric value would be more useful than the U+XXXX format
> string.
> 1.a) This could be a global method that takes strings like
> "DeadUmlaut" and returns code points as numeric values; OR
> 1.b) There could be an attribute on key events that gives the code
> point, if any, separate from the key identifier. long unicodeCodePoint
> for instance.
>
> When we discussed this in the telcons, we decided that a utility
> function was better than an event attribute, because you could use it
> at any time, not just when a keyboard event had occurred... (there
> was some other reason that Travis brought up that escapes me at the
> moment).
>
> However, that was my first thought as well, so I'm amenable to that
> (maybe just ".codepoint"?).
>
>
> 2) Alternately - even though "\x0308" is not a valid Unicode string,
> it can still be represented as a DOM string and as a JavaScript
> string, since both the DOM and JavaScript define strings as sequences
> of 16-bit UTF-16 code units, and may represent invalid strings
> (including even such things as containing only one code unit of the
> two that comprise a surrogate pair). Thus, identifiers like
> "DeadUmlaut" could be replaced with ones like "\x0308".
>
> What's the advantage of this over the U+XXXX format string? I don't
> get it.
>
> My own thought in putting this together was that we don't know all
> the uses it will be put to, so enabling the most general and generic
> approach is probably the safest bet. Cutting corners now might
> inadvertently exclude some use case down the line, and enabling
> access to all the key identifier value types doesn't seem to be much
> more overhead (if any). Please correct me if I'm wrong.
>
>
> [1] http://lists.w3.org/Archives/Public/www-dom/2009JulSep/0406.html
>
> Regards-
> -Doug Schepers
> W3C Team Contact, SVG and WebApps WGs
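
For readers following the thread, here is a minimal TypeScript sketch of two points raised above: converting a character string to a numeric code point for range checking, and the claim in option 2 that a string of 16-bit code units can hold a lone combining mark or half of a surrogate pair. The helper names (codePointOf, inRange, toUPlus) are hypothetical and appear in no DOM specification; the elided "[...]" snippet in the quoted text is not reproduced here. The sketch writes the diaeresis as "\u0308", the escape Mark uses, because a JavaScript "\x" escape takes exactly two hex digits.

```typescript
// Hypothetical helper: numeric code point starting at index i, or -1 if
// i is out of range. Uses only charCodeAt, and combines a well-formed
// surrogate pair into one supplementary code point.
function codePointOf(s: string, i: number): number {
  if (i < 0 || i >= s.length) return -1;
  const hi = s.charCodeAt(i);
  if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
    const lo = s.charCodeAt(i + 1);
    if (lo >= 0xdc00 && lo <= 0xdfff) {
      return (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000;
    }
  }
  return hi; // BMP code unit, or an unpaired surrogate returned as-is
}

// Hypothetical helper: inclusive range check on a code point.
function inRange(cp: number, lo: number, hi: number): boolean {
  return cp >= lo && cp <= hi;
}

// Hypothetical helper: format a code point in the U+XXXX style.
function toUPlus(cp: number): string {
  let hex = cp.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "U+" + hex;
}

// Range checking on a plain character.
const q = codePointOf("Q", 0);                       // 0x51
const qIsUpperLatin = inRange(q, 0x41, 0x5a);        // true

// A lone combining diaeresis is a single code unit in a JS string.
const diaeresis = "\u0308";
const diaeresisLabel = toUPlus(codePointOf(diaeresis, 0)); // "U+0308"

// Half of a surrogate pair is also storable as a one-unit string.
const loneHigh = "\uD83D";
const pair = "\uD83D\uDE00";                         // U+1F600

// Read as a JavaScript literal, "\x0308" is U+0003 followed by "08".
const xEscape = "\x0308";

console.log(toUPlus(q), qIsUpperLatin, diaeresis.length, diaeresisLabel);
console.log(loneHigh.length, toUPlus(codePointOf(pair, 0)), xEscape.length);
```

Going from a string to a number and back like this is the same information the U+XXXX format carries, which is roughly the trade-off the thread is weighing. Mapping a name such as "DeadUmlaut" to a code point (option 1.a) would need a lookup table that this sketch does not attempt.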
Received on Wednesday, 23 September 2009 05:36:31 UTC