- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Tue, 22 Sep 2009 22:27:05 -0700
- To: Doug Schepers <schepers@w3.org>
- Cc: "www-dom@w3.org" <www-dom@w3.org>
- Message-ID: <30b660a20909222227g666ccf46ke63dd6ea22434b2c@mail.gmail.com>
I don't know enough about the context here, but there appear to be a number
of misperceptions, among them that U+0308 (or "\u0308" or whatever the
syntax is) alone does not constitute a valid Unicode string. It is
absolutely a valid string.

Perhaps someone can point me to some background here.

Mark

On Tue, Sep 22, 2009 at 22:19, Doug Schepers <schepers@w3.org> wrote:

> Hi, Maciej-
>
> Maciej Stachowiak wrote (on 9/22/09 11:23 PM):
>
>> On Sep 22, 2009, at 7:53 PM, Ian Hickson wrote:
>>
>>> On Tue, 22 Sep 2009, Maciej Stachowiak wrote:
>>>
>>>> On Sep 22, 2009, at 9:27 AM, Anne van Kesteren wrote:
>>>>
>>>> I agree with Anne. I think we should remove the U+XXXX format entirely.
>>>> If you have a string like Q, you can convert it to a Unicode numeric
>>>> value for range checking like this: [...]
>>>>
>>>> I don't think the U+XXXX string format adds any value.
>>>
>>> Are dead keys represented in some way? The string "\x0308" is not a valid
>>> Unicode string (it has a combining character with no base), but I don't
>>> see how else we would represent the diaeresis dead key.
>>
>> I hadn't thought of dead keys.
>
> I did mention that as one of the use cases at the beginning of this thread
> [1], but I probably could have expressed it more clearly.
>
> Another case I mentioned is making sure that a character is in a certain
> range (such as in a certain code block or language group). This is possible
> with the Unicode code point (and some regex), but not with the character (I
> think), because a given character representation can actually appear in
> multiple ranges, so you can't say for certain that some particular character
> belongs unequivocally to a certain range. That might not be correct...
> I'll look into it and report back (unless someone already knows for sure).
>
>> According to the spec, the key identifier for the diaeresis dead key is
>> the string "DeadUmlaut". I can see a few possible ways to deal with this:
>
> We could remove the "key name" from the Unicode values, or replace it with
> something more appropriate, perhaps.
>
>> 1) Have a way to get the Unicode code point for a dead key. But I think
>> a numeric value would be more useful than the U+XXXX format string.
>> 1.a) This could be a global method that takes strings like "DeadUmlaut"
>> and returns code points as numeric values; OR
>> 1.b) There could be an attribute on key events that gives the code
>> point, if any, separate from the key identifier. long unicodeCodePoint
>> for instance.
>
> When we discussed this in the telcons, we decided that a utility function
> was better than an event attribute, because you could use it at any time, not
> just when a keyboard event had occurred... (there was some other reason that
> Travis brought up that escapes me at the moment).
>
> However, that was my first thought as well, so I'm amenable to that (maybe
> just ".codepoint"?).
>
>> 2) Alternately - even though "\x0308" is not a valid Unicode string, it
>> can still be represented as a DOM string and as a JavaScript string,
>> since both the DOM and JavaScript define strings as sequences of 16-bit
>> UTF-16 code units, and may represent invalid strings (including even
>> such things as containing only one code unit of the two that comprise a
>> surrogate pair). Thus, identifiers like "DeadUmlaut" could be replaced
>> with ones like "\x0308".
>
> What's the advantage of this over the U+XXXX format string? I don't get
> it.
>
> My own thought in putting this together was that we don't know all the uses
> it will be put to, so enabling the most general and generic approach is
> probably the safest bet. Cutting corners now might inadvertently exclude
> some use case down the line, and enabling access to all the key identifier
> value types doesn't seem to be much more overhead (if any). Please correct
> me if I'm wrong.
>
> [1] http://lists.w3.org/Archives/Public/www-dom/2009JulSep/0406.html
>
> Regards-
> -Doug Schepers
> W3C Team Contact, SVG and WebApps WGs
Received on Wednesday, 23 September 2009 05:27:47 UTC
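
For readers following the thread, here is a minimal sketch (not from any of
the messages above) of the points under discussion: a string holding only
U+0308, converting a string to numeric code points for range checking, and a
lone surrogate code unit. It assumes an ECMAScript-style string of 16-bit
code units. The helper names codePoints, inRange and toUPlus are hypothetical
and are not part of DOM Level 3 Events or any other spec.

```typescript
// Illustrative only: helper names are hypothetical, not a proposed API.

// U+0308 COMBINING DIAERESIS by itself: a one-code-unit string.
const diaeresis: string = "\u0308";

// Caution: in JavaScript/TypeScript source, "\x" takes exactly two hex
// digits, so "\x0308" is U+0003 followed by the characters "0" and "8",
// not U+0308. The "\u0308" spelling is the one that yields the diaeresis.
const notDiaeresis: string = "\x0308";

// Decode a string of 16-bit code units into numeric code points.
// A well-formed surrogate pair becomes one supplementary code point;
// an unpaired surrogate is passed through as its own value.
function codePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        out.push((hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000);
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    out.push(hi);
  }
  return out;
}

// True when s is exactly one code point lying within [lo, hi].
function inRange(s: string, lo: number, hi: number): boolean {
  const cps = codePoints(s);
  const cp = cps[0];
  return cps.length === 1 && cp !== undefined && cp >= lo && cp <= hi;
}

// Format a numeric code point in the U+XXXX notation (at least 4 hex digits).
function toUPlus(cp: number): string {
  const hex = cp.toString(16).toUpperCase();
  return "U+" + (hex.length >= 4 ? hex : ("0000" + hex).slice(-4));
}

// Range checks against Unicode blocks (block boundaries from the standard).
const isCombiningMark = inRange(diaeresis, 0x0300, 0x036f); // Combining Diacritical Marks
const isBasicLatin = inRange("Q", 0x0000, 0x007f);          // Basic Latin

// A lone high surrogate is still representable as a string of code units.
const loneSurrogate: string = "\uD83D";
```

Going from the string to a number and back to U+XXXX is a few lines either
way, which is roughly the trade-off being debated: the numeric form is
convenient for range checks, while the string form can be used directly as
an identifier.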