- From: Doug Schepers <schepers@w3.org>
- Date: Fri, 30 Oct 2009 14:38:35 -0400
- To: Mark Davis ☕ <mark@macchiato.com>
- CC: www-dom@w3.org, www-international@w3.org
Hi, Mark-

Mark Davis ☕ wrote (on 10/30/09 12:22 PM):
> I want to point out that Unicode code points can go up to hex 10FFFF.
> The standard for \u is exactly 4 digits, so that one can intermix with
> characters and know where it terminates. There are a couple of schemes
> that are used to extend this to up to 6 digits, and still know where to
> terminate.
>
> \UXXXXXXXX - C++, ICU
> \UXXXXXX - C#
> \u{xxxxxx} - Ruby
>
> There needs to be some mechanism for extending to 6 digits. It would be
> best to use one of the above rather than a new one. (My personal
> favorite is Ruby's.)

The "\u" escaped character sequence was chosen because it is the native ECMAScript escape notation, which is easy for browser-based applications to use directly (i.e. they can inject it directly into the markup as a character).

But, yes, it is capped at 4 digits, and I personally would prefer to use a different escape mechanism... but only if at least one of these two conditions holds:

1) DOM3 Events implementations also update their JavaScript engines to be able to process the additional escape sequence (e.g. one of the ones you mention above) in the same way they process the "\u" escape sequence. This is the better long-term solution, and I'd hope ECMA TC39 could be persuaded to add this to future ECMAScript specs.

2) Script authors could use a normalizing method (cf. convertKeyValue) to "dumb down" the 6-digit escape sequence into the 4-digit format (by converting to surrogate pairs when necessary).

JavaScript is becoming increasingly important, and so is the need for internationalized and localized language support. With the new font-linking enablers (including my favorite, WOFF [1]), and i18n domain extension policy [2], we're going to see more use of languages I have no chance of ever understanding, and I want DOM3 Events and ECMAScript to be part of that.

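
To make the "dumb down" step in condition 2 concrete, here is a minimal sketch of what such a normalizing function might look like, assuming the Ruby-style \u{xxxxxx} notation as input. The names toFourDigitEscapes and escape4 are hypothetical, invented for illustration; they are not part of DOM3 Events, and this is not a description of how convertKeyValue behaves.

```javascript
// Hypothetical helper (not part of any spec): rewrites Ruby-style
// \u{X...} escapes (1 to 6 hex digits) as 4-digit \uXXXX escapes,
// splitting code points above U+FFFF into a UTF-16 surrogate pair.
function toFourDigitEscapes(text) {
  return text.replace(/\\u\{([0-9A-Fa-f]{1,6})\}/g, function (match, hex) {
    var codePoint = parseInt(hex, 16);
    if (codePoint > 0x10FFFF) {
      return match; // beyond the Unicode range; leave the text untouched
    }
    if (codePoint <= 0xFFFF) {
      return escape4(codePoint); // fits in a single 4-digit escape
    }
    // Supplementary code point: subtract 0x10000, then split the
    // remaining 20 bits into a high and a low surrogate.
    var offset = codePoint - 0x10000;
    var high = 0xD800 + (offset >> 10);
    var low = 0xDC00 + (offset & 0x3FF);
    return escape4(high) + escape4(low);
  });
}

// Formats one 16-bit code unit as a literal backslash-u escape,
// zero-padded to exactly 4 uppercase hex digits.
function escape4(unit) {
  var hex = unit.toString(16).toUpperCase();
  return "\\u" + ("0000" + hex).slice(-4);
}
```

For example, the text \u{1D11E} (MUSICAL SYMBOL G CLEF) would come out as \uD834\uDD1E, while \u{41} would come out as \u0041.
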
I'd rather not introduce a not-very-good solution (UTF-16) that we know would not meet all the needs of the world community, just because of a (temporary?) circumstance with a vagary of JavaScript. But, I also want this spec interoperably implemented... so, any solution needs the buy-in of the implementers.

Any arguments on either side of the coin would help make a more informed decision.

BTW, you stated a preference for the Ruby-style delimited escaped characters... could you say why you prefer that?

[1] http://people.mozilla.com/~jkew/woff/woff-2009-09-16.html
[2] http://www.icann.org/en/announcements/announcement-30oct09-en.htm

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
Received on Friday, 30 October 2009 18:38:48 UTC