- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Fri, 30 Oct 2009 11:41:24 -0700
- To: Doug Schepers <schepers@w3.org>
- Cc: www-dom@w3.org, www-international@w3.org
- Message-ID: <30b660a20910301141y1d9defd0x920e78d101940885@mail.gmail.com>
If the target of this is JavaScript, then the alternative (which Java has also chosen) is to use the UTF-16 representation, wherein a pair of \u escapes represents each supplementary character (above FFFF). It just needs to be carefully documented.

Mark

On Fri, Oct 30, 2009 at 11:38, Doug Schepers <schepers@w3.org> wrote:

> Hi, Mark-
>
> Mark Davis ☕ wrote (on 10/30/09 12:22 PM):
>
>> I want to point out that Unicode code points can go up to hex 10FFFF.
>> The standard for \u is exactly 4 digits, so that one can intermix with
>> characters and know where it terminates. There are a couple of schemes
>> that are used to extend this to up to 6 digits, and still know where to
>> terminate.
>>
>> \UXXXXXXXX - C++, ICU
>> \UXXXXXX - C#
>> \u{xxxxxx} - Ruby
>>
>> There needs to be some mechanism for extending to 6 digits. It would be
>> best to use one of the above rather than a new one. (My personal
>> favorite is Ruby's.)
>>
>
> The reason the "\u" escaped character sequence was chosen was that it is
> the native ECMAScript escape notation, which is easy for browser-based
> applications to use directly (i.e. they can inject it directly into the
> markup as a character).
>
> But, yes, this does have the cap of 4 digits, and I personally would prefer
> to use a different escape mechanism... but only if one or both of these 2
> conditions obtains:
>
> 1) DOM3 Events implementations also update their Javascript engines to be
> able to process the additional escape sequence (e.g. one of the ones you
> mention above) in the same way they process the "\u" escape sequence. This
> is the better long-term solution, and I'd hope ECMA TC39 could be persuaded
> to add this to future ECMAScript specs.
>
> 2) Script authors could use a normalizing method (c.f. convertKeyValue) to
> "dumb down" the 6-digit escape sequence into the 4-digit format (by
> converting to surrogate pairs when necessary).
>
> Javascript is becoming increasingly important, and so is the need for
> internationalized and localized language support. With the new font-linking
> enablers (including my favorite, WOFF [1]), and i18n domain extension policy
> [2], we're going to see more use of languages I have no chance of ever
> understanding, and I want DOM3 Events and ECMAScript to be part of that.
> I'd rather not introduce a not-very-good solution (UTF-16) that we know
> would not meet all the needs of the world community, just because of a
> (temporary?) circumstance with a vagary of Javascript.
>
> But, I also want this spec interoperably implemented... so, any solution
> needs the buy-in of the implementers. Any arguments on either side of the
> coin would help make a more informed decision.
>
> BTW, you stated a preference for the Ruby-style delimited escaped
> characters... could you say why you prefer that?
>
> [1] http://people.mozilla.com/~jkew/woff/woff-2009-09-16.html
> [2] http://www.icann.org/en/announcements/announcement-30oct09-en.htm
>
> Regards-
> -Doug Schepers
>
> W3C Team Contact, SVG and WebApps WGs
>
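For readers following the thread, here is a rough sketch of the surrogate-pair conversion discussed above: the UTF-16 alternative, and the "dumb down" step in option 2. It is only an illustration under stated assumptions. The names toUtf16Escape and normalizeEscapes are hypothetical, and this is not the convertKeyValue method mentioned in the thread. It assumes the extended form is the Ruby-style \u{xxxxxx} notation.

```javascript
// Hypothetical helper (not from any spec): render one Unicode code point
// as 4-digit "\u" escapes, using a surrogate pair above U+FFFF.
function toUtf16Escape(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("code point out of range: " + codePoint);
  }
  // Format one 16-bit code unit as \uXXXX (upper-case hex, zero-padded).
  const hex4 = (unit) =>
    "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");

  if (codePoint <= 0xFFFF) {
    return hex4(codePoint); // BMP: a single escape is enough
  }
  // Supplementary character: split the 20-bit offset into two 10-bit halves.
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // lead (high) surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // trail (low) surrogate
  return hex4(high) + hex4(low);
}

// Hypothetical normalizer: rewrite every \u{X...} (1 to 6 hex digits) in a
// string into the 4-digit form, leaving all other text untouched.
function normalizeEscapes(text) {
  return text.replace(/\\u\{([0-9A-Fa-f]{1,6})\}/g, (match, hex) =>
    toUtf16Escape(parseInt(hex, 16))
  );
}

console.log(toUtf16Escape(0x1D11E));             // \uD834\uDD1E
console.log(normalizeEscapes("key=\\u{1D11E}")); // key=\uD834\uDD1E
```

The sketch passes surrogate code points (D800 to DFFF) through unchanged when given directly, and throws for values above 10FFFF. A real implementation would need to decide how to treat both cases.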
Received on Friday, 30 October 2009 18:41:58 UTC