Re: Changes to DOM3 Events Key Identifiers from Mark Davis ☕ on 2009-10-30 (www-dom@w3.org from October to December 2009)

From: Mark Davis ☕ <mark@macchiato.com>
Date: Fri, 30 Oct 2009 11:41:24 -0700
To: Doug Schepers <schepers@w3.org>
Cc: www-dom@w3.org, www-international@w3.org
Message-ID: <30b660a20910301141y1d9defd0x920e78d101940885@mail.gmail.com>

If the target of this is JavaScript, then the alternative (which Java has
also chosen) is to use the UTF-16 representation, wherein a pair of \u
escapes (a surrogate pair) represents each supplementary character (above
U+FFFF). It just needs to be carefully documented.
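
As a rough illustration of the UTF-16 approach, the sketch below splits a
supplementary code point into its two surrogate code units and prints them
in the 4-digit \u form. The function names toSurrogatePair and
toUnicodeEscape are hypothetical, chosen for this example only; they are
not part of DOM3 Events or ECMAScript.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point: " + codePoint);
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: render any code point using only 4-digit \u escapes.
function toUnicodeEscape(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  const esc = (unit) => "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  if (codePoint <= 0xFFFF) {
    return esc(codePoint);                 // BMP: a single escape suffices
  }
  return toSurrogatePair(codePoint).map(esc).join("");
}

console.log(toUnicodeEscape(0x1D11E)); // prints \uD834\uDD1E
```

A character in the Basic Multilingual Plane still yields a single escape,
so toUnicodeEscape(0x41) gives \u0041.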

Mark


On Fri, Oct 30, 2009 at 11:38, Doug Schepers <schepers@w3.org> wrote:

> Hi, Mark-
>
> Mark Davis ☕ wrote (on 10/30/09 12:22 PM):
>
>> I want to point out that Unicode code points can go up to hex 10FFFF.
>> The standard for \u is exactly 4 digits, so that one can intermix with
>> characters and know where it terminates. There are a couple of schemes
>> that are used to extend this to up to 6 digits, and still know where to
>> terminate.
>>
>> \UXXXXXXXX - C++, ICU
>> \UXXXXXX - C#
>> \u{xxxxxx} - Ruby
>>
>> There needs to be some mechanism for extending to 6 digits. It would be
>> best to use one of the above rather than a new one. (My personal
>> favorite is Ruby's.)
>>
>
> The reason the "\u" escaped character sequence was chosen was that it is
> the native ECMAScript escape notation, which is easy for browser-based
> applications to use directly (i.e. they can inject it directly into the
> markup as a character).
>
> But, yes, this does have the cap of 4 digits, and I personally would prefer
> to use a different escape mechanism... but only if one or both of these two
> conditions hold:
>
> 1) DOM3 Events implementations also update their Javascript engines to be
> able to process the additional escape sequence (e.g. one of the ones you
> mention above) in the same way they process the "\u" escape sequence.  This
> is the better long-term solution, and I'd hope ECMA TC39 could be persuaded
> to add this to future ECMAScript specs.
>
> 2) Script authors could use a normalizing method (cf. convertKeyValue) to
> "dumb down" the 6-digit escape sequence into the 4-digit format (by
> converting to surrogate pairs when necessary).
>
> Javascript is becoming increasingly important, and so is the need for
> internationalized and localized language support.  With the new font-linking
> enablers (including my favorite, WOFF [1]), and i18n domain extension policy
> [2], we're going to see more use of languages I have no chance of ever
> understanding, and I want DOM3 Events and ECMAScript to be part of that.
> I'd rather not introduce a not-very-good solution (UTF-16) that we know
> would not meet all the needs of the world community, just because of a
> (temporary?) circumstance with a vagary of Javascript.
>
> But, I also want this spec interoperably implemented... so, any solution
> needs the buy-in of the implementers.  Any arguments on either side of the
> coin would help make a more informed decision.
>
> BTW, you stated a preference for the Ruby-style delimited escaped
> characters... could you say why you prefer that?
>
> [1] http://people.mozilla.com/~jkew/woff/woff-2009-09-16.html
> [2] http://www.icann.org/en/announcements/announcement-30oct09-en.htm
>
> Regards-
> -Doug Schepers
>
> W3C Team Contact, SVG and WebApps WGs
>
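
The "dumbing down" step in condition 2 of the quoted message could look
roughly like the sketch below, which rewrites Ruby-style \u{xxxxxx} escapes
into the 4-digit \u form, using surrogate pairs where necessary. The name
normalizeDelimitedEscapes is hypothetical, and nothing here is taken from
the actual convertKeyValue definition; it only assumes the delimited syntax
listed in the quoted message.

```javascript
// Hypothetical normalizer: rewrite \u{X..X} (1 to 6 hex digits) as one or
// two 4-digit \uXXXX escapes. Anything else in the text is left untouched.
function normalizeDelimitedEscapes(text) {
  const esc = (unit) => "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  return text.replace(/\\u\{([0-9A-Fa-f]{1,6})\}/g, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    if (codePoint > 0x10FFFF) {
      return match;                        // beyond Unicode: leave as written
    }
    if (codePoint <= 0xFFFF) {
      return esc(codePoint);               // fits in a single 4-digit escape
    }
    const offset = codePoint - 0x10000;
    const high = 0xD800 + (offset >> 10);  // high surrogate
    const low = 0xDC00 + (offset & 0x3FF); // low surrogate
    return esc(high) + esc(low);
  });
}

console.log(normalizeDelimitedEscapes("\\u{1D11E}")); // prints \uD834\uDD1E
```

The output uses only the escape form that existing ECMAScript engines
already understand, which is the point of the normalizing step.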

Received on Friday, 30 October 2009 18:41:58 UTC