Updated proposal for Unicode Supplementary Characters in ECMAScript

I have updated my proposal for Supplementary Characters in ECMAScript [1] based on feedback from the es-discuss@ mailing list [2] and the TC39 meeting in March [3]. The updated version generally reflects the consensus reached at the meeting, but provides more detail. Changes are listed in the Updates section.

The proposal keeps UTF-16 as the encoding for source text and String values in ECMAScript, but updates the specification of the functionality that interprets them so that it treats them as sequences of code points, thus enabling support for the full Unicode character set. In the case of regular expressions, this requires the introduction of a new Unicode mode.
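
As a rough illustration of the difference between code units and code points (a sketch only, not text taken from the proposal): the snippet below uses String.prototype.codePointAt, Array.from, and the "u" regular expression flag as found in ES2015-era engines. It assumes these correspond to the code point interpretation and the Unicode mode described above. The variable names are made up for the example.

```javascript
// U+1F600 is a supplementary character, stored as two UTF-16 code units
// (a surrogate pair) in an ECMAScript String value.
const s = "\uD83D\uDE00";

// length counts UTF-16 code units, so the pair counts twice.
const codeUnits = s.length;

// The string iterator used by Array.from walks code points instead.
const codePoints = Array.from(s).length;

// codePointAt combines the surrogate pair into a single code point.
const first = s.codePointAt(0);

// Without the "u" flag, "." matches a single code unit.
const withoutUnicodeMode = /^.$/.test(s);

// With the "u" flag, "." matches a whole code point.
const withUnicodeMode = /^.$/u.test(s);

console.log(codeUnits, codePoints, first.toString(16));
console.log(withoutUnicodeMode, withUnicodeMode);
```

The String value itself is unchanged in both cases; only the functions that interpret it differ in whether they see two code units or one code point.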

There's one change from what was previously discussed: we had talked about using full Unicode case folding in the Unicode mode of regular expressions, including mappings that map a single code point to a sequence of code points, such as "ß" -> "ss". In trying to integrate this into the spec, I found that Unicode Technical Standard 18, Unicode Regular Expressions [4], doesn't completely specify how such mappings are to be interpreted, for example in character classes. I have therefore reverted to simple Unicode case folding for now, and have sent feedback to the Unicode Consortium requesting clarification.
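
To make the distinction concrete, here is a minimal sketch of what simple case folding does and does not cover. It relies on the behavior of the "i" and "u" flags in engines that implement ES2015 and assumes that behavior matches the Unicode mode of the proposal. The variable names are illustrative only.

```javascript
// Simple case folding maps one code point to exactly one code point.
// U+212A KELVIN SIGN folds to U+006B "k", so the two match.
const kelvinMatchesK = /^\u212A$/iu.test("k");

// Full case folding would additionally map U+00DF "ß" to the sequence "ss".
// With simple folding only, that one-to-many mapping is not applied.
const sharpSMatchesSS = /^\u00DF$/iu.test("ss");

// String case conversion, by contrast, does apply one-to-many mappings.
const upperSharpS = "\u00DF".toUpperCase();

console.log(kelvinMatchesK, sharpSMatchesSS, upperSharpS);
```

A pattern such as [ß] shows where the open question lies: a character class matches a single code point, so it is unclear how it should match the two-code-point sequence "ss" under full case folding.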

To those on public-script-coord@ but not on es-discuss@, my apologies for not keeping you in the loop after the first round of discussions in February/March. I'll try to pay more attention to this list in the future.

Best regards,
Norbert

[1] http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
[2] https://mail.mozilla.org/pipermail/es-discuss/2012-March/thread.html#21620
[3] https://mail.mozilla.org/pipermail/es-discuss/2012-March/thread.html#21919
[4] http://unicode.org/reports/tr18/#Default_Loose_Matches

Received on Wednesday, 9 May 2012 07:23:52 UTC