Re: New full Unicode for ES6 idea from Allen Wirfs-Brock on 2012-03-01 (public-script-coord@w3.org from January to March 2012)

From: Allen Wirfs-Brock <allen@wirfs-brock.com>
Date: Wed, 29 Feb 2012 19:54:31 -0800
To: Brendan Eich <brendan@mozilla.com>
Cc: Wes Garland <wes@page.ca>, Norbert Lindenberg <ecmascript@norbertlindenberg.com>, "public-script-coord@w3.org" <public-script-coord@w3.org>, mranney@voxer.com, es-discuss <es-discuss@mozilla.org>
Message-Id: <2D9BF615-5DAE-4750-9092-14D258E2517B@wirfs-brock.com>

I posted a new strawman that describes what I think is the most minimal support that we must provide for "full unicode" in ES.next: http://wiki.ecmascript.org/doku.php?id=strawman:full_unicode_source_code

I'm not suggesting that we must stop at this level of support, but I think not doing at least what is described in this proposal would be a mistake.

Thoughts?


Allen



On Feb 28, 2012, at 3:49 AM, Brendan Eich wrote:

> Wes Garland wrote:
>> If four-byte escapes are statically rejected in BRS-on, we have a problem -- we should be able to use old code that runs in either mode unchanged when said code only uses characters in the BMP.
> 
> We've been over this and I conceded to Allen that "four-byte escapes" (I'll use \uXXXX to be clear from now on) must work as today with BRS-on. Otherwise we make it hard to impossible to migrate code that knows what it is doing with 16-bit code units that round-trip properly.
> 
>> Accepting both 4 and 6 byte escapes is a problem, though -- what is "\u123456".length?  1 or 3?
> 
> This is not a problem. We want .length to distribute across concatenation, so 3 is the only answer and in particular ("\u1234" + "\u5678").length === 2 irrespective of BRS.
> 
>> If we accept "\u1234" in BRS-on as a string with length 5 -- as we do today in ES5 with "\u123".length===4 -- we give developers a way to feature-test and conditionally execute code, allowing libraries to run with BRS-on and BRS-off.
> 
> Feature-testing should be done using a more explicit test. API TBD, but I don't think breaking "\uXXXX" with BRS on is a good idea.
> 
> I agree with you that Roozbeh is hardly used, so it can take the hit of having to feature-test the BRS. The much more common case today is JS code that blithely ignores non-BMP characters that make it into strings as pairs, treating them blindly as two "characters" (ugh; must purge that "c-word" abusage from the spec).
> 
> /be
>
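
To make the .length arithmetic in the quoted exchange concrete, here is a minimal sketch of how today's 16-bit code-unit strings behave (the "BRS-off" case). It only illustrates existing behavior; countCodePoints is a hypothetical helper written for this example, not a proposed API, and the variable names are mine.

```javascript
// Today's strings are sequences of 16-bit code units, so .length counts code units.
var a = "\u1234";
var b = "\u5678";
var concatLength = (a + b).length;      // one code unit each, so 2

// "\u123456" is read as the escape \u1234 followed by the characters "5" and "6".
var sixDigitsLength = "\u123456".length;

// A non-BMP character (here U+1D11E) written as a surrogate pair occupies two code units.
var pair = "\uD834\uDD1E";
var pairLength = pair.length;

// Hypothetical helper (an assumption for illustration only): counts code points
// by treating a well-formed high/low surrogate pair as a single character.
function countCodePoints(s) {
  var n = 0;
  for (var i = 0; i < s.length; i++) {
    var hi = s.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length) {
      var lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        i++; // skip the low surrogate; the pair counts once
      }
    }
    n++;
  }
  return n;
}
```

Code that "blithely ignores" non-BMP characters sees pair as two units, whereas countCodePoints(pair) treats it as one; that gap is what the thread is trying to close without breaking the \uXXXX cases above.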

Received on Thursday, 1 March 2012 03:55:07 UTC