Re: New full Unicode for ES6 idea from Brendan Eich on 2012-02-20 (public-script-coord@w3.org from January to March 2012)

From: Brendan Eich <brendan@mozilla.com>
Date: Mon, 20 Feb 2012 08:20:07 -0800
To: Allen Wirfs-Brock <allen@wirfs-brock.com>
CC: Gavin Barraclough <barraclough@apple.com>, public-script-coord@w3.org, Anne van Kesteren <annevk@opera.com>, mranney@voxer.com, es-discuss discussion <es-discuss@mozilla.org>
Message-ID: <4F4272B7.4000209@mozilla.com>

Allen Wirfs-Brock wrote:
>> Last year we dispensed with the binary data hacking in strings use-case. I don't see the hardship. But rather than throw exceptions on concatenation I would simply eliminate the ability to spell code units with "\uXXXX" escapes. Who's with me?
>
> I think we need to be careful not to equate the syntax of ES string literals with the actual encoding space of string elements.

I agree, which is why I'm saying that with the BRS set, we should forbid 
"\uXXXX", since that spells a code unit rather than a code point.

> Whether you say "\ud800" or "\u{00d800}", or call a function that does full-unicode to UTF-16 encoding, or simply create a string from file contents, you may end up with string elements containing upper or lower half surrogates.

I don't agree in the case of "\u{00d800}". That's simply an illegal code 
point, not a code unit (upper or lower half). We can reject it statically.
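
A minimal sketch of the static check I mean, assuming the rule is simply 
"the escape must name a Unicode scalar value". The helper name 
isLegalBraceEscape is hypothetical, and this illustrates the proposal in 
this thread, not the behaviour of any engine:

```javascript
// Hypothetical helper: decides whether the hex digits inside \u{...}
// name a code point that the proposed full-Unicode mode would accept.
function isLegalBraceEscape(hexDigits) {
  // Only hex digits are allowed between the braces.
  if (!/^[0-9a-fA-F]+$/.test(hexDigits)) return false;
  const cp = parseInt(hexDigits, 16);
  // Values past U+10FFFF are outside the Unicode code space.
  if (cp > 0x10FFFF) return false;
  // U+D800..U+DFFF are surrogates: code units of UTF-16, not characters.
  return cp < 0xD800 || cp > 0xDFFF;
}
```

Under this sketch "00d800" is rejected while "10000" is accepted, and the 
check needs nothing but the literal's text, so it can run at parse time.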

> Eliminating the "\uXXXX" syntax really doesn't change anything regarding actual string processing.

True, but not my point!

> What it might do, however, is eliminate the ambiguity about the intended meaning of "\uD800\uDC00" in legacy code.

And the ambiguity arising from concatenations, which avoids the loss of 
Gavin's distributive .length property.
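
To illustrate the distributive .length property, here is a rough sketch 
that emulates code point counting on top of today's 16-bit strings. The 
helper codePointLength is hypothetical, and it assumes an engine with the 
ES6 string iterator, which pairs up surrogates:

```javascript
// Hypothetical helper: length in code points rather than code units.
// Array.from walks the string iterator, which yields a surrogate pair
// as one element and a lone surrogate as one element.
function codePointLength(s) {
  return Array.from(s).length;
}

const hi = "\uD800"; // lone high surrogate, spelled as a code unit
const lo = "\uDC00"; // lone low surrogate, spelled as a code unit

// Counting code units: (a + b).length === a.length + b.length always holds.
const unitsDistribute = (hi + lo).length === hi.length + lo.length;

// Counting code points: the two halves fuse into U+10000 on concatenation,
// so 1 + 1 becomes 1 and the property is lost.
const pointsDistribute =
  codePointLength(hi + lo) === codePointLength(hi) + codePointLength(lo);
```

If "\uXXXX" cannot be written in full-Unicode mode, literals can no longer 
introduce the lone halves that make the second comparison fail.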

> If "full unicode string mode" only supported \u{} escapes then existing code that uses \uXXXX would have to be updated before it could be used in that mode.  That might be a good thing.

My point! ;-)
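
For what that update might look like, a sketch only: the function name 
surrogatePairToBraceEscape is hypothetical, and it assumes the legacy 
source really intended a UTF-16 surrogate pair:

```javascript
// Hypothetical migration helper: given the two code unit values from a
// legacy "\uXXXX\uXXXX" pair, produce the equivalent \u{...} escape text.
function surrogatePairToBraceEscape(hi, lo) {
  if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
    throw new RangeError("not a high/low surrogate pair");
  }
  // Standard UTF-16 decoding: 10 bits from each half, offset by 0x10000.
  const cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
  return "\\u{" + cp.toString(16).toUpperCase() + "}";
}
```

So the legacy "\uD800\uDC00" would be rewritten as \u{10000}, and a lone 
half has no rewrite at all, which forces the author to say what was meant.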

/be

Received on Monday, 20 February 2012 16:20:44 UTC