Re: New full Unicode for ES6 idea from Allen Wirfs-Brock on 2012-02-21 (public-script-coord@w3.org from January to March 2012)

From: Allen Wirfs-Brock <allen@wirfs-brock.com>
Date: Mon, 20 Feb 2012 17:59:23 -0800
To: Wes Garland <wes@page.ca>
Cc: Brendan Eich <brendan@mozilla.com>, public-script-coord@w3.org, Anne van Kesteren <annevk@opera.com>, mranney@voxer.com, es-discuss discussion <es-discuss@mozilla.org>
Message-Id: <2501E69C-C5CE-4C13-8C90-C2E0F0539FE3@wirfs-brock.com>

On Feb 20, 2012, at 1:42 PM, Wes Garland wrote:

> On 20 February 2012 16:00, Allen Wirfs-Brock <allen@wirfs-brock.com> wrote:
> 
> ...
> Observation -- disallowing otherwise "legal" Unicode strings because they contain code points d800-dfff has very concrete implementation benefits: it's possible to use UTF-16 to represent the String's backing store.  Without this concession, I fear it may not be possible to implement BRS-on without using a UTF-8 or full code point backing store (or some non-standard invention).

(or using multiple representations)
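
To make the concern concrete, here is a minimal sketch of why code points d800-dfff do not survive a plain UTF-16 backing store. The helper names encodeUtf16 and decodeUtf16 are hypothetical and only for illustration; they are not part of any proposal in this thread. Two separate surrogate code points are stored as two code units, which then decode as one supplementary code point, so the round trip is lossy.

```javascript
// Hypothetical helper: naively encode an array of code points as UTF-16 code units.
function encodeUtf16(codePoints) {
  const units = [];
  for (const cp of codePoints) {
    if (cp > 0xFFFF) {
      // Supplementary code point: split into a high/low surrogate pair.
      const v = cp - 0x10000;
      units.push(0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
    } else {
      // BMP code point, including d800-dfff, stored as a single unit.
      units.push(cp);
    }
  }
  return units;
}

// Hypothetical helper: decode UTF-16 code units back to code points.
function decodeUtf16(units) {
  const codePoints = [];
  for (let i = 0; i < units.length; i++) {
    const hi = units[i];
    const lo = units[i + 1];
    if (hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF) {
      // A high surrogate followed by a low surrogate is read as one code point.
      codePoints.push(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
      i++;
    } else {
      codePoints.push(hi);
    }
  }
  return codePoints;
}

// Two code points go in, but only one comes back out.
const roundTripped = decodeUtf16(encodeUtf16([0xD800, 0xDC00]));
```

Under these assumptions the string of two code points U+D800, U+DC00 is indistinguishable from the one-code-point string U+10000 once it is in the backing store.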
> 

Yes, I understand.  If it is a requirement (or even a goal) to enable implementations to use UTF-16 as the backing store, we should be clearer about it being so.


> Maybe the answer is to consider (shudder) adding String-like utility functions to the TypedArrays?  FWIW, CommonJS tried to go down this path and it turned out to be a lot of work for very little benefit (if any). 
> 
> But with the BRS flipped it would have to censor C "strings" passed to JS to ensure that no unmatched surrogates are present.
> 
> Only if the C strings are wide-character strings.  8-bit char strings are fine; they map right onto Latin-1 in native Unicode as well as the UTF-16 and UCS-2 encodings.

Yes, I was assuming WCHAR strings.
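
As a rough sketch of the kind of check such censoring would involve, assuming the wide-character data has already been copied into an array of 16-bit code units (findUnpairedSurrogate is a hypothetical name, not an existing API):

```javascript
// Hypothetical helper: return the index of the first unmatched surrogate
// in an array of 16-bit code units, or -1 if every surrogate is paired.
function findUnpairedSurrogate(units) {
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (u >= 0xD800 && u <= 0xDBFF) {
      // High surrogate: must be followed immediately by a low surrogate.
      const next = units[i + 1];
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a well-formed pair
        continue;
      }
      return i;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return i;
    }
  }
  return -1;
}
```

A binding layer could reject or repair the input whenever this returns something other than -1. 8-bit char data never reaches the surrogate range, so it would need no such pass.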

Allen

Received on Tuesday, 21 February 2012 02:00:04 UTC