Re: New full Unicode for ES6 idea from Brendan Eich on 2012-02-20 (public-script-coord@w3.org from January to March 2012)

From: Brendan Eich <brendan@mozilla.com>
Date: Mon, 20 Feb 2012 12:32:38 -0800
To: Allen Wirfs-Brock <allen@wirfs-brock.com>
CC: Gavin Barraclough <barraclough@apple.com>, public-script-coord@w3.org, Anne van Kesteren <annevk@opera.com>, mranney@voxer.com, es-discuss discussion <es-discuss@mozilla.org>
Message-ID: <4F42ADE6.70303@mozilla.com>

Allen Wirfs-Brock wrote:
>
> On Feb 20, 2012, at 10:52 AM, Brendan Eich wrote:
>
>> Allen Wirfs-Brock wrote:
>> ...
>>> Another way to express what I see as the problem with what you are 
>>> proposing about imposing such string semantics:
>>>
>>> Could the revised ECMAScript be used to implement a language that 
>>> had similar but not identical semantic rules to those you are 
>>> suggesting for ES strings?  My sense is that if we went down the path 
>>> you are suggesting, such an implementation would have to use binary 
>>> data arrays for all of its internal string processing and could not 
>>> use ES string functions to process them.
>>
>> If you mean a metacircular evaluator, I don't think so. Can you show 
>> a counterexample?
>>
>> If you mean a UTF-transcoder, then yes: binary data / typed arrays 
>> are required. That's the right answer.
>
> Not necessarily metacircular... it could be support for any language 
> that imposes different semantic rules on string elements.

In that case, binary data / typed arrays, definitely.

> You are essentially saying that a compiler targeting ES for a language 
> X that includes a string data type that does not conform to your 
> rules (for example, by allowing occurrences of surrogate code points 
> within string data)

First, as a point of order: yes, JS strings as full Unicode do not 
want stray surrogate pair-halves. Does anyone disagree?
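
For readers who want the term pinned down, here is a minimal sketch of 
what a stray pair-half looks like in JS strings as they stand today 
(16-bit code units). The helper name hasStraySurrogate is hypothetical, 
used only for illustration; it is not part of any proposal in this thread.

```javascript
// U+1F600 is outside the BMP, so today's JS strings store it as two
// 16-bit code units: a high surrogate followed by a low surrogate.
const face = "\uD83D\uDE00";

// Cutting the string between the two code units leaves a stray pair-half.
const strayHalf = face.slice(0, 1);

// Hypothetical helper: true if s contains a surrogate code unit that is
// not part of a well-formed high+low pair.
function hasStraySurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // well-formed pair, skip the low half
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return true; // low surrogate with no high surrogate before it
    }
  }
  return false;
}
```

Under these assumptions, hasStraySurrogate(face) is false while 
hasStraySurrogate(strayHalf) is true; the latter is the kind of value a 
full-Unicode string type would not want to admit.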

Second, binary data / typed arrays stand ready for any such 
not-full-Unicode use-cases.
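
As a rough sketch of that route, assuming a guest "language X" whose 
strings may contain lone surrogates: its string values can live in a 
Uint16Array and be processed without ever becoming a JS string. The 
names xString and xConcat are hypothetical, chosen only for this example.

```javascript
// Hypothetical string value from "language X": it deliberately contains
// a lone high surrogate (0xD800) between 'a' and 'b'.
const xString = new Uint16Array([0x0061, 0xD800, 0x0062]);

// Hypothetical helper: concatenate two X strings by copying their
// 16-bit elements into a fresh typed array.
function xConcat(left, right) {
  const out = new Uint16Array(left.length + right.length);
  out.set(left, 0);            // copy left operand to the front
  out.set(right, left.length); // copy right operand after it
  return out;
}

const doubled = xConcat(xString, xString);
```

The typed array imposes no well-formedness rule on its elements, so the 
lone surrogate survives concatenation unchanged.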

> could not use ES strings as the target representation of its string 
> data type.  It also could not use the built-in ES string functions in 
> the implementation of language X's built-in functions.

Not if this hypothetical source language being compiled to JS wants 
other than full Unicode, no.

Why is this a problem, even hypothetically? Such a use-case has binary 
data and typed arrays standing ready, and if it really could use 
String.prototype.* methods I would be greatly surprised.

> It could not leverage any optimizations that an ES engine may apply to 
> strings and string functions.

Emscripten already compiles LLVM source languages (C, C++, and 
Objective-C at least) to JS and does a very good job (getting better day 
by day). The utility of string functions today (including uint16 indexing 
and length) is immaterial. Typed arrays are quite important, though.

> Also, values of X's string type can not be directly passed in foreign 
> calls to ES functions. Etc.

Emscripten does have a runtime that maps browser functionality exposed 
to JS to the guest language. It does not AFAIK need to encode surrogate 
pairs in JS strings by hand, let alone make pair-halves.

/be

Received on Monday, 20 February 2012 20:33:05 UTC