Re: New full Unicode for ES6 idea

Jussi Kalliokoski wrote:
> I'm not sure what to think about this, being a big fan of the UTF-8 
> simplicity. :) 

UTF-8 is great, but it's a transfer format, perfect for C and other such 
systems languages (especially ones that use byte-wide char from old 
days). It is not appropriate for JS, which gives users a "One True 
String" (sorry for caps) primitive type that has higher-level "just 
Unicode" semantics. Alas, JS's "just Unicode" was from '96.
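
A small sketch of that split, for illustration only: the string stays "just Unicode" inside the language, and UTF-8 shows up only when bytes cross a boundary. It assumes a runtime that provides the global TextEncoder and TextDecoder (the WHATWG Encoding API, which postdates this message); the variable names are made up for the example.

```javascript
// Assumption: TextEncoder/TextDecoder are available as globals
// (modern browsers, recent Node.js). Names below are illustrative.
const text = "na\u00efve \u{1F600}";

// UTF-8 as a transfer format: bytes produced at the boundary.
const bytes = new TextEncoder().encode(text);

// Decoding the bytes gives back the same language-level string.
const roundTripped = new TextDecoder("utf-8").decode(bytes);

console.log(text.length);           // 8 (UTF-16 code units)
console.log(bytes.length);          // 11 (UTF-8 bytes)
console.log(roundTripped === text); // true
```

The byte count and the string length differ, and neither one is the number of characters a user would count (7 here).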

There are lots of transfer formats and character set encodings (CSEs). 
Implementations could use many, depending on what chars a given string 
uses: e.g. ASCII + UTF-16, UTF-8 only as you suggest, or other 
combinations. But this would all be under the hood, at some cost to 
the engine in exchange for some potential savings (space, mostly).
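
To make "under the hood" concrete, here is a rough sketch of the ASCII + UTF-16 combination, written in JS only for readability. pickStorage and storedCharCodeAt are hypothetical names, not any real engine's API; an actual engine would do this in its own internals, and the only point is that script sees the same character codes whichever backing store is chosen.

```javascript
// Hypothetical sketch: choose a backing store per string based on content.
function pickStorage(str) {
  let narrow = true;
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0x7f) { narrow = false; break; }
  }
  // One byte per char when everything is ASCII, two bytes otherwise.
  const units = narrow ? new Uint8Array(str.length)
                       : new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) units[i] = str.charCodeAt(i);
  return { kind: narrow ? "ascii" : "utf16", units };
}

// The observable result is the same for both kinds of storage.
function storedCharCodeAt(storage, i) {
  return storage.units[i];
}

const plain = pickStorage("hello");
const accented = pickStorage("h\u00e9llo");
console.log(plain.kind, plain.units.byteLength);       // ascii 5
console.log(accented.kind, accented.units.byteLength); // utf16 10
```

The extra scan and the branch on storage kind are the sort of engine cost mentioned above; the halved footprint for ASCII-only strings is the space saving.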

> But anyhow, I like the idea of opt-in, actually so much that I started 
> thinking, why not make JS be encoding-agnostic?

That is precisely the idea. Setting the BRS (the "big red switch" 
opt-in) to "full Unicode" gives the appearance of 21 bits per character 
via indexing and length accounting. You'd have to spell non-BMP literal 
escapes via "\u{...}", no big deal.
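
A hedged sketch of the difference in accounting, for illustration: the helpers countCodePoints and codePointAtIndex are hypothetical and only emulate, in user code, what indexing and length would look like with the switch thrown. The snippet assumes an ES2015-or-later engine, where "\u{...}" escapes, codePointAt, and code-point string iteration exist.

```javascript
// Hypothetical helpers emulating "full Unicode" length and indexing
// on top of ordinary UTF-16 code-unit strings.
const s = "a\u{1F600}b"; // U+1F600 is outside the BMP

// Code-point length: the string iterator steps by code point.
function countCodePoints(str) {
  let n = 0;
  for (const _ of str) n++;
  return n;
}

// Code-point indexing: linear scan, returns undefined past the end.
function codePointAtIndex(str, index) {
  let i = 0;
  for (const ch of str) {
    if (i === index) return ch.codePointAt(0);
    i++;
  }
  return undefined;
}

console.log(s.length);                            // 4 (code units)
console.log(countCodePoints(s));                  // 3 (code points)
console.log(s.charCodeAt(1).toString(16));        // d83d (lead surrogate)
console.log(codePointAtIndex(s, 1).toString(16)); // 1f600
```

Note the emulated indexing is O(n) per lookup; giving the appearance of 21-bit characters with constant-time indexing is the kind of thing an engine would have to handle internally.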

> What I mean here is that maybe we could have multi-charset Strings in JS?

Now you're saying something else. Having one agnostic higher-level "just 
Unicode" string type is one thing. That's JS's design goal, always has 
been. It does not imply adding multiple observable CSEs or UTFs that 
break the "just Unicode" abstraction.

If you can put a JS string in memory for low-level systems languages 
such as C to view, of course there are abstraction breaks. Engine APIs 
may or may not allow such views for optimizations. This is an issue, for 
sure, when embedding (e.g. V8 in Node). It's not a language design 
issue, though, and I'm focused on observables in the language because 
that is where JS currently fails by livin' in the '90s.

/be

Received on Sunday, 19 February 2012 16:05:51 UTC