Re: New full Unicode for ES6 idea

On 21 February 2012 00:03, Brendan Eich <brendan@mozilla.com> wrote:

> These are byte-based encodings, no? What is the problem inflating them by
> zero extension to 16 bits now (or 21 bits in the future)? You can't make an
> invalid Unicode character from a byte value.
>

One of my examples, GB 18030, is a multi-byte encoding (a character takes
one, two, or four bytes) and a Chinese government standard.  It maps onto
Unicode, but the mapping is table-driven rather than algorithm-driven like
the UTF-* transport formats.  To give a single example, Unicode 0x2259
maps onto GB 18030 0x8136D830.
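
To make the table-driven versus algorithm-driven distinction concrete, here
is a rough JavaScript sketch.  The names utf8Encode, gb18030Table and
gb18030Encode are mine and purely illustrative; the table holds only the
one pair quoted above, whereas a real GB 18030 codec would need the full
published mapping data.

```javascript
// Algorithmic: UTF-8 bytes are computed from the code point by bit arithmetic.
// Only code points up to U+FFFF are handled, to keep the sketch short.
function utf8Encode(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
}

// Table-driven: a hypothetical one-entry table holding only the pair
// mentioned in this message (an assumption, not real mapping data).
const gb18030Table = new Map([[0x2259, [0x81, 0x36, 0xd8, 0x30]]]);

function gb18030Encode(cp) {
  const bytes = gb18030Table.get(cp);
  if (bytes === undefined) {
    throw new RangeError("no table entry for U+" + cp.toString(16));
  }
  return bytes;
}

const toHex = (bytes) => bytes.map((b) => b.toString(16)).join(" ");
console.log(toHex(utf8Encode(0x2259)));    // e2 89 99
console.log(toHex(gb18030Encode(0x2259))); // 81 36 d8 30
```

The point is only that the first function needs no data beyond the code
point itself, while the second cannot answer anything that is not in its
table.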

You're right about Big5 being byte-oriented; maybe it was a bad example,
although it is a double-byte charset.  It keeps ASCII in the low range and
treats bytes above 0x7f as escapes into code pages that are dereferenced
by the next byte.  Each code point is encoded with one or two bytes, never
more.  If I were developing with Big5 in JS, I would store the byte stream
4a 4b d8 00 c1 c2 4c as 004a 004b d800 c1c2 004c, where d800 is, from a
Unicode point of view, an unpaired surrogate.  This would allow me to use
JS regular expressions and so on.
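
A minimal sketch of that packing, assuming the simple rule described above
(a byte up to 0x7f stands alone, any higher byte is a lead byte that
consumes the next byte).  The function name packBig5 is hypothetical and no
validation of trail bytes is attempted.

```javascript
// Pack a Big5-style byte stream into 16-bit JS code units.
function packBig5(bytes) {
  const units = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b <= 0x7f) {
      units.push(b); // single-byte (ASCII range) character
    } else {
      if (i + 1 >= bytes.length) {
        throw new RangeError("truncated double-byte sequence");
      }
      units.push((b << 8) | bytes[++i]); // lead byte + following byte
    }
  }
  return String.fromCharCode(...units);
}

const s = packBig5([0x4a, 0x4b, 0xd8, 0x00, 0xc1, 0xc2, 0x4c]);

// Dump the code units as four-digit hex.
const hex = Array.from({ length: s.length }, (_, i) =>
  s.charCodeAt(i).toString(16).padStart(4, "0")
);
console.log(hex.join(" "));    // 004a 004b d800 c1c2 004c
console.log(/\uc1c2/.test(s)); // true
```

The resulting string is five code units long, and an ordinary regular
expression can match the packed double-byte value as a single unit.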

> Anyway, Big5 punned into JS strings (via a C or C++ API?) is *not* a strong
> use-case for ignoring invalid characters.
>

Agreed. I'm stretching to see whether I can find a real problem with
BRS, because I really want it.

But the data does not need to arrive from a C API -- it could easily be
delivered by an XHR request where, say, the remote end dumps database rows
into a transport format based around evaluating JS string literals (like
JSON).
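
As a rough illustration, here is how such a value could reach a script
through plain JSON with no native API involved.  The response body below is
made up for the example; in practice it would be the text returned by the
XHR request.

```javascript
// Hypothetical response body: a JSON array whose second string carries the
// packed Big5 code units as \u escapes, including the unpaired d800.
const body = '["JK", "\\ud800\\uc1c2L"]';

const rows = JSON.parse(body);
const packed = rows[1];

console.log(packed.length);                     // 3
console.log(packed.charCodeAt(0).toString(16)); // d800
console.log(packed.charCodeAt(1).toString(16)); // c1c2
```

JSON.parse accepts the lone \ud800 escape and hands back a string that
contains it as an ordinary 16-bit code unit.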

> Ball one. :-P
>

If I hit the batter, does he get to first base?

We still haven't talked about equality and normalization; I suppose that
can wait.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Received on Tuesday, 21 February 2012 13:30:12 UTC