Re: New full Unicode for ES6 idea

On 20 February 2012 00:45, Allen Wirfs-Brock <allen@wirfs-brock.com> wrote:

>
> 2) Allow invalid unicode characters in strings, and preserve them over
> concatenation – ("\uD800" + "\uDC00").length == 2.
>


> I think 2) is the only reasonable alternative.
>

I think so, too -- especially as any sequence of Unicode code points --
including invalid and reserved code points -- constitutes a valid Unicode
string, according to my recollection of the Unicode specification.
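A minimal sketch of what option 2 means in practice, written for a current JavaScript engine (assumption: ES2015 or later, for `codePointAt` and the `\u{...}` escape). Lone surrogates are stored as-is, concatenation keeps both code units, and the two halves then read as one supplementary code point:

```javascript
// Under option 2, a string is a sequence of 16-bit code units;
// lone surrogates are kept as-is, never rejected or replaced.
const high = "\uD800"; // lone high surrogate
const low = "\uDC00";  // lone low surrogate

// Each half on its own is a one-code-unit string.
console.log(high.length, low.length); // 1 1

// Concatenation preserves both code units, so the length is 2 ...
const joined = high + low;
console.log(joined.length); // 2

// ... and the two halves now form a well-formed surrogate pair,
// which codePointAt reads as the single code point U+10000.
console.log(joined.codePointAt(0).toString(16)); // 10000
console.log(joined === "\u{10000}"); // true
```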

In addition to the reasons you listed, it is worth noting that
- 2) is cheaper to implement;
- 2) keeps more old code working. Besides the cases where developers
use String as uint16[], there are also cases where developers scan
strings for 0xD800, which is a surrogate code point (reserved for use
in UTF-16).
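Code of the second kind typically walks a string one 16-bit unit at a time with `charCodeAt`. A minimal sketch, assuming the goal of the scan is to find unpaired surrogates; `findLoneSurrogates` is a hypothetical name, not something from this thread. It only has anything to find, and only keeps working unchanged, as long as strings continue to expose individual code units, as option 2 preserves:

```javascript
// Hypothetical helper: returns the indices of all lone (unpaired)
// surrogate code units, treating the string like a uint16 array.
function findLoneSurrogates(s) {
  const result = [];
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF) {
      // High surrogate: paired only if a low surrogate follows.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a valid pair
      } else {
        result.push(i);
      }
    } else if (cu >= 0xDC00 && cu <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      result.push(i);
    }
  }
  return result;
}

console.log(findLoneSurrogates("a\uD800b"));     // [ 1 ]
console.log(findLoneSurrogates("\uD800\uDC00")); // []
console.log(findLoneSurrogates("\uDC00\uD800")); // [ 0, 1 ]
```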

> I don't think 1) would be a very good choice, if for no other reason the
> set of valid unicode characters is a moving target that you wouldn't want
> to hardwire into either the ES specification or implementations.
>

To play devil's advocate, I could point out that the spec language
could say something about reserved code points. Those code points are
reserved because, IIRC, they cannot be represented on their own in
UTF-16; they include the range used for surrogate pairs (0xD800-0xDFFF).
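One possible reading of "spec language about reserved code points" is a rule that only Unicode scalar values (code points in 0..0x10FFFF outside the surrogate range) may become string content. A minimal sketch under that assumption; `encodeScalarValue` is a hypothetical helper, not a proposed API. For contrast, ES2015's `String.fromCodePoint` does not apply such a rule and will produce a lone surrogate:

```javascript
// Hypothetical helper (not from the thread): encode one code point as
// UTF-16, rejecting anything that is not a Unicode scalar value, i.e.
// values outside 0..0x10FFFF and the surrogate range 0xD800..0xDFFF.
function encodeScalarValue(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("not a code point: " + cp);
  }
  if (cp >= 0xD800 && cp <= 0xDFFF) {
    // Surrogate code points have no UTF-16 encoding of their own;
    // these values only appear as halves of a surrogate pair.
    throw new RangeError("surrogate code point: 0x" + cp.toString(16));
  }
  if (cp < 0x10000) {
    return String.fromCharCode(cp); // BMP: one code unit
  }
  const v = cp - 0x10000;          // 20 bits split across two units
  const hi = 0xD800 + (v >> 10);   // top 10 bits
  const lo = 0xDC00 + (v & 0x3FF); // bottom 10 bits
  return String.fromCharCode(hi, lo);
}

console.log(encodeScalarValue(0x10000) === "\uD800\uDC00"); // true
console.log(encodeScalarValue(0x1F600).length);             // 2

try {
  encodeScalarValue(0xD800);
} catch (e) {
  console.log(e.message); // surrogate code point: 0xd800
}

// String.fromCodePoint accepts surrogate code points as-is.
console.log(String.fromCodePoint(0xD800).length); // 1
```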

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Received on Monday, 20 February 2012 12:19:49 UTC