Re: New full Unicode for ES6 idea from Wes Garland on 2012-02-20 (public-script-coord@w3.org from January to March 2012)

From: Wes Garland <wes@page.ca>
Date: Mon, 20 Feb 2012 07:19:16 -0500
To: Allen Wirfs-Brock <allen@wirfs-brock.com>
Cc: Gavin Barraclough <barraclough@apple.com>, public-script-coord@w3.org, Brendan Eich <brendan@mozilla.com>, Anne van Kesteren <annevk@opera.com>, mranney@voxer.com, es-discuss discussion <es-discuss@mozilla.org>
Message-ID: <CAHB0tE5LL9Wv14Rdgue_X6quopqE+-M7e4CQ21hz_TJE2uZh3A@mail.gmail.com>

On 20 February 2012 00:45, Allen Wirfs-Brock <allen@wirfs-brock.com> wrote:

>
> 2) Allow invalid unicode characters in strings, and preserve them over
> concatenation – ("\uD800" + "\uDC00").length == 2.
>

> I think 2) is the only reasonable alternative.
>

I think so, too -- especially as any sequence of Unicode code points --
including invalid and reserved code points -- constitutes a valid Unicode
string, according to my recollection of the Unicode specification.
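
For concreteness, here is a minimal sketch of what 2) implies, assuming an engine that treats strings as sequences of 16-bit code units. The variable names are illustrative only, and codePointAt is an ES6 addition that postdates this thread, so take it as an assumption about the eventual API rather than part of the proposal.

```javascript
// Two lone surrogates, each ill-formed when taken on its own.
const high = "\uD800";
const low = "\uDC00";

// Under option 2) concatenation keeps both code units untouched.
const joined = high + low;
const joinedLength = joined.length; // two 16-bit code units

// Read together, the two units denote the supplementary code point U+10000.
const joinedCodePoint = joined.codePointAt(0);

// Read alone, the high surrogate is simply reported as itself.
const loneCodePoint = high.codePointAt(0);
```

The point is that nothing is normalized, rejected, or merged at concatenation time; the pairing only shows up when the string is read as code points.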

In addition to the reasons you listed, it should also be noted that:
- 2) is cheaper to implement
- 2) keeps more old code working; even ignoring the examples where developers
use String as uint16[], there are also the cases where developers scan
strings for 0xD800. 0xD800 is a reserved (surrogate) code point.
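
A rough sketch of the kind of existing code I have in mind, which scans code units for the surrogate range by hand. countCodePoints is a hypothetical helper written for this message, not taken from any particular library.

```javascript
// Hypothetical helper: counts code points by walking 16-bit code units.
// It only works if the engine leaves surrogate code units visible in strings.
function countCodePoints(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    // (0xDC00-0xDFFF) counts as a single code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of the pair
      }
    }
    // Unpaired surrogates fall through and are counted individually.
    count++;
  }
  return count;
}
```

Code like this keeps working under 2), since a lone 0xD800 stays in the string as a single unit and a pair formed by concatenation is still seen as two adjacent units.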

> I don't think 1) would be a very good choice, if for no other reason the
> set of valid unicode characters is a moving target that you wouldn't want
> to hardwire into either the ES specification or implementations.
>

To play the devil's advocate, I could point out that the spec language
could say something about reserved code points.  Those code points are
reserved because, IIRC, they are not representable in UTF-16; they include
the ranges for the surrogate pairs.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Received on Monday, 20 February 2012 12:19:49 UTC