On 20 February 2012 00:45, Allen Wirfs-Brock <allen@wirfs-brock.com> wrote:
>
> 2) Allow invalid unicode characters in strings, and preserve them over
> concatenation – ("\uD800" + "\uDC00").length == 2.
>
> I think 2) is the only reasonable alternative.
>
I think so, too -- especially as any sequence of Unicode code points --
including invalid and reserved code points -- constitutes a valid Unicode
string, according to my recollection of the Unicode specification.
In addition to the reasons you listed, it should also be noted that
- 2) is cheaper to implement
- 2) keeps more old code working; ignoring the examples where developers
use String as uint16[], there are also the cases where developers scan
strings for 0xD800. 0xD800 is a reserved code point.
I don't think 1) would be a very good choice, if for no other reason the
> set of valid unicode characters is a moving target that you wouldn't want
> to hardwire into either the ES specification or implementations.
>
To play the devil's advocate, I could point out that the spec language
could say something about reserved code points. Those code points are
reserved because, IIRC, they are not representable in UTF-16; they include
the ranges for the surrogate pairs.
Wes
--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102