- From: Phillips, Addison <addison@lab126.com>
- Date: Tue, 14 May 2019 00:38:33 +0000
- To: "Eric Prud'hommeaux" <eric@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "binji@google.com" <binji@google.com>
> >
> > Supplementary characters (that is, those beyond the BMP) are not an issue.
> However, isolated (that is, *unpaired*) surrogate code units are permitted in
> JavaScript strings. The question is how to deal with them (not allowing them
> would be fine by me--for security they are often replaced by U+FFFD). So
> the question is whether you're permitted to have a string like "\uD800
> ABCDEFG \uD800\uDC00\uD800" (which starts and ends with an unpaired
> surrogate, but has a valid surrogate pair in the middle).
>
> I'd say that
> [[
> Names are sequences of characters, which are scalar values as defined by
> Unicode (Section 2.4).
> ]]
> says no, but I can't lay my hands on tests to make sure implementations barf
> on it. (Part of the problem is that WASM tests input conditions are
> synthesized in a browser so it may be difficult to create such a string on some
> platforms.)
>
If they are synthesized using JavaScript, it should be as simple as:
String.fromCharCode(0xd800, 0xd800, 0xd800); // three isolated surrogates
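
For anyone wanting to build such test inputs, here is a minimal sketch of how one might synthesize the strings discussed in this thread and check whether they contain unpaired surrogates. The helper name `hasLoneSurrogate` and the three sample variables are hypothetical, introduced only for illustration; this does not show how any WASM implementation or test suite actually validates names.

```javascript
// Hypothetical helper (not from any spec or test suite): returns true if `s`
// contains a surrogate code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xd800 && cu <= 0xdbff) {
      // High surrogate: valid only if immediately followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
        continue;
      }
      return true;
    }
    if (cu >= 0xdc00 && cu <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

// Three isolated high surrogates.
const isolated = String.fromCharCode(0xd800, 0xd800, 0xd800);
// The string from the quoted message: unpaired surrogates at both ends,
// with a valid surrogate pair (U+10000) in the middle.
const mixed = "\uD800 ABCDEFG \uD800\uDC00\uD800";
// Supplementary character only; no unpaired surrogates.
const wellFormed = "ABCDEFG \uD800\uDC00";

console.log(hasLoneSurrogate(isolated));
console.log(hasLoneSurrogate(mixed));
console.log(hasLoneSurrogate(wellFormed));
```

A string for which this check returns true is not a sequence of Unicode scalar values, so under the reading of the quoted spec text above it would not be a valid name.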
Addison
Received on Tuesday, 14 May 2019 00:39:03 UTC