RE: agenda+ Fwd: Re: language for unicode string [I18N-ACTION-800]

> >
> > Supplementary characters (that is, those beyond the BMP) are not an issue.
> However, isolated (that is, *unpaired*) surrogate code units are permitted in
> JavaScript strings. The question is how to deal with them (not allowing them
> would be fine by me--for security they are often replaced by U+FFFD). So
> the question is whether you're permitted to have a string like "\uD800
> ABCDEFG \uD800\uDC00\uD800" (which starts and ends with an unpaired
> surrogate, but has a valid surrogate pair in the middle).
> 
> I'd say that
> [[
> Names are sequences of characters, which are scalar values as defined by
> Unicode (Section 2.4).
> ]]
> says no, but I can't lay my hands on tests to make sure implementations barf
> on it. (Part of the problem is that WASM test input conditions are
> synthesized in a browser so it may be difficult to create such a string on some
> platforms.)
> 

If they are synthesized using JavaScript, it should be as simple as:

     String.fromCharCode(0xd800, 0xd800, 0xd800); // three isolated surrogates
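
For anyone writing such a test, here is a minimal sketch of how the strings discussed above could be built and checked in JavaScript. The helper name `hasLoneSurrogate` is hypothetical (it is not part of any API), and the U+FFFD replacement at the end is just one possible policy, not something the spec text above prescribes.

```javascript
// Hypothetical helper: returns true if the string contains a surrogate
// code unit that is not part of a valid high+low surrogate pair.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xd800 && cu <= 0xdbff) {
      // High surrogate: valid only if a low surrogate follows it.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the second half of the valid pair
      } else {
        return true;
      }
    } else if (cu >= 0xdc00 && cu <= 0xdfff) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

// Three isolated high surrogates.
const isolated = String.fromCharCode(0xd800, 0xd800, 0xd800);

// The string from the quoted message: unpaired surrogates at both ends,
// a valid pair (U+10000) in the middle.
const mixed = "\uD800 ABCDEFG \uD800\uDC00\uD800";

// One possible policy: replace each unpaired surrogate with U+FFFD.
const replaced = mixed.replace(
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
  "\uFFFD"
);

console.log(hasLoneSurrogate(isolated)); // true
console.log(hasLoneSurrogate(mixed));    // true
console.log(hasLoneSurrogate(replaced)); // false
```

Whether a WASM implementation rejects such a name, replaces the surrogates, or passes them through is exactly what a test built from these strings would need to establish.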

Addison

Received on Tuesday, 14 May 2019 00:39:03 UTC