Re: Allowing \u escaped surrogate pairs

good afternoon;

> On 30. Apr 2026, at 16:55, Gregory Williams <greg@evilfunhouse.com> wrote:
> 
> 
> 
>> On Apr 30, 2026, at 3:26 AM, James Anderson <anderson.james.1955@gmail.com> wrote:
>> 
>> i try to keep my model of program behaviours simple.
>> 
>> surrogate pairs appear as an element of the utf-16 encoding only.
> 
> I don’t think this is really true.

it may be that i have misread the specification.
i had understood that the notions of "surrogate code points" and "utf-16 code units" were used only in the definition of the utf-16 encoding.


> Things like JSON use surrogates in escaping syntax (just like is being proposed here), even though many systems implementing it likely do not have an internal implementation based on UTF-16.

it may be that an implementation which applies a mechanism defined by utf-16 is not thereby required to follow all aspects of utf-16.
how does that bear on whether one should fold the mechanism from one aspect of the standard into a distinct aspect which does not itself recognize the necessary basis?
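
for reference, a minimal sketch of what the escape mechanism under discussion does, assuming python and its standard json module. the function name combine_surrogates is made up for illustration and is not taken from any specification or from the proposal itself:

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The two escapes \ud83d \ude00 denote the single code point U+1F600.
pair = combine_surrogates(0xD83D, 0xDE00)

# Python's json module performs the same combination when it reads escapes,
# although Python strings are not stored as utf-16.
decoded = json.loads('"\\ud83d\\ude00"')
```

here the arithmetic is borrowed from utf-16, while the decoded value is a single code point with no surrogates left in it.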

if i read https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/, this is what i find:

D80 Unicode string: A code unit sequence containing code units of a particular Unicode encoding form.
• In the rawest form, Unicode strings may be implemented simply as arrays of the appropriate integral data type, consisting of a sequence of code units lined up one immediately after the other.
• A single Unicode string must contain only code units from a single Unicode encoding form. It is not permissible to mix forms within a string.
D81 Unicode 8-bit string: A Unicode string containing only UTF-8 code units.
D82 Unicode 16-bit string: A Unicode string containing only UTF-16 code units.
D83 Unicode 32-bit string: A Unicode string containing only UTF-32 code units.

that does not surprise me.
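
as an illustration of D81 through D83, a small sketch, assuming python, which lists the code units of one supplementary code point in each of the three encoding forms. the variable names are my own:

```python
# One code point outside the Basic Multilingual Plane: U+1F600.
ch = "\U0001F600"

# UTF-8 code units are single bytes.
utf8_units = list(ch.encode("utf-8"))

# UTF-16 code units are 16-bit values; big-endian is used here so that
# no byte order mark is emitted.
raw16 = ch.encode("utf-16-be")
utf16_units = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]

# UTF-32 code units are 32-bit values.
raw32 = ch.encode("utf-32-be")
utf32_units = [int.from_bytes(raw32[i:i + 4], "big") for i in range(0, len(raw32), 4)]

for name, units in (("utf-8", utf8_units), ("utf-16", utf16_units), ("utf-32", utf32_units)):
    print(name, [hex(u) for u in units])
```

only the 16-bit string contains values in the surrogate range, which is the sense in which i read surrogates as belonging to utf-16.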

> There are of course historical reasons for this. But since we’re talking about *escaped* data, the underlying encoding isn’t critical to the decision here.
> 
> .greg
> 

---
james anderson | james@dydra.com | https://dydra.com

Received on Thursday, 30 April 2026 15:34:24 UTC