- From: James Anderson <anderson.james.1955@gmail.com>
- Date: Thu, 30 Apr 2026 17:33:07 +0200
- To: RDF-star Working Group <public-rdf-star-wg@w3.org>
good afternoon;

> On 30. Apr 2026, at 16:55, Gregory Williams <greg@evilfunhouse.com> wrote:
>
>> On Apr 30, 2026, at 3:26 AM, James Anderson <anderson.james.1955@gmail.com> wrote:
>>
>> i try to keep my model of program behaviours simple.
>>
>> surrogate pairs appear as an element of the utf-16 encoding only.
>
> I don’t think this is really true.

it may be, that i have misread the specification.
i had understood that use of the notions of "surrogate code points" or "utf-16 code units" was restricted to the definition of utf-16 encoding.

> Things like JSON use surrogates in escaping syntax (just like is being proposed here), even though many systems implementing it likely do not have an internal implementation based on UTF-16.

it may be that, if an implementation may apply a mechanism which is defined by utf-16, it does not require that they follow all its aspects.
how does that bear on whether one should fold the mechanism from one aspect of the standard into a distinct aspect which does not itself recognize the necessary basis?

if i read, https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/, this is what i find :

    D80 Unicode string: A code unit sequence containing code units of a particular Unicode encoding form.
      • In the rawest form, Unicode strings may be implemented simply as arrays of the appropriate integral data type, consisting of a sequence of code units lined up one immediately after the other.
      • A single Unicode string must contain only code units from a single Unicode encoding form. It is not permissible to mix forms within a string.
    D81 Unicode 8-bit string: A Unicode string containing only UTF-8 code units.
    D82 Unicode 16-bit string: A Unicode string containing only UTF-16 code units.
    D83 Unicode 32-bit string: A Unicode string containing only UTF-32 code units.

it does not surprise.

> There are of course historical reasons for this. But since we’re talking about *escaped* data, the underlying encoding isn’t critical to the decision here.
>
> .greg
>

---
james anderson | james@dydra.com | https://dydra.com
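
For illustration only, a minimal Python sketch of the two points made in the thread. Python and the sample code point U+1F600 are assumptions chosen for the example; neither appears in the discussion. The first part lists the code units of one code point in each encoding form (D81 to D83), where surrogates show up only among the UTF-16 code units. The second part shows JSON escape syntax spelling the same code point as a surrogate pair, although the decoded Python string holds a single code point.

```python
import json

# Sample code point outside the Basic Multilingual Plane (arbitrary choice).
ch = "\U0001F600"

# Code units of the same code point in each Unicode encoding form.
utf8_units = list(ch.encode("utf-8"))
utf16_bytes = ch.encode("utf-16-be")
utf16_units = [int.from_bytes(utf16_bytes[i:i + 2], "big") for i in (0, 2)]
utf32_units = [int.from_bytes(ch.encode("utf-32-be"), "big")]

# Only the UTF-16 form contains surrogate code units (0xD800..0xDFFF).
has_surrogates = all(0xD800 <= u <= 0xDFFF for u in utf16_units)

# JSON escaping writes the code point as a surrogate pair (ensure_ascii
# is the default), and decoding the pair yields one code point again.
escaped = json.dumps(ch)
decoded = json.loads('"\\ud83d\\ude00"')

print([hex(u) for u in utf8_units])
print([hex(u) for u in utf16_units])
print([hex(u) for u in utf32_units])
print(escaped, len(decoded))
```

Whether an escape syntax that borrows the pair mechanism thereby depends on UTF-16 is the question the thread leaves open; the sketch only shows that the escaped form and the in-memory form can differ.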
Received on Thursday, 30 April 2026 15:34:24 UTC