Re: Allowing \u escaped surrogate pairs

I'm for not allowing surrogates at all, keeping the situation unchanged.

My view is that software that allowed surrogates was non-compliant and should 
remain non-compliant.

Adding a test for "correct" surrogate pairs is optional.

peter


On 4/28/26 9:58 AM, ddooss@wp.pl wrote:
> Hi all,
> 
> 
> Option 2 seems to preserve the RDF 1.2 model - strings still denote Unicode
> scalar values - while allowing the common UTF-16-style escape form for non-BMP
> characters, e.g. \uD83C\uDCA1, when the surrogate pair is well-formed.
> 
> So my mild preference would be:
> 
> accept a valid high-surrogate + low-surrogate pair and interpret it as the 
> corresponding scalar value;
> 
> reject lone surrogates, reversed pairs, or malformed surrogate sequences.
> 
> That said, I would also be fine with option 1, since it is simpler, stricter, 
> and seems closer to the conservative reading of the current text. Option 2 
> only seems preferable to me if we want to avoid rejecting data that is 
> probably intended to represent a valid Unicode character.
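
The two options discussed above can be sketched as a small decoder. This is
only an illustration, not text from any RDF specification: the function name
decode_u_escapes and the allow_pairs flag are hypothetical, only four-digit
\u escapes are handled, and escaped backslashes are not treated specially.

```python
import re

# One backslash-u escape with exactly four hex digits. The eight-digit
# capital-U form and escaped backslashes are out of scope for this sketch.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(text, allow_pairs=True):
    """Replace four-digit numeric escapes with the characters they denote.

    allow_pairs=True  sketches option 2: a well-formed high+low surrogate
                      pair is combined into one scalar value.
    allow_pairs=False sketches option 1: every surrogate escape is rejected.
    """
    out = []
    pos = 0
    while pos < len(text):
        m = _U_ESCAPE.match(text, pos)
        if m is None:
            out.append(text[pos])  # ordinary character, copy through
            pos += 1
            continue
        cp = int(m.group(1), 16)
        pos = m.end()
        if 0xD800 <= cp <= 0xDFFF:
            if not allow_pairs:
                raise ValueError("surrogate escape not allowed")
            if cp >= 0xDC00:
                # Covers lone low surrogates and reversed pairs.
                raise ValueError("low surrogate without a preceding high surrogate")
            m2 = _U_ESCAPE.match(text, pos)
            low = int(m2.group(1), 16) if m2 else None
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("high surrogate not followed by a low surrogate")
            # Standard UTF-16 pair arithmetic.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            pos = m2.end()
        out.append(chr(cp))
    return "".join(out)
```

With allow_pairs=True the input \uD83C\uDCA1 decodes to the single scalar
value U+1F0A1; with allow_pairs=False the same input raises ValueError.
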
> 
> 
> Best,
> 
> Dominik
> 
>     On 28 April 2026 at 14:16, Peter F. Patel-Schneider
>     <pfpschneider@gmail.com> wrote:
> 
>     [I'm deliberately not putting this in the issue, because I want the issue to
>     look clean.]
> 
>     As far as I can tell, surrogates are not allowed at all in RDF 1.1 Turtle.
>     The reason is that numeric escape sequences represent Unicode code points
>     that are Unicode characters.  This appears to be stated only in Section 6.4.
> 
>     So "\uD83C\uDCA1" is not valid in RDF 1.1 Turtle.
> 
>     Again as far as I can tell, RDF 1.2 Turtle liberalizes RDF 1.1 Turtle because
>     it allows any non-surrogate Unicode code point for numeric escape sequences,
>     not just Unicode characters.
> 
>     So "\uFFFE" is valid in RDF 1.2 Turtle, but not valid in RDF 1.1 Turtle.
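
The distinction drawn above can be illustrated with a few code point
predicates. This is a sketch under an assumption: it reads "Unicode
characters" as excluding the 66 Unicode noncharacters, which is one possible
interpretation of the quoted conclusion and not wording taken from either
Turtle specification. The function names are hypothetical.

```python
def is_surrogate(cp):
    # High and low surrogates together occupy U+D800..U+DFFF.
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp):
    # The 66 noncharacters: U+FDD0..U+FDEF plus the last two code points
    # of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_scalar_value(cp):
    # A Unicode scalar value is any code point that is not a surrogate.
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


# U+FFFE is a scalar value but also a noncharacter (assumed reading:
# accepted under the RDF 1.2 rule, rejected under the RDF 1.1 rule).
assert is_scalar_value(0xFFFE) and is_noncharacter(0xFFFE)

# U+D83C is a surrogate, so it is not a scalar value under either reading.
assert not is_scalar_value(0xD83C)
```
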
> 
>     Does anyone disagree with my conclusions?
> 
>     peter
> 
> 
> 
> 
>     On 4/28/26 4:26 AM, Andy Seaborne wrote:
> 
>         As promised at the last telecon, I put together a position for
>         responding to the i18n wide review comment [1]
> 
>         https://github.com/w3c/rdf-turtle/issues/138
> 
>         Summary: support valid surrogate pairs written as \u escape sequences.
> 
>              Andy
> 
>         [1] https://github.com/w3c/rdf-turtle/issues/131
>         https://github.com/w3c/rdf-trig/issues/60
> 
> 

Received on Tuesday, 28 April 2026 17:03:09 UTC