Re: Allowing \u escaped surrogate pairs

On 28/04/2026 19:49, Peter F. Patel-Schneider wrote:
> My argument is that numeric escapes for surrogates were illegal in RDF 
> 1.1 Turtle so why make them legal now?  This would mean that correct 
> implementations *have* to accept numeric escapes for surrogates, which 
> to me is an unnecessary additional (slight) burden for implementors.

I don't believe it was intended to be legal, but the RDF 1.1 Turtle text 
is not well-defined here. It uses "Unicode character" (which isn't a 
defined term), and its text for numeric escape sequences says:

     "Unicode character in the range U+0000 to U+FFFF"

so it isn't clear-cut that surrogate escapes are illegal in RDF 1.1 
Turtle (i18n first comment).

While the test suite has bad syntax tests for lone surrogates, there is 
nothing about bad pairs or legal pairs.

We can view this as an erratum - i.e. acknowledge it does have an impact.

> Postel's law is for systems, not standards.  I would argue that 
> standards should not sanction questionable behaviour, which would make 
> them subject to the opposite of this law.
> 
> Implementations that consume Turtle could follow Postel's law and accept 
> numeric escapes for surrogates.  RDF 1.2 Concepts explicitly allows this 
> by stating "This specification does not define how Turtle parsers handle 
> non-conforming input documents."

The thread so far seems to come down to the question:

Does the WG bless the handling of valid surrogate pairs in any way?

> "This specification does not define how Turtle parsers handle 
> non-conforming input documents."

We could take this back to i18n - maybe with

s/MUST/MAY/ from https://github.com/w3c/rdf-turtle/issues/138

"""
Two adjacent numeric escape sequences forming a Surrogate Pair MAY be 
converted to a supplementary codepoint as described by Unicode 17.0 
section 3.9.2 UTF-16.
"""
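
As an illustration of what the MAY wording would permit, here is a 
minimal Python sketch of the pair conversion. The function names are 
hypothetical and not taken from any existing Turtle parser, and it is 
deliberately simplified: it only looks at \uXXXX escapes and ignores 
\UXXXXXXXX, escaped backslashes and the rest of the grammar. It also 
assumes lone surrogates remain errors, in line with the existing bad 
syntax tests.

```python
import re

# Hypothetical helper: matches one 4-digit numeric escape, e.g. \uD83D
_U4 = re.compile(r"\\u([0-9A-Fa-f]{4})")


def is_high_surrogate(cp):
    return 0xD800 <= cp <= 0xDBFF


def is_low_surrogate(cp):
    return 0xDC00 <= cp <= 0xDFFF


def combine_surrogates(high, low):
    # UTF-16 decoding of a surrogate pair into a supplementary code point
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def unescape_u4(text):
    """Replace \\uXXXX escapes, joining adjacent valid surrogate pairs."""
    out = []
    pos = 0
    while pos < len(text):
        m = _U4.match(text, pos)
        if m is None:
            out.append(text[pos])
            pos += 1
            continue
        cp = int(m.group(1), 16)
        pos = m.end()
        if is_high_surrogate(cp):
            # A high surrogate must be followed immediately by a low one
            m2 = _U4.match(text, pos)
            if m2 is None or not is_low_surrogate(int(m2.group(1), 16)):
                raise ValueError("lone high surrogate escape")
            cp = combine_surrogates(cp, int(m2.group(1), 16))
            pos = m2.end()
        elif is_low_surrogate(cp):
            raise ValueError("lone low surrogate escape")
        out.append(chr(cp))
    return "".join(out)
```

With this sketch, unescape_u4(r"\uD83D\uDE00") yields the single code 
point U+1F600. Under MAY, a parser that instead rejects the pair would 
be equally conforming.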

and s/SHOULD/MUST/

"Systems MUST NOT produce serialized RDF with surrogate pairs encoded as 
numeric escape sequences."
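
On the producing side, a serializer can meet this by writing 
supplementary code points with the eight-digit \UXXXXXXXX form. A 
minimal Python sketch follows; the function name is hypothetical, and 
escaping every non-ASCII character is an assumption made only to keep 
the example short (a real serializer would usually write UTF-8 directly 
and would also need to escape quotes and control characters).

```python
def escape_non_ascii(text):
    """Escape non-ASCII characters without ever emitting surrogate escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Python strings can hold lone surrogates; refuse to serialize them
            raise ValueError("surrogate code point in input")
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            # One 8-digit escape instead of a \uXXXX\uXXXX surrogate pair
            out.append("\\U%08X" % cp)
    return "".join(out)
```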

     Andy

> 
> peter

Received on Wednesday, 29 April 2026 14:31:07 UTC