- From: <ddooss@wp.pl>
- Date: Wed, 29 Apr 2026 16:36:46 +0200
- To: Andy Seaborne <andy@apache.org>,public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
- Message-ID: <f2a29f0f1505482dacde25e2c7cf51f2@grupawp.pl>
Hi Andy,
From my side, yes - this looks like a reasonable compromise.
Best,
Dominik
On 29 April 2026 at 16:31, Andy Seaborne <andy@apache.org> wrote:
On 28/04/2026 19:49, Peter F. Patel-Schneider wrote:
My argument is that numeric escapes for surrogates were illegal in RDF
1.1 Turtle so why make them legal now? This would mean that correct
implementations *have* to accept numeric escapes for surrogates, which
to me is an unnecessary additional (slight) burden for implementors.
I don't believe it was intended to be legal, but the RDF 1.1 Turtle text
uses "Unicode character" (which isn't a defined term) and describes the
numeric escape sequence as:
"Unicode character in the range U+0000 to U+FFFF"
so things are not well-defined, and it isn't clear cut that it is
illegal in RDF 1.1 Turtle (i18n first comment).
While the test suite has bad syntax tests for lone surrogates, there is
nothing about bad pairs or legal pairs.
We can view this as an erratum - i.e. acknowledge it does have an impact.
Postel's law is for systems, not standards. I would argue that
standards should not sanction questionable behaviour, which would make
them subject to the opposite of this law.
Implementations that consume Turtle could follow Postel's law and accept
numeric escapes for surrogates. RDF 1.2 Concepts explicitly allows this
by stating "This specification does not define how Turtle parsers handle
non-conforming input documents."
The thread so far seems to come down to the question:
Does the WG bless the handling of valid surrogate pairs in any way?
> "This specification does not define how Turtle parsers handle
non-conforming input documents."
We could take this back to i18n - maybe with
s/MUST/MAY/ from https://github.com/w3c/rdf-turtle/issues/138
"""
Two adjacent numeric escape sequences forming a Surrogate Pair MAY be
converted to a supplementary codepoint as described by Unicode 17.0
section 3.9.2 UTF-16.
"""
and s/SHOULD/MUST/
"Systems MUST NOT produce serialized RDF with surrogate pairs encoded as
numeric escape sequences."
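
To make the two proposed rules concrete, here is a rough Python sketch of what they might look like in an implementation. It is not taken from any parser or from the spec text. The function names and the combine_pairs flag are hypothetical, and it assumes the input contains only \uXXXX and \UXXXXXXXX escapes (other Turtle escapes, such as an escaped backslash, are not handled).

```python
import re

# Matches \uXXXX and \UXXXXXXXX only. Simplification: other Turtle escapes
# (for example an escaped backslash in front of "u") are not handled.
_NUMERIC_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_numeric_escapes(text, combine_pairs=True):
    """Decode numeric escapes (hypothetical helper).

    combine_pairs=True models the proposed MAY: two adjacent escapes that
    form a surrogate pair become one supplementary code point.
    combine_pairs=False rejects every surrogate escape.
    Lone surrogates are rejected in both modes.
    """
    out = []
    pos = 0
    pending_high = None  # high surrogate seen in the immediately preceding escape
    for m in _NUMERIC_ESCAPE.finditer(text):
        if m.start() > pos:
            # Literal text between escapes breaks adjacency.
            if pending_high is not None:
                raise ValueError(f"lone high surrogate U+{pending_high:04X}")
            out.append(text[pos:m.start()])
        pos = m.end()
        cp = int(m.group(1) or m.group(2), 16)
        if pending_high is not None:
            if 0xDC00 <= cp <= 0xDFFF:
                # UTF-16 pair arithmetic: 0x10000 + (high offset << 10) + low offset
                out.append(chr(0x10000 + ((pending_high - 0xD800) << 10) + (cp - 0xDC00)))
                pending_high = None
                continue
            raise ValueError(f"lone high surrogate U+{pending_high:04X}")
        if combine_pairs and 0xD800 <= cp <= 0xDBFF:
            pending_high = cp
            continue
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate escape U+{cp:04X} not allowed here")
        if cp > 0x10FFFF:
            raise ValueError(f"escape value {cp:X} is beyond U+10FFFF")
        out.append(chr(cp))
    if pending_high is not None:
        raise ValueError(f"lone high surrogate U+{pending_high:04X}")
    out.append(text[pos:])
    return "".join(out)


def encode_numeric_escapes(text):
    """Escape non-ASCII characters without ever emitting a surrogate escape.

    Models the proposed MUST NOT: supplementary code points are written as
    a single \\UXXXXXXXX, never as a \\uXXXX pair. Not a full serializer:
    ASCII characters, including backslash, pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"cannot serialize lone surrogate U+{cp:04X}")
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)
```

With combine_pairs=True a consumer accepts the pair D83D/DE00 as U+1F600. With combine_pairs=False the same input is a syntax error, which is the stricter reading of RDF 1.1 Turtle. Either way the serializer side only ever writes the single \U form.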
Andy
peter
Received on Wednesday, 29 April 2026 14:37:02 UTC