- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Tue, 28 Apr 2026 14:49:23 -0400
- To: public-rdf-star-wg@w3.org
My argument is that numeric escapes for surrogates were illegal in RDF 1.1 Turtle, so why make them legal now? This would mean that correct implementations *have* to accept numeric escapes for surrogates, which to me is an unnecessary additional (slight) burden for implementors.

Postel's law is for systems, not standards. I would argue that standards should not sanction questionable behaviour, which would make them subject to the opposite of this law.

Implementations that consume Turtle could follow Postel's law and accept numeric escapes for surrogates. RDF 1.2 Concepts explicitly allows this by stating "This specification does not define how Turtle parsers handle non-conforming input documents."

peter

On 4/28/26 1:55 PM, Pierre-Antoine Champin wrote:
>
> On 28 April 2026 18:03:04 GMT+01:00, "Peter F. Patel-Schneider" <pfpschneider@gmail.com> wrote:
>> I'm for not allowing surrogates at all, keeping the situation unchanged.
>
> I'm slightly preferring this option as well, although I would not object to supporting them.
>
> If I understand correctly i18n's arguments: careless implementations may produce Turtle with surrogate pairs, so we would be more robust in accepting them rather than rejecting them (Postel's law).
>
> I would counter argue that RDF 1.1 has been around for more than a decade and AFAIK this has never been a problem. But again, maybe that's because some careless implementations of parsers do decode them, despite what the spec says :)
>
>> My view is that software that allowed surrogates was non-compliant and should remain non-compliant.
>>
>> Adding a test for "correct" surrogate pairs is optional.
>>
>> peter
>>
>> On 4/28/26 9:58 AM, ddooss@wp.pl wrote:
>>> Hi all,
>>>
>>> It seems to preserve the RDF 1.2 model - strings still denote Unicode scalar values - while allowing the common UTF-16-style escape form for non-BMP characters, e.g. \uD83C\uDCA1, when the surrogate pair is well-formed.
>>>
>>> So my mild preference would be:
>>>
>>> accept a valid high-surrogate + low-surrogate pair and interpret it as the corresponding scalar value;
>>>
>>> reject lone surrogates, reversed pairs, or malformed surrogate sequences.
>>>
>>> That said, I would also be fine with option 1, since it is simpler, stricter, and seems closer to the conservative reading of the current text. Option 2 only seems preferable to me if we want to avoid rejecting data that is probably intended to represent a valid Unicode character.
>>>
>>> Best,
>>>
>>> Dominik
>>>
>>> On 28 April 2026 14:16, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>
>>> [I'm deliberately not putting this in the issue, because I want the issue to look clean.]
>>>
>>> As far as I can tell, surrogates are not allowed at all in RDF 1.1 Turtle. The reason is that numeric escape sequences represent Unicode code points that are Unicode characters. This appears to be only stated in Section 6.4.
>>>
>>> So "\uD83C\uDCA1" is not valid in RDF 1.1 Turtle.
>>>
>>> Again as far as I can tell, RDF 1.2 Turtle liberalizes RDF 1.1 Turtle because it allows any non-surrogate Unicode code point for numeric escape sequences, not just Unicode characters.
>>>
>>> So "\uFFFE" is valid in RDF 1.2 Turtle, but not valid in RDF 1.1 Turtle.
>>>
>>> Does anyone disagree with my conclusions?
>>>
>>> peter
>>>
>>> On 4/28/26 4:26 AM, Andy Seaborne wrote:
>>>
>>> As promised at the last telecon, I put together a position for responding to the i18n wide review comment [1]
>>>
>>> https://github.com/w3c/rdf-turtle/issues/138
>>>
>>> Summary: support valid surrogate pairs written as \u escape sequences.
>>>
>>> Andy
>>>
>>> [1] https://github.com/w3c/rdf-turtle/issues/131
>>> https://github.com/w3c/rdf-trig/issues/60
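
For readers following the thread, the two behaviours under discussion can be sketched in a few lines of Python. This is only an illustration, not code from any Turtle parser: the function name `decode_numeric_escapes` and the `allow_pairs` flag are made up here, and the sketch handles only `\uXXXX` and `\UXXXXXXXX` escapes (it assumes the input contains no `\\` escapes). With `allow_pairs=False` every surrogate escape is rejected (the "keep the situation unchanged" position); with `allow_pairs=True` a well-formed high + low pair is combined into one scalar value, and lone, reversed, or malformed surrogates are still rejected.

```python
import re

# Matches \uXXXX (group 1) or \UXXXXXXXX (group 2).
# Simplification: a preceding "\\" escape is NOT taken into account.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_numeric_escapes(text: str, allow_pairs: bool) -> str:
    """Hypothetical helper: expand numeric escapes in `text`.

    allow_pairs=False: any surrogate code point is an error.
    allow_pairs=True:  \\uD8xx\\uDCxx style pairs are combined into one
                       scalar value; other surrogates are errors.
    """
    out = []
    pos = 0
    while True:
        m = _ESCAPE.search(text, pos)
        if m is None:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:m.start()])
        cp = int(m.group(1) or m.group(2), 16)
        pos = m.end()

        if allow_pairs and m.group(1) and 0xD800 <= cp <= 0xDBFF:
            # High surrogate: the very next thing must be a \u low surrogate.
            low = _ESCAPE.match(text, pos)
            if low is None or not low.group(1):
                raise ValueError("lone high surrogate escape")
            low_cp = int(low.group(1), 16)
            if not 0xDC00 <= low_cp <= 0xDFFF:
                raise ValueError("high surrogate not followed by low surrogate")
            # Standard UTF-16 pair arithmetic.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low_cp - 0xDC00)
            pos = low.end()
        elif 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point in numeric escape")
        elif cp > 0x10FFFF:
            raise ValueError("escape is beyond the Unicode code point range")

        out.append(chr(cp))


if __name__ == "__main__":
    # The example from the thread: the pair maps to U+1F0A1.
    decoded = decode_numeric_escapes(r"\uD83C\uDCA1", allow_pairs=True)
    print(hex(ord(decoded)))
```

Under the stricter reading, the same input raises `ValueError`, while a non-surrogate escape such as `\uFFFE` is decoded in both modes.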
Received on Tuesday, 28 April 2026 18:49:28 UTC