- From: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Date: Tue, 28 Apr 2026 18:55:10 +0100
- To: public-rdf-star-wg@w3.org
On 28 April 2026 18:03:04 GMT+01:00, "Peter F. Patel-Schneider" <pfpschneider@gmail.com> wrote:
>I'm for not allowing surrogates at all, keeping the situation unchanged.

I slightly prefer this option as well, although I would not object to supporting them.

If I understand i18n's argument correctly: careless implementations may produce Turtle with surrogate pairs, so we would be more robust if we accepted them rather than rejected them (Postel's law).

I would counter-argue that RDF 1.1 has been around for more than a decade and, AFAIK, this has never been a problem. But again, maybe that's because some careless parser implementations do decode them, despite what the spec says :)

>My view is that software that allowed surrogates was non-compliant and should remain non-compliant.
>
>Adding a test for "correct" surrogate pairs is optional.
>
>peter
>
>On 4/28/26 9:58 AM, ddooss@wp.pl wrote:
>> Hi all,
>>
>> It seems to preserve the RDF 1.2 model - strings still denote Unicode scalar values - while allowing the common UTF-16-style escape form for non-BMP characters, e.g. \uD83C\uDCA1, when the surrogate pair is well-formed.
>>
>> So my mild preference would be:
>>
>> accept a valid high-surrogate + low-surrogate pair and interpret it as the corresponding scalar value;
>>
>> reject lone surrogates, reversed pairs, or malformed surrogate sequences.
>>
>> That said, I would also be fine with option 1, since it is simpler, stricter, and seems closer to the conservative reading of the current text. Option 2 only seems preferable to me if we want to avoid rejecting data that is probably intended to represent a valid Unicode character.
>>
>> Best,
>>
>> Dominik
>>
>> *On 28 April 2026 14:16* Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>
>> [I'm deliberately not putting this in the issue, because I want the issue to look clean.]
>>
>> As far as I can tell, surrogates are not allowed at all in RDF 1.1 Turtle. The reason is that numeric escape sequences represent Unicode code points that are Unicode characters. This appears to be only stated in Section 6.4.
>>
>> So "\uD83C\uDCA1" is not valid in RDF 1.1 Turtle.
>>
>> Again as far as I can tell, RDF 1.2 Turtle liberalizes RDF 1.1 Turtle because it allows any non-surrogate Unicode code point for numeric escape sequences, not just Unicode characters.
>>
>> So "\uFFFE" is valid in RDF 1.2 Turtle, but not valid in RDF 1.1 Turtle.
>>
>> Does anyone disagree with my conclusions?
>>
>> peter
>>
>> On 4/28/26 4:26 AM, Andy Seaborne wrote:
>>
>> As promised at the last telecon, I put together a position for responding to the i18n wide review comment [1]
>>
>> https://github.com/w3c/rdf-turtle/issues/138
>>
>> Summary: support valid surrogate pairs written as \u escape sequences.
>>
>> Andy
>>
>> [1] https://github.com/w3c/rdf-turtle/issues/131
>> https://github.com/w3c/rdf-trig/issues/60
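A minimal Python sketch of the two candidate behaviours discussed in this thread: option 1 rejects any escaped surrogate; option 2 combines a well-formed high + low surrogate pair into one scalar value and still rejects lone, reversed, or malformed surrogates. `decode_escapes` and its `allow_surrogate_pairs` flag are hypothetical names, not part of any RDF library. The sketch only decodes `\u`/`\U` numeric escapes inside a string body, passes every other backslash escape through unchanged, and is not a Turtle parser. As an assumption (following the "written as \u escape sequences" wording in Andy's summary), both halves of a pair must use the 4-digit `\u` form.

```python
import re

# One backslash escape: \uXXXX (group 1), \UXXXXXXXX (group 2), or any other
# escaped character such as \\ or \n (group 3), which is passed through as is.
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)


def _is_high(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def _is_low(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def decode_escapes(text: str, allow_surrogate_pairs: bool = False) -> str:
    """Hypothetical helper: decode numeric escapes in a string body.

    allow_surrogate_pairs=False (option 1): any escaped surrogate is an error.
    allow_surrogate_pairs=True (option 2): \\uHIGH immediately followed by
    \\uLOW becomes one scalar value; lone or reversed surrogates are errors.
    """
    out = []
    pos = 0
    while (m := _ESCAPE.search(text, pos)) is not None:
        out.append(text[pos:m.start()])
        pos = m.end()
        if m.group(3) is not None:
            out.append(m.group(0))  # not a numeric escape: keep it verbatim
            continue
        cp = int(m.group(1) or m.group(2), 16)
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range: U+{cp:X}")
        if _is_high(cp) or _is_low(cp):
            # Only a \u-form high surrogate may start a pair, and only in option 2.
            if not allow_surrogate_pairs or _is_low(cp) or m.group(1) is None:
                raise ValueError(f"surrogate not allowed here: U+{cp:04X}")
            nxt = _ESCAPE.match(text, pos)
            if nxt is None or nxt.group(1) is None or not _is_low(int(nxt.group(1), 16)):
                raise ValueError(f"high surrogate U+{cp:04X} not followed by \\u low surrogate")
            low = int(nxt.group(1), 16)
            # Standard UTF-16 pair arithmetic: 0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            pos = nxt.end()
        out.append(chr(cp))
    out.append(text[pos:])
    return "".join(out)
```

With `allow_surrogate_pairs=True`, the escaped text `\uD83C\uDCA1` decodes to the single character U+1F0A1; with the default (option 1) it raises `ValueError`. In both modes `\uFFFE` is accepted, which matches Peter's reading that RDF 1.2 allows any non-surrogate code point in a numeric escape.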
Received on Tuesday, 28 April 2026 17:55:14 UTC