- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Wed, 29 Apr 2026 10:39:43 -0400
- To: public-rdf-star-wg@w3.org
I agree that the prohibition wasn't very well worded, but according to
Unicode, surrogates are not characters, just like noncharacters are not
characters. So "Unicode character in the range ..." excludes surrogates, at
least as I read it.

peter

On 4/29/26 10:30 AM, Andy Seaborne wrote:
> On 28/04/2026 19:49, Peter F. Patel-Schneider wrote:
>> My argument is that numeric escapes for surrogates were illegal in RDF 1.1
>> Turtle so why make them legal now? This would mean that correct
>> implementations *have* to accept numeric escapes for surrogates, which to me
>> is an unnecessary additional (slight) burden for implementors.
>
> I don't believe it was intended to be legal but the RDF 1.1 Turtle text -
> using "Unicode character" (which isn't a defined term) and text for numeric
> escape sequence:
>
> "Unicode character in the range U+0000 to U+FFFF"
>
> mean things are not well-defined and it isn't so clear cut that it is illegal
> in RDF 1.1 Turtle (i18n first comment).
>
> While the test suite has bad syntax tests for lone surrogates, there is
> nothing about bad pairs or legal pairs.
>
> We can view this as an errata - i.e. acknowledge it does have an impact.
>
>> Postel's law is for systems, not standards. I would argue that standards
>> should not sanction questionable behaviour, which would make them subject to
>> the opposite of this law.
>>
>> Implementations that consume Turtle could follow Postel's law and accept
>> numeric escapes for surrogates. RDF 1.2 Concepts explicitly allows this by
>> stating "This specification does not define how Turtle parsers handle
>> non-conforming input documents.
>
> The thread so far seems to come down the question;
>
> Does the WG bless handling valid pair handling in any way?
>
>
> "This specification does not define how Turtle parsers handle
> non-conforming input documents."
>
> We could take this back to i18n - maybe with
>
> s/MUST/MAY/ from https://github.com/w3c/rdf-turtle/issues/138
>
> """
> Two adjacent numeric escape sequences forming a Surrogate Pair MAY be
> converted to a supplementary codepoint as described by Unicode 17.0 section
> 3.9.2 UTF-16.
> """
>
> and s/SHOULD/MUST/
>
> "Systems MUST NOT produce serialized RDF with surrogate pairs encoded as
> numeric escape sequences."
>
> Andy
>
>>
>> peter
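
To make the two readings concrete, here is a minimal Python sketch of a numeric-escape decoder. It is an illustration only, not text from any spec or any existing parser: the function name `decode_numeric_escapes` and the `allow_pairs` flag are hypothetical, and the sketch handles only `\uXXXX` and `\UXXXXXXXX`, ignoring every other Turtle escape (including an escaped backslash). With `allow_pairs=False` every surrogate escape is rejected (the strict reading); with `allow_pairs=True` an adjacent high/low pair is combined using the standard UTF-16 arithmetic (the MAY wording quoted above), while lone surrogates are still rejected.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def _is_high(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def _is_low(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def decode_numeric_escapes(text: str, allow_pairs: bool = False) -> str:
    """Decode \\u and \\U escapes in `text` (hypothetical helper).

    allow_pairs=False: any surrogate escape raises ValueError.
    allow_pairs=True:  an adjacent high+low pair becomes one
                       supplementary code point; lone surrogates
                       still raise ValueError.
    """
    out = []
    pos = 0
    while pos < len(text):
        m = _ESCAPE.match(text, pos)
        if m is None:
            # Not a numeric escape: copy the character through unchanged.
            out.append(text[pos])
            pos += 1
            continue
        cp = int(m.group(1) or m.group(2), 16)
        pos = m.end()
        if allow_pairs and _is_high(cp):
            nxt = _ESCAPE.match(text, pos)
            if nxt is not None:
                low = int(nxt.group(1) or nxt.group(2), 16)
                if _is_low(low):
                    # Standard UTF-16 surrogate pair arithmetic.
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                    pos = nxt.end()
        if _is_high(cp) or _is_low(cp):
            raise ValueError(f"surrogate escape U+{cp:04X} not allowed")
        if cp > 0x10FFFF:
            raise ValueError(f"escape U+{cp:X} is outside the Unicode range")
        out.append(chr(cp))
    return "".join(out)
```

For example, `decode_numeric_escapes(r"\uD83D\uDE00", allow_pairs=True)` yields the single code point U+1F600, while the same input with the default `allow_pairs=False` raises `ValueError`.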
Received on Wednesday, 29 April 2026 14:39:49 UTC