- From: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Date: Tue, 28 Apr 2026 21:34:54 +0100
- To: Gregory Williams <greg@evilfunhouse.com>
- Cc: public-rdf-star-wg@w3.org
On 28/04/2026 19:14, Gregory Williams wrote:
> On Apr 28, 2026, at 10:55 AM, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>> On 28 April 2026 18:03:04 GMT+01:00, "Peter F. Patel-Schneider" <pfpschneider@gmail.com> wrote:
>>> I'm for not allowing surrogates at all, keeping the situation unchanged.
>> I slightly prefer this option as well, although I would not object to supporting them.
> I also prefer this option, with a stronger resistance to allowing surrogates. Use of surrogates in escapes strikes me as a bad thing in formats that have a way to escape the same data without surrogates. Surrogate escaping obfuscates which code point is being encoded in the surface syntax, which strikes me as having no obvious benefit to the end user while introducing real usability issues.
>
>> If I understand i18n's arguments correctly: careless implementations may produce Turtle with surrogate pairs, so we would be more robust in accepting them rather than rejecting them (Postel's law).
> We’ve pushed back on some of their other suggestions (e.g. the “\u{XXXX}” syntax) with the reasoning that it introduces compatibility issues. I think a similar argument could be made here: allowing surrogate escaping is another feature that existing 1.1 parsers may reject as invalid in 1.2 data that would otherwise be compatible.
That's a very good point, indeed.
>
> In terms of the robustness principle, implementations seem free to accept such invalid surrogate data (and ensure that they never emit it), but I don’t think that has to mean that we standardize such support in the grammar.
>
>> I would counter-argue that RDF 1.1 has been around for more than a decade and AFAIK this has never been a problem. But again, maybe that's because some careless implementations of parsers do decode them, despite what the spec says :)
> I would say that’s *still* a problem, because even if some “careless” implementations have allowed it, some other implementations do not allow it, and such use is already risking interop problems.
>
> .greg
>
>
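To make the two positions in this thread concrete, here is a minimal sketch in Python of a decoder for Turtle-style numeric escapes (\uXXXX and \UXXXXXXXX). The function name `decode_uchars` and the `lenient` flag are hypothetical, made up for this illustration and not taken from any existing parser or from the spec. The default mode rejects surrogate escapes, which is the "unchanged" option discussed above; the lenient mode shows what a robust implementation could choose to do on input, combining a well-formed pair into one code point.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits (Turtle's UCHAR).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_uchars(text, lenient=False):
    """Decode numeric escapes in `text` (hypothetical helper for illustration).

    With lenient=False, any escape naming a surrogate code point
    (U+D800..U+DFFF) raises ValueError. With lenient=True, a high surrogate
    escape immediately followed by a low surrogate escape is combined into a
    single code point; unpaired surrogates still raise ValueError.
    """
    out = []
    pos = 0
    while pos < len(text):
        m = UCHAR.match(text, pos)
        if m is None:
            # Not an escape: copy the character through unchanged.
            out.append(text[pos])
            pos += 1
            continue
        cp = int(m.group(1) or m.group(2), 16)
        pos = m.end()
        if cp > 0x10FFFF:
            raise ValueError(f"escape U+{cp:X} is outside the Unicode range")
        if 0xD800 <= cp <= 0xDFFF:
            if not lenient:
                raise ValueError(f"surrogate escape U+{cp:04X} is not allowed")
            low = UCHAR.match(text, pos)
            low_cp = int(low.group(1) or low.group(2), 16) if low else None
            if cp > 0xDBFF or low_cp is None or not 0xDC00 <= low_cp <= 0xDFFF:
                raise ValueError(f"unpaired surrogate escape U+{cp:04X}")
            # Standard UTF-16 pair arithmetic.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low_cp - 0xDC00)
            pos = low.end()
        out.append(chr(cp))
    return "".join(out)


if __name__ == "__main__":
    # The same code point, U+1F600, written directly and as a surrogate pair.
    print(decode_uchars(r"\U0001F600") == decode_uchars(r"\uD83D\uDE00", lenient=True))
```

Note how the pair form hides the code point being encoded: U+1F600 is readable in the first spelling and not in the second, and a strict parser accepts only the first.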
Received on Tuesday, 28 April 2026 20:34:58 UTC