Re: summary of rdf-turtle#37 from James Anderson on 2025-07-07 (public-rdf-star-wg@w3.org from July 2025)

From: James Anderson <anderson.james.1955@gmail.com>
Date: Mon, 7 Jul 2025 18:20:40 +0200
To: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-Id: <2C226027-632A-4BCC-B155-CBDF241C7506@gmail.com>

good evening;

is this not analogous to the situation with "support ill-typed literals with recognized datatype IRIs"?

> On 7. Jul 2025, at 15:12, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
> 
> Hi all,
> we decided last week to discuss rdf-turtle#37 in our next meeting, and I committed to send a summary of the discussion to the mailing list.
> In rdf-turtle#37, I pointed out that the following text complies with the Turtle grammar (as well as N-Triples, TriG and N-Quads), but does not represent a valid RDF triple:
>     <x:s> <x:p> "foo"^^rdf:langString .
> More specifically, the object of this triple does not match the definition in RDF-Concepts (which requires a language tag when the datatype is rdf:langString).
> The scope of the discussion was then broadened to include a number of ill-formed terms that are technically allowed by the Turtle grammar, but do not correspond to RDF terms as defined by RDF-Concepts.
>     "foo"@abcdefghi  # the language tag does not comply with BCP47
>     "foo"@en--xyz   # the base direction is not one of 'ltr' or 'rtl'
>     <%>             # the text between pointy brackets is not a valid IRI
> (NB the first two were also pointed out in rdf-n-triples#33).
> There are good reasons for keeping the grammar of Turtle & co. simple enough (see here and here for more details),
> and defer further validation to the description of the parsing process.
> This is the spirit of PR n-triples#68, which adds some text in the "Parsing" section to this effect.
> This leaves open the question of how parsers should behave when they encounter such "grammatically valid" documents that result in invalid RDF terms...
> 1. stop parsing and raise an error
> 2. refrain from emitting invalid triples, raise a warning, but continue parsing
> 3. emit triples containing the invalid terms (with a warning)
> Option 1 is probably not a good idea: such invalid data exists in the wild, and the fact that the document matches the grammar justifies that parsers should not just stop. Note however that that's how some parsers currently behave (e.g. Oxigraph, in some of the examples above).
> Option 2 is what n-triples#68 currently proposes.
> Option 3 has the advantage of not losing any information compared to the source format, and lets the user deal with the possibly invalid data. The drawback is that what it produces is then not guaranteed to be compliant with the abstract syntax. This is how Jena works -- and Oxigraph, for the "foo"^^rdf:langString case.
>    best
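
For illustration only, the three options quoted above can be sketched in Python. This is a minimal sketch under stated assumptions, not how any of the parsers mentioned behaves: the names Literal, problem, Policy and process are hypothetical, the language-tag pattern is only a rough approximation of BCP47 well-formedness, and IRI validation is left out.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Assumption: a rough stand-in for BCP47 well-formedness (subtags of 1-8
# characters, the first one alphabetic). This is not a full BCP47 check.
LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


@dataclass
class Literal:
    """Hypothetical, simplified view of a literal as read from the grammar."""
    lexical: str
    datatype: Optional[str] = None
    language: Optional[str] = None
    direction: Optional[str] = None


def problem(lit: Literal) -> Optional[str]:
    """Return why the literal is not a valid RDF term, or None if it looks fine."""
    if lit.datatype == RDF_LANGSTRING and lit.language is None:
        return "rdf:langString without a language tag"
    if lit.language is not None and not LANG_TAG.fullmatch(lit.language):
        return "language tag is not well-formed"
    if lit.direction is not None and lit.direction not in ("ltr", "rtl"):
        return "base direction is not 'ltr' or 'rtl'"
    return None


class Policy(Enum):
    ABORT = 1  # option 1: stop parsing and raise an error
    SKIP = 2   # option 2: warn, do not emit the invalid term, continue
    EMIT = 3   # option 3: warn, emit the invalid term anyway, continue


def process(literals, policy):
    """Apply one of the three policies; return (emitted terms, warnings)."""
    emitted, warnings = [], []
    for lit in literals:
        why = problem(lit)
        if why is None:
            emitted.append(lit)
        elif policy is Policy.ABORT:
            raise ValueError(why)
        else:
            warnings.append(why)
            if policy is Policy.EMIT:
                emitted.append(lit)
    return emitted, warnings
```

With SKIP the caller only ever sees terms that passed the checks, at the cost of silently shrinking the data; with EMIT nothing is lost, but every consumer has to be prepared for terms outside the abstract syntax.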

---
james anderson | james@dydra.com | https://dydra.com

Received on Monday, 7 July 2025 16:20:56 UTC