- From: <ddooss@wp.pl>
- Date: Mon, 07 Jul 2025 18:09:12 +0200
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>,RDF & SPARQL Working Group <public-rdf-star-wg@w3.org>
- Message-ID: <2512ff25d6514a18b311aee5d3ce6799@grupawp.pl>
Hi all,

thank you for the detailed summary of the discussion.

I would like to express my strong opposition to Option 1 (i.e., stopping parsing and raising an error upon encountering RDF terms that are syntactically valid according to the Turtle grammar but invalid according to RDF-Concepts). In practice, such strict behavior makes tools less robust and limits their applicability in real-world data processing, where ill-formed but syntactically conformant data is unfortunately common.

While I acknowledge the merit of Option 2 (i.e., skipping invalid triples with a warning), and could accept it as a compromise, I am more inclined towards Option 3. Emitting triples containing invalid terms, while clearly warning about their potential non-conformance, preserves the fidelity of the original document and delegates responsibility for semantic validation to downstream processes. This aligns with the principle of graceful degradation and is often more useful in data cleaning and migration contexts.

Such an approach is also consistent with the way some established implementations already operate, suggesting that it is both practical and acceptable in real-world deployments.

Best regards,
Dominik Tomaszuk

On 7 July 2025 at 15:13, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:

Hi all,

we decided last week to discuss rdf-turtle#37 in our next meeting, and I committed to send a summary of the discussion to the mailing list.

In rdf-turtle#37, I pointed out that the following text complies with the Turtle grammar (as well as N-Triples, TriG and N-Quads), but does not represent a valid RDF triple:

    <x:s> <x:p> "foo"^^rdf:langString .

More specifically, the object of this triple does not match the definition in RDF-Concepts (which requires a language tag when the datatype is rdf:langString).
The scope of the discussion was then broadened to include a number of ill-formed terms that are technically allowed by the Turtle grammar, but do not correspond to RDF terms as defined by RDF-Concepts:

    "foo"@abcdefghi   # the language tag does not comply with BCP47
    "foo"@en--xyz     # the base direction is not one of 'ltr' or 'rtl'
    <%>               # the text between pointy brackets is not a valid IRI

(NB the first two were also pointed out in rdf-n-triples#33.)

There are good reasons for keeping the grammar of Turtle & co. simple enough (see here and here on GitHub for more details), and for deferring further validation to the description of the parsing process. This is the spirit of PR n-triples#68, which adds some text to the "Parsing" section to this effect.

This leaves open the question of how parsers should behave when they encounter such "grammatically valid" documents that result in invalid RDF terms:

1. stop parsing and raise an error
2. refrain from emitting invalid triples, raise a warning, but continue parsing
3. emit triples containing the invalid terms (with a warning)

Option 1 is probably not a good idea: such invalid data exists in the wild, and the fact that the document matches the grammar justifies that parsers should not just stop. Note however that that's how some parsers currently behave (e.g. Oxigraph, in some of the examples above).

Option 2 is what n-triples#68 currently proposes.

Option 3 has the advantage of not losing any information compared to the source format, and lets the user deal with the possibly invalid data. The drawback is that what it produces is then not guaranteed to be compliant with the abstract syntax. This is how Jena works, and also Oxigraph for the "foo"^^rdf:langString case.

best
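
To make the three options concrete, here is a minimal Python sketch of how a parser's post-grammar step might apply each policy to already-parsed triples. Everything in it is hypothetical and for illustration only: the names `Policy`, `IRI`, `Literal`, `term_problems` and `process` come from no existing library, and the language-tag and IRI checks are deliberately simplified approximations, not full BCP47 or RFC 3987 validation.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Simplified approximations (assumptions, not complete validators):
# each language subtag is at most 8 characters, the first one alphabetic.
LANG_TAG = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")
# a '%' must start a two-digit hex escape; some characters never occur in an IRI.
BAD_IRI = re.compile(r'%(?![0-9A-Fa-f]{2})|[\x00-\x20<>"{}|^`\\]')


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None
    direction: Optional[str] = None


class Policy(Enum):
    STOP = 1  # Option 1: raise an error and stop
    SKIP = 2  # Option 2: warn, drop the triple, continue
    EMIT = 3  # Option 3: warn, emit the triple anyway


class InvalidTermError(ValueError):
    pass


def term_problems(term) -> List[str]:
    """Return a description of each problem found in a single term."""
    problems = []
    if isinstance(term, IRI) and BAD_IRI.search(term.value):
        problems.append(f"invalid IRI: <{term.value}>")
    if isinstance(term, Literal):
        if term.datatype == RDF_LANGSTRING and term.lang is None:
            problems.append("rdf:langString literal without a language tag")
        if term.lang is not None and not LANG_TAG.match(term.lang):
            problems.append(f"ill-formed language tag: {term.lang}")
        if term.direction is not None and term.direction not in ("ltr", "rtl"):
            problems.append(f"invalid base direction: {term.direction}")
    return problems


def process(triples: Iterable[Tuple], policy: Policy):
    """Apply the chosen policy; return (emitted triples, warnings)."""
    emitted, warnings = [], []
    for triple in triples:
        problems = [p for term in triple for p in term_problems(term)]
        if not problems:
            emitted.append(triple)
        elif policy is Policy.STOP:
            raise InvalidTermError("; ".join(problems))
        else:
            warnings.extend(problems)
            if policy is Policy.EMIT:
                emitted.append(triple)
    return emitted, warnings
```

Under these assumptions, the only difference between Options 2 and 3 is whether a triple with reported problems is still appended to the output, which is why Option 3 loses no information but may yield data that does not comply with the abstract syntax.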
Received on Monday, 7 July 2025 16:09:23 UTC