- From: <ddooss@wp.pl>
- Date: Mon, 07 Jul 2025 18:09:12 +0200
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>,RDF & SPARQL Working Group <public-rdf-star-wg@w3.org>
- Message-ID: <2512ff25d6514a18b311aee5d3ce6799@grupawp.pl>
Hi all,

thank you for the detailed summary of the discussion.

I would like to express my strong opposition to Option 1 (i.e., stopping parsing and raising an error upon encountering RDF terms that are syntactically valid according to the Turtle grammar but invalid according to RDF-Concepts). In practice, such strict behavior may make tools less robust and limit their applicability in real-world data processing scenarios, where ill-formed but syntactically conformant data is unfortunately common.

While I acknowledge the merit of Option 2 (i.e., skipping invalid triples with a warning), and could accept it as a compromise, I am more inclined towards Option 3. Emitting triples containing invalid terms, while clearly warning about their potential non-conformance, preserves the fidelity of the original document and delegates responsibility for semantic validation to downstream processes. This aligns with the principle of graceful degradation and is often more useful in data cleaning and migration contexts.

Such an approach is also consistent with the way some established implementations already operate, suggesting that it is both practical and acceptable in real-world deployments.

Best regards,
Dominik Tomaszuk

On 7 July 2025 at 15:13, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:

Hi all,

we decided last week to discuss rdf-turtle#37 (github.com) in our next meeting, and I committed to sending a summary of the
discussion to the mailing list.

In rdf-turtle#37 (github.com), I pointed out that the following text complies with the Turtle grammar (as well as N-Triples, TriG and N-Quads), but does not represent a valid RDF triple:

    <x:s> <x:p> "foo"^^rdf:langString .

More specifically, the object of this triple does not match the definition in RDF-Concepts (which requires a language tag when the datatype is rdf:langString).

The scope of the discussion was then broadened to include a number
of ill-formed terms that are technically allowed in the Turtle
grammar, but do not correspond to RDF terms as defined by
RDF-Concepts:

    "foo"@abcdefghi    # the language tag does not comply with BCP47
    "foo"@en--xyz      # the base direction is not one of 'ltr' or 'rtl'
    <%>                # the text between pointy brackets is not a valid IRI

(NB: the first two were also pointed out in rdf-n-triples#33 on github.com.)

There are good reasons for keeping the grammar of Turtle & co. (www.w3.org) simple enough (see here and here on github.com for more details), and for deferring further validation to the description of the parsing process (www.w3.org). This is the spirit of PR n-triples#68 (github.com), which adds some text in the "Parsing" section to this effect.

This leaves open the question of how parsers should behave when
they encounter such "grammatically valid" documents that result in invalid RDF terms:

1. stop parsing and raise an error
2. refrain from emitting invalid triples, raise a warning, but continue parsing
3. emit triples containing the invalid terms (with a warning)

Option 1 is probably not a good idea: such invalid data exists in the wild (github.com), and the fact that the document matches the grammar justifies that parsers should not just stop. Note however that that's how some parsers currently behave (e.g. Oxigraph, in some of the examples above).

Option 2 is what n-triples#68 (github.com) currently proposes.

Option 3 has the advantage of not losing any information compared
to the source format, and lets the user deal with the possibly invalid data. The drawback is that what it produces is then not guaranteed to be compliant with the abstract syntax. This is how Jena works, and also Oxigraph for the "foo"^^rdf:langString case.

best
Received on Monday, 7 July 2025 16:09:23 UTC
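
The three parser behaviours discussed in this thread can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Jena, Oxigraph or any other parser is implemented: the names Literal, Policy, literal_problem and apply_policy are hypothetical, triples are assumed to be already tokenised, only literal objects are checked (IRI validation is left out), and the language-tag pattern is a rough approximation of BCP47 well-formedness.

```python
import re
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Optional, Tuple

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Assumption: a simplified shape check, NOT a full BCP47 validator.
# Primary subtag of 2-8 letters, then any number of 1-8 character subtags.
LANG_TAG = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")


@dataclass(frozen=True)
class Literal:
    """Hypothetical, already-tokenised literal as a parser might hold it."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None
    direction: Optional[str] = None


def literal_problem(lit: Literal) -> Optional[str]:
    """Return a description of why the literal is ill-formed, or None."""
    if lit.datatype == RDF_LANGSTRING and lit.lang is None:
        return "rdf:langString literal without a language tag"
    if lit.lang is not None and not LANG_TAG.fullmatch(lit.lang):
        return f"ill-formed language tag {lit.lang!r}"
    if lit.direction is not None and lit.direction not in ("ltr", "rtl"):
        return f"base direction {lit.direction!r} is not 'ltr' or 'rtl'"
    return None


class Policy(Enum):
    ERROR = 1  # Option 1: stop parsing and raise an error
    SKIP = 2   # Option 2: drop the triple, warn, continue
    EMIT = 3   # Option 3: emit the triple anyway, with a warning


Triple = Tuple[str, str, Literal]


def apply_policy(triples: Iterable[Triple], policy: Policy) -> Iterator[Triple]:
    """Yield triples according to the chosen policy for ill-formed objects."""
    for triple in triples:
        problem = literal_problem(triple[2])
        if problem is None:
            yield triple
        elif policy is Policy.ERROR:
            raise ValueError(problem)
        else:
            warnings.warn(problem)
            if policy is Policy.EMIT:
                yield triple
```

With this sketch, "foo"^^rdf:langString would be modelled as Literal("foo", datatype=RDF_LANGSTRING) and "foo"@en--xyz as Literal("foo", lang="en", direction="xyz"). Under Policy.EMIT the output can contain terms that are not valid RDF terms, which is the drawback of Option 3 noted above.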