Re: summary of rdf-turtle#37 from ddooss@wp.pl on 2025-07-07 (public-rdf-star-wg@w3.org from July 2025)

From: <ddooss@wp.pl>
Date: Mon, 07 Jul 2025 18:09:12 +0200
To: Pierre-Antoine Champin <pierre-antoine@w3.org>,RDF & SPARQL Working Group <public-rdf-star-wg@w3.org>
Message-ID: <2512ff25d6514a18b311aee5d3ce6799@grupawp.pl>
Hi all,

thank you for the detailed summary of the discussion.

I would like to express my strong opposition to Option 1 (i.e.,
stopping parsing and raising an error upon encountering RDF terms that
are syntactically valid according to the Turtle grammar but invalid
according to RDF-Concepts). In practice, such strict behavior may make
tools less robust and limit their applicability in real-world data
processing scenarios, where ill-formed but syntactically conformant
data is unfortunately common.

While I acknowledge the merit of Option 2 (i.e., skipping invalid
triples with a warning), and could accept it as a compromise, I am more
inclined towards Option 3. Emitting triples containing invalid terms,
while clearly warning about their potential non-conformance, preserves
the fidelity of the original document and delegates responsibility for
semantic validation to downstream processes. This aligns with the
principle of graceful degradation and is often more useful in data
cleaning and migration contexts.

Such an approach is also consistent with the way some established
implementations already operate, suggesting that it is both practical
and acceptable in real-world deployments.

Best regards,
Dominik Tomaszuk

On 7 July 2025 at 15:13, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:

Hi all,

we decided last week to discuss rdf-turtle#37 in our next meeting, and
I committed to send a summary of the
discussion to the mailing list.

In rdf-turtle#37, I pointed out that the following text complies with
the Turtle grammar (as well as those of N-Triples, TriG and N-Quads),
but does not represent a valid RDF triple:

    <x:s> <x:p> "foo"^^rdf:langString .

More specifically, the object of this triple does not match the
definition in RDF-Concepts (which requires a language tag when the
datatype is rdf:langString).

The scope of the discussion was then broadened to include a number
of ill-formed terms that are technically allowed by the Turtle grammar,
but do not correspond to RDF terms as defined by RDF-Concepts:

    "foo"@abcdefghi  # the language tag does not comply with BCP47
    "foo"@en--xyz    # the base direction is not one of 'ltr' or 'rtl'
    <%>              # the text between pointy brackets is not a valid IRI

(NB the first two were also pointed out in rdf-n-triples#33.)

There are good reasons for keeping the grammar of Turtle & co. simple
enough (see here and here on github.com for more details), and for
deferring further validation to the description of the parsing
process. This is the spirit of PR n-triples#68, which adds some text
in the "Parsing" section to this effect.

This leaves open the question of how parsers should behave when they
encounter such "grammatically valid" documents that result in invalid
RDF terms:

1. stop parsing and raise an error
2. refrain from emitting invalid triples, raise a warning, but
   continue parsing
3. emit triples containing the invalid terms (with a warning)

Option 1 is probably not a good idea: such invalid data exists in the
wild, and the fact that the document matches the grammar justifies
that parsers should not just stop. Note however that that's how some
parsers currently behave (e.g. Oxigraph, in some of the examples
above).

Option 2 is what n-triples#68 currently proposes.

Option 3 has the advantage of not losing any information compared to
the source format, and lets the user deal with the possibly invalid
data. The drawback is that what it produces is then not guaranteed to
be compliant with the abstract syntax. This is how Jena works -- and
Oxigraph, for the "foo"^^rdf:langString case.

  best
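
To make the three options above concrete, here is a minimal Python
sketch. It is not taken from Jena, Oxigraph or any other parser: the
names Literal, Policy, problems and process are hypothetical, the
language-tag check is only a rough approximation of BCP47
well-formedness, and IRI validation (the <%> case) is left out.

```python
import re
from enum import Enum
from typing import List, NamedTuple, Optional, Tuple

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Assumption: a rough stand-in for BCP47 well-formedness. A letters-only
# primary subtag, then alphanumeric subtags, each 1 to 8 characters long.
LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


class Literal(NamedTuple):
    """Hypothetical parsed literal, as the grammar would hand it over."""
    lexical: str
    datatype: Optional[str] = None   # datatype IRI, if given explicitly
    lang: Optional[str] = None       # language tag, without the '@'
    direction: Optional[str] = None  # base direction, the part after '--'


class Policy(Enum):
    ERROR = 1  # option 1: stop parsing and raise an error
    SKIP = 2   # option 2: drop the triple, warn, continue parsing
    EMIT = 3   # option 3: emit the triple anyway, with a warning


def problems(lit: Literal) -> List[str]:
    """Return the reasons why a literal is not a valid RDF term."""
    found = []
    if lit.datatype == RDF_LANGSTRING and lit.lang is None:
        found.append("rdf:langString literal without a language tag")
    if lit.lang is not None and not LANG_TAG.fullmatch(lit.lang):
        found.append(f"ill-formed language tag: {lit.lang!r}")
    if lit.direction is not None and lit.direction not in ("ltr", "rtl"):
        found.append(f"invalid base direction: {lit.direction!r}")
    return found


Triple = Tuple[str, str, Literal]


def process(triples: List[Triple], policy: Policy):
    """Apply one of the three options; return (emitted, warnings)."""
    emitted, warnings = [], []
    for triple in triples:
        issues = problems(triple[2])
        if issues and policy is Policy.ERROR:
            raise ValueError("; ".join(issues))
        warnings.extend(issues)
        if not issues or policy is Policy.EMIT:
            emitted.append(triple)
    return emitted, warnings


# The literal examples from this thread, plus one valid literal.
doc = [
    ("x:s", "x:p", Literal("foo", datatype=RDF_LANGSTRING)),
    ("x:s", "x:p", Literal("foo", lang="abcdefghi")),
    ("x:s", "x:p", Literal("foo", lang="en", direction="xyz")),
    ("x:s", "x:p", Literal("foo", lang="en")),
]
```

With Policy.SKIP only the last triple is emitted; with Policy.EMIT all
four are emitted. Both report the same three warnings. Policy.ERROR
raises on the first ill-formed literal.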
Received on Monday, 7 July 2025 16:09:23 UTC