Re: How to deal with non-RFC IRIs in Turtle? from Wouter Beek on 2018-03-04 (semantic-web@w3.org from March 2018)

From: Wouter Beek <w.g.j.beek@vu.nl>
Date: Sun, 4 Mar 2018 13:50:51 +0100
To: Andy Seaborne <andy@seaborne.org>
CC: "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <CAEh2WcO6nCEfGBypf3PBULTFPeZ9neE0Hv+tK-HJ2Ht-aBMdJA@mail.gmail.com>

Hi Andy,

Thank you for your helpful response.  You explain that a Turtle parser
is not required to implement the full RFC URI/IRI grammars, and that
full IRI parsing may take place later in the RDF ingestion pipeline.  I
have follow-up questions about both of these points.

Regarding your first point, it is not entirely clear to me how a
Turtle parser is able to properly resolve relative IRIs without also
implementing (a non-trivial part of) the RFC grammars.  Since the
`IRIREF` rule alone is insufficient to distinguish absolute from
relative IRIs, I would expect the Turtle standard to make explicit the
minimal criteria a Turtle parser should implement in order to
determine whether an IRI is absolute or relative.
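
To make the question concrete, here is a minimal sketch of what such a
criterion might look like, assuming the parser only checks for a
leading scheme as defined in RFC 3986 (a letter followed by letters,
digits, "+", "-" or ".", then a colon).  The function name `has_scheme`
is hypothetical, and this is not taken from any existing Turtle parser.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
# Assumption: the presence of such a prefix is the only test applied.
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def has_scheme(reference: str) -> bool:
    """Return True if the reference starts with an RFC 3986 scheme."""
    return SCHEME_PREFIX.match(reference) is not None


# A parser using only this test would treat the first reference as
# absolute and the other two as relative.
for ref in ["p:p", "_:s", "a/b"]:
    print(ref, has_scheme(ref))
```

Note that a test this small cannot tell that a reference such as
`_:s` is neither absolute nor a well-formed relative reference, since
RFC 3986 does not allow a colon in the first segment of a relative
path.  Detecting that already needs a further piece of the grammar.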

As to your second point, I have not often seen an RDF ingestion
pipeline in which the output of a Turtle parser is handed over to
another component that performs IRI validation.  From an architectural
viewpoint, such a setup also does not seem to make that much sense,
since the Turtle parser may resolve invalid IRIs that it deems relative to
valid absolute IRIs.  For example, Rapper parses [1], which contains an
invalid IRI as a subject term, into [2], which contains only valid
IRIs.  An IRI validator that consumes Rapper's output will report [2]
as valid, even though the original input [1] is not.

  [1] base <https://example.org/a/>
      <_:s> <p:p> <o:o> .
  [2] <https://example.org/a/_:s> <p:p> <o:o> .
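
The effect can be reproduced with a small sketch of the two-stage
pipeline.  Everything here is an assumption made for illustration:
`naive_resolve` simply appends a scheme-less reference to the base
(which is not the full RFC 3986 resolution algorithm, and is not
Rapper's actual code), and `looks_valid` is a deliberately partial
check, not a complete IRI validator.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def naive_resolve(base: str, reference: str) -> str:
    """Hypothetical parser stage: anything without a scheme is treated
    as relative and appended to the base (assumes base ends in "/")."""
    if SCHEME_PREFIX.match(reference):
        return reference
    return base + reference


def looks_valid(reference: str) -> bool:
    """Hypothetical validator stage (partial check only): accept a
    reference that has a scheme, or a relative reference whose first
    path segment contains no colon."""
    if SCHEME_PREFIX.match(reference):
        return True
    first_segment = reference.split("/", 1)[0]
    return ":" not in first_segment


base = "https://example.org/a/"
subject = "_:s"
resolved = naive_resolve(base, subject)

print(looks_valid(subject))   # check applied to the input term
print(resolved)
print(looks_valid(resolved))  # check applied to the parser output
```

Under these assumptions the check rejects the input term but accepts
the resolved term, so a validator placed after the parser never sees
the problem.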

Of course, Rapper could be updated to not resolve the subject term in
[1], but that would violate the sequential approach too, since it would
require (partially) implementing the RFC grammars twice: once for the
Turtle parser, and once for the IRI validator.

---
Cheers,
Wouter.

Received on Sunday, 4 March 2018 12:52:06 UTC