Re: How to deal with non-RFC IRIs in Turtle?

Hi Andy,

Thank you for your helpful response.  You explain that a Turtle parser
is not required to implement the full RFC URI/IRI grammars, and that
full IRI parsing may take place later in the RDF ingestion pipeline.  I
have follow-up questions about both of these points.

Regarding your first point, it is not entirely clear to me how a
Turtle parser is able to properly resolve relative IRIs without also
implementing (a non-trivial part of) the RFC grammars.  Since the
`IRIREF` rule is clearly insufficient to make this distinction, I
would expect the Turtle standard to make explicit the minimal criteria
a Turtle parser should implement in order to determine whether an IRI
is absolute or relative.
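To make the question concrete, here is a minimal Python sketch. It is an illustration only, not something the Turtle specification prescribes. It checks the text between `<` and `>` against the IRIREF character class, and then applies two RFC 3986 rules: the scheme syntax (section 3.1) and the rule that the first segment of a relative-path reference must not contain a colon (section 4.2). The function names are hypothetical, and UCHAR escapes are not handled. The point is that `_:s` passes the IRIREF check, but deciding whether it is absolute, relative, or neither already requires part of the RFC grammar.

```python
import re

# Character class of Turtle's IRIREF production: anything except
# control characters, space, and <>"{}|^`\ . Assumption: UCHAR escapes
# (\uXXXX, \UXXXXXXXX) are ignored in this sketch.
IRIREF_CHARS = re.compile(r'^[^\x00-\x20<>"{}|^`\\]*$')

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def matches_iriref(text: str) -> bool:
    """True if text (the part between < and >) uses only IRIREF characters."""
    return IRIREF_CHARS.match(text) is not None


def classify(text: str) -> str:
    """Hypothetical minimal classification: 'absolute', 'relative' or 'invalid'.

    Only the scheme and the first path segment are inspected; this is
    not a full RFC 3986/3987 validator.
    """
    if SCHEME.match(text):
        return "absolute"
    # RFC 3986 section 4.2: in a relative-path reference (one that does not
    # start with "/"), the first segment must not contain ":", because it
    # would be mistaken for a scheme name.
    if not text.startswith("/"):
        first_segment = re.split(r"[/?#]", text, maxsplit=1)[0]
        if ":" in first_segment:
            return "invalid"
    return "relative"


for ref in ["https://example.org/a/", "p:p", "b/c", "_:s"]:
    print(ref, matches_iriref(ref), classify(ref))
```

Every reference in the loop passes `matches_iriref`, so only the RFC-derived check separates `_:s` from the others.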

As to your second point, I have not often seen an RDF ingestion
pipeline in which the output of a Turtle parser is handed over to
another component that performs IRI validation.  From an architectural
viewpoint, such a setup also does not seem to make much sense, since
the Turtle parser may resolve invalid IRIs that it deems relative into
valid absolute IRIs.  For example, Rapper parses [1], which contains an
invalid IRI as a subject term, into [2], which contains only valid
IRIs.  An IRI validator that takes the output of the Rapper parser
will report that [2] is valid, even though the original input [1] is
not.

  [1] base <https://example.org/a/>
      <_:s> <p:p> <o:o> .
  [2] <https://example.org/a/_:s> <p:p> <o:o> .
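The effect of the ordering can be sketched in a few lines of Python. Assumption: `urllib.parse.urljoin` stands in for a lenient resolver; it is not Rapper's actual code, it merely happens to produce the same result as [2]. The `is_valid_reference` helper is hypothetical and only implements the two RFC 3986 rules relevant here (scheme syntax and the first-segment colon rule).

```python
import re
from urllib.parse import urljoin

BASE = "https://example.org/a/"
SUBJECT = "_:s"  # the IRIREF content of the subject term in [1]

# RFC 3986 section 3.1 scheme syntax, followed by ":".
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_valid_reference(ref: str) -> bool:
    """Hypothetical minimal check: reject a relative-path reference whose
    first segment contains ':' (RFC 3986 section 4.2); accept everything else."""
    if SCHEME.match(ref) or ref.startswith("/"):
        return True
    first_segment = re.split(r"[/?#]", ref, maxsplit=1)[0]
    return ":" not in first_segment


# Resolve first, validate later: the validator only sees the resolved output.
resolved = urljoin(BASE, SUBJECT)
print(resolved, is_valid_reference(resolved))  # https://example.org/a/_:s True

# Validate before resolving: the problem in the input is still visible.
print(SUBJECT, is_valid_reference(SUBJECT))  # _:s False
```

Since `:` is permitted in path segments after the first, `https://example.org/a/_:s` is a well-formed absolute IRI, so no amount of checking after resolution can recover the error in [1].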

Of course, Rapper could be updated to not resolve the subject term in
[1], but that would also run counter to the sequential approach, since
it would require (partially) implementing the RFC grammars twice: once
in the Turtle parser, and once in the IRI validator.

---
Cheers,
Wouter.

Received on Sunday, 4 March 2018 12:52:06 UTC