turtle conformance clause / strict-vs-loose parsing from Sandro Hawke on 2012-05-17 (public-rdf-wg@w3.org from May 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 17 May 2012 16:35:31 -0400
To: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <1337286931.2379.96.camel@waldron>

What should/may/must a Turtle parser do with a Turtle document like
this:

<http://example.org/a> <http://example.org/a> <http://example.org/a|b>.

By the grammar, this is not a Turtle document, because of the '|'
character in a URI.  I don't think, however, that people writing Turtle
parsers will want to enforce this.  If they come across some Turtle
document that's got a URI like this, they can still parse it just
fine, so they probably will.

The language tokens like IRIREF and PNAME are defined in the grammar
with these vast regexps (if you macro-expand what's there now), but
actually much simpler ones will produce the same result in practice --
they'll just tolerate some files that are not, strictly speaking,
Turtle.  (I'm pretty sure of this, though maybe there are some corner
cases with missing whitespace where these regexps will give you a
different result than something more like
any-character-up-until-a-delimiter.)
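
To make the comparison concrete, here is a rough sketch of the two
approaches for IRIREF only.  The strict pattern is my reading of the
IRIREF production in the Turtle grammar (the exact production in the
draft under discussion is an assumption here), and the loose pattern is
simply any-character-up-until-the-closing-delimiter.  The names
STRICT_IRIREF and LOOSE_IRIREF are made up for illustration.

```python
import re

# Assumption: strict IRIREF excludes control characters, space and
# < > " { } | ^ ` \ but allows \uXXXX and \UXXXXXXXX escapes.
UCHAR = r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}'
STRICT_IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*>')

# Loose: everything from '<' up to the next '>'.
LOOSE_IRIREF = re.compile(r'<[^>]*>')

example = ('<http://example.org/a> <http://example.org/a> '
           '<http://example.org/a|b>.')

strict_tokens = STRICT_IRIREF.findall(example)
loose_tokens = LOOSE_IRIREF.findall(example)

print(len(strict_tokens), strict_tokens)
print(len(loose_tokens), loose_tokens)
```

On the example document the strict pattern recognises only the first
two IRIs, while the loose pattern recognises all three, including the
one containing '|'.  On input where every IRI is valid, both patterns
return the same tokens.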

I'm not sure anything has to change, but I think at the very least the
conformance clause should be clear about whether it's okay to accept a
Turtle document like my example above.

It might be nice to have "strict" and "loose" parsers, especially if we
can define loose parsers in a way that makes them simpler to implement,
run faster, and never parse anything differently from a strict parser.
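
One way to picture the strict/loose split, as a toy sketch only: the
loose reader does the tokenizing, and strict mode is the same reader
plus a validation pass.  Built that way, the two cannot disagree on any
document the strict mode accepts.  The function read_iris, the
exception NotTurtle and the strict pattern are all hypothetical, and
the sketch looks only at IRI tokens (it ignores literals, comments and
everything else in the grammar).

```python
import re

# Assumption: same reading of the strict IRIREF production as in the
# Turtle grammar; the loose pattern is "anything up to the next '>'".
STRICT = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>')
LOOSE = re.compile(r'<[^>]*>')


class NotTurtle(ValueError):
    """Raised in strict mode when a token violates the grammar."""


def read_iris(text, strict=False):
    """Return the IRI tokens in text; in strict mode, validate each one."""
    found = LOOSE.findall(text)
    if strict:
        for token in found:
            if not STRICT.fullmatch(token):
                raise NotTurtle("not a valid IRIREF: " + token)
    return found


doc = ('<http://example.org/a> <http://example.org/a> '
       '<http://example.org/a|b>.')

print(read_iris(doc))            # loose mode accepts the document
try:
    read_iris(doc, strict=True)  # strict mode rejects it
except NotTurtle as err:
    print("rejected:", err)
```

In this arrangement the only thing strict mode adds is the ability to
say "no".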

Of course, then I'm not quite sure what the point of the strict parsers
would be.

  -- Sandro

Received on Thursday, 17 May 2012 20:35:49 UTC