- From: Sandro Hawke <sandro@w3.org>
- Date: Thu, 17 May 2012 16:35:31 -0400
- To: public-rdf-wg <public-rdf-wg@w3.org>
What should/may/must a Turtle parser do with a Turtle document like this:

    <http://example.org/a> <http://example.org/a> <http://example.org/a|b>.

By the grammar, this is not a Turtle document, because of the '|' character in a URI. I don't think, however, that people writing Turtle parsers will want to enforce this. If they come across some Turtle document that's got a URI like this, they can still parse it just fine, so they probably will.

The language tokens like IRIREF and PNAME are defined in the grammar with these vast regexps (if you macro-expand what's there, now), but actually much simpler ones will produce the same result in practice -- they'll just tolerate some files that are not, strictly speaking, Turtle. (I'm pretty sure -- maybe there are some corner cases with missing whitespace where these regexps will give you a different result than something more like any-character-up-until-a-delimiter.)

I'm not sure anything has to change, but I think at the very least the conformance clause should be clear about whether it's okay to accept a Turtle document like my example above.

It might be nice to have "strict" and "loose" parsers, especially if we can define loose parsers in a way that makes them simpler to implement, run faster, and never parse anything differently from a strict parser on documents that really are Turtle. Of course, then I'm not quite sure what the point of the strict parsers would be.

-- Sandro
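To make the strict/loose distinction concrete, here is a rough sketch in Python. It is only an illustration under assumptions: STRICT_IRIREF is my approximation of the IRIREF production (the excluded character set plus \u and \U escapes), LOOSE_IRIREF is the "any character up until a delimiter" idea, and the helper name iri_tokens is made up. It only tokenizes IRIs, and it does not explore the missing-whitespace corner cases mentioned above.

```python
import re

# Strict: approximates the IRIREF production. It rejects control characters,
# space, and the characters < > " { } | ^ ` \ (apart from \u / \U escapes).
STRICT_IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)

# Loose (assumed simplification): any character up to the closing '>'.
LOOSE_IRIREF = re.compile(r'<[^>]*>')


def iri_tokens(pattern, text):
    """Return every IRI token that `pattern` finds in `text` (hypothetical helper)."""
    return pattern.findall(text)


# The example document, with '|' inside the third IRI.
doc = '<http://example.org/a> <http://example.org/a> <http://example.org/a|b>.'
strict = iri_tokens(STRICT_IRIREF, doc)
loose = iri_tokens(LOOSE_IRIREF, doc)

# A document that is valid under both patterns.
valid_doc = '<http://example.org/a> <http://example.org/b> <http://example.org/c>.'
same_on_valid = iri_tokens(STRICT_IRIREF, valid_doc) == iri_tokens(LOOSE_IRIREF, valid_doc)

print(len(strict), len(loose), same_on_valid)
```

On the example document the strict pattern finds only the first two IRIs, while the loose one also accepts the one containing '|'. On the valid document both produce the same tokens, which is the property a "loose" parser would need to guarantee.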
Received on Thursday, 17 May 2012 20:35:49 UTC