tightening up the Turtle grammar

The Turtle spec says that parsing the PNAME_NS and PNAME_LN terminals
produces an IRI as defined in RDF Concepts.
  http://www.w3.org/TR/turtle/#handle-IRI
  http://www.w3.org/TR/turtle/#handle-PNAME_LN
  http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-iri
RDF Concepts says that an IRI is "a Unicode string [UNICODE] that
conforms to the syntax defined in RFC 3987 [RFC3987]." In sum, we
provide a pretty liberal grammar and then point to a hilariously
complex grammar, but don't expect anyone to enforce it.

Comments c23 "IRIREF production less restrictive than RFC3987" and c26
"PN_CHARS_BASE outside of IRI range" indicate some frustration with our
grammar, which permits characters that aren't allowed anywhere in IRIs.

  <http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c23>
  <http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c26>

One approach would be to trim the bogus chars off of PN_CHARS_BASE and
include a note below the grammar which points directly at RFC 3987 and
states that the IRIs constructed by either IRIREF or PNAME_LN are
RFC 3987 IRIs. This would supplement the note about valid literal
ranges proposed to address c27.
  
  <http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c27>
  <http://www.w3.org/mid/20130324145153.GN14139@w3.org>
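
To make the trimming concrete, here is a rough Python sketch that lists
the code points PN_CHARS_BASE admits but RFC 3987 does not. The range
tables are my hand transcription of the PN_CHARS_BASE production and of
ALPHA plus ucschar from RFC 3987, so treat them as assumptions and check
them against both specs. The subtract() helper is made up for this
illustration and is not part of either spec.

```python
# Assumption: ranges transcribed by hand from the Turtle PN_CHARS_BASE production.
PN_CHARS_BASE = [
    (0x41, 0x5A), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Assumption: ALPHA plus ucschar, transcribed by hand from RFC 3987.
IRI_LETTERS = [
    (0x41, 0x5A), (0x61, 0x7A),
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
]
# Planes 1 through 13 each allow xx0000-xxFFFD.
IRI_LETTERS += [(p * 0x10000, p * 0x10000 + 0xFFFD) for p in range(1, 14)]
# Plane 14 only allows E1000-EFFFD.
IRI_LETTERS.append((0xE1000, 0xEFFFD))


def subtract(ranges, allowed):
    """Return the parts of `ranges` that are not covered by `allowed`.

    Both arguments are lists of inclusive (lo, hi) code point pairs;
    `allowed` is assumed to contain no overlapping pairs.
    """
    leftover = []
    for lo, hi in ranges:
        cur = lo  # first code point of this range not yet accounted for
        for a_lo, a_hi in sorted(allowed):
            if a_hi < cur or a_lo > hi:
                continue  # no overlap with what remains of (lo, hi)
            if a_lo > cur:
                leftover.append((cur, a_lo - 1))  # gap before this allowed range
            cur = a_hi + 1
            if cur > hi:
                break
        if cur <= hi:
            leftover.append((cur, hi))  # tail past the last allowed range
    return leftover


bogus = subtract(PN_CHARS_BASE, IRI_LETTERS)
for lo, hi in bogus:
    print(f"#x{lo:04X}-#x{hi:04X}")
```

If the tables are right, the leftovers are the candidates for trimming:
the tail of the #xFDF0-#xFFFD range plus the slices of #x10000-#xEFFFF
that ucschar skips.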

I have spoken to those acting as W3C director. They consider this to
be a clarification and nothing that would require another LC.
-- 
-ericP

Received on Tuesday, 26 March 2013 21:01:53 UTC