Re: tightening up the Turtle grammar

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 27 Mar 2013 09:00:35 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <20130327130033.GB11591@w3.org>
* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-03-27 08:40+0000]
> 
> 
> On 26/03/13 21:01, Eric Prud'hommeaux wrote:
> >The Turtle spec says that parsing the PNAME_NS and PNAME_LN terminals
> >produces an IRI as defined in RDF Concepts.
> >   http://www.w3.org/TR/turtle/#handle-IRI
> >   http://www.w3.org/TR/turtle/#handle-PNAME_LN
> >   http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-iri
> >RDF Concepts says that IRI is "a Unicode string [UNICODE] that
> >conforms to the syntax defined in RFC 3987 [RFC3987]." In sum, we
> >provide a pretty liberal grammar and then point to a hilariously
> >complex grammar, but don't expect anyone to enforce it.
> 
> Don't we? :-)

I may be wrong. You could create some negative IRI evaluation tests to
make the conversation more concrete. I'm not psyched to up the bar, but
maybe others are. I understand that Jena warns about IRIs outside of NFC,
which pushes users to produce IRIs which are more predictable, but does
it 3987:validate IRIs?

positive:
  my-scheme://::@-._~ :1?/?#/?%00日本
negative:
  my-scheme://:@-._~ :1?/?#/?%00日本
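For comparison, the NFC check mentioned above is essentially a one-liner, which is part of why it is easier to enforce than RFC 3987 syntax. This is a minimal sketch using Python's standard `unicodedata` module. `nfc_warning` is a hypothetical helper, not Jena's API, and it does no 3987 validation, so it accepts both of the strings above.

```python
import unicodedata
from typing import Optional


def nfc_warning(iri: str) -> Optional[str]:
    """Return a warning if `iri` is not in Unicode Normalization Form C.

    Hypothetical helper for illustration: it mirrors the kind of NFC
    warning described above but performs no RFC 3987 syntax checking.
    """
    if unicodedata.normalize("NFC", iri) != iri:
        return f"IRI is not in NFC: {iri!r}"
    return None


# U+00E9 (precomposed) is already NFC; "e" + U+0301 (combining acute) is not.
print(nfc_warning("http://example.org/caf\u00e9"))   # None
print(nfc_warning("http://example.org/cafe\u0301"))  # a warning string
```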


> >Comments c23 "IRIREF production less restrictive than RFC3987" and c26
> >"PN_CHARS_BASE outside of IRI range" indicate some frustration with our
> >grammar which permits characters which aren't allowed anywhere in IRIs.
> >
> >   <http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c23>
> >   <http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c26>
> >
> >One approach would be to trim the bogus chars off of PN_CHARS_BASE and
> >include a note below the grammar which points directly at 3987 and
> >states that the IRIs constructed by either IRIREF or PNAME_LN are 3987
> >IRIs. This would supplement the note about valid literal ranges
> >proposed to address c27.
> >
> >   <http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c27>
> >   <http://www.w3.org/mid/20130324145153.GN14139@w3.org>
> >
> >I have spoken to those acting as W3C director. They consider this to
> >be a clarification and nothing that would require another LC.
> 
> The PN_CHARS_BASE rule is the same as the XML rule for NameStartChar
> without the ':'
> 
> If we alter PN_CHARS_BASE, won't there be ways to write in RDF/XML
> something that can't be written in the Turtle grammar?  Sure, it
> may lead to an illegal IRI, but that means we already depend on IRI
> checking for that; if it is "not enforced", we have IRI strings via
> RDF/XML that can't be written in a similar way in Turtle.

Fair point, worth considering. Let's look at an RDF/XML doc which
passes the grammar but doesn't produce a valid RDF graph:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/"> <!-- ends with U+FFFE -->
    <dc:title>World Wide Web Consortium</dc:title> 
  </rdf:Description>
</rdf:RDF>

You could write it in Turtle, but you'd need an escape:

Illegal by this proposal:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://www.w3.org/> dc:title "World Wide Web Consortium" . # IRI ends with a raw U+FFFE

Legal by this proposal:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://www.w3.org/\uFFFE> dc:title "World Wide Web Consortium" .
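Because the escape is only decoded after the grammar has matched, the U+FFFE in the escaped form above can only be caught by a check on the decoded IRI, i.e. the kind of IRI checking discussed above. This is a minimal sketch of such a post-parse pass, assuming hypothetical helpers (`unescape_iriref`, `is_noncharacter`, `noncharacters_in` are illustrative names): it decodes Turtle UCHAR escapes and then flags Unicode noncharacters. Noncharacters are only a subset of what RFC 3987 excludes, so this is not a full 3987 check.

```python
import re
from typing import List

# Turtle UCHAR: \u followed by 4 hex digits, or \U followed by 8.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_iriref(body: str) -> str:
    """Decode UCHAR escapes in the text between '<' and '>' of an IRIREF.

    chr() raises ValueError for escapes above U+10FFFF.
    """
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), body)


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def noncharacters_in(iri: str) -> List[str]:
    """List the noncharacters in `iri` as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in iri if is_noncharacter(ord(c))]


iri = unescape_iriref(r"http://www.w3.org/\uFFFE")
print(noncharacters_in(iri))  # ['U+FFFE']
```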

I'd argue that Turtle and SPARQL shouldn't be beholden to reproducing
invalid RDF graphs anyway. (Well, I'm sure we both think that, but
our intuitions about the optimal degree may differ.)


> (I'm not averse to a change - including filing a SPARQL errata -
> but we do have to fit everything together.  SPARQL 1.0 took the
> character ranges because of RDF/XML.)

And that made a lot of sense. It wasn't until the comments that I ever
thought to perform the arithmetic to surface the low-hanging 3987
restrictions.
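The arithmetic amounts to an interval difference: take the code point ranges in PN_CHARS_BASE and remove the ones RFC 3987 allows for letters (ALPHA plus ucschar). This is a sketch, assuming the range tables below are transcribed correctly from the Turtle grammar and RFC 3987 (worth re-checking against both); `subtract` is an illustrative helper, not something from either spec.

```python
# Inclusive code point ranges of the Turtle PN_CHARS_BASE production.
PN_CHARS_BASE = [
    (0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# RFC 3987 ALPHA plus ucschar: the letter-like characters an IRI may contain
# (iprivate is left out because it is only allowed in the query part).
IRI_ALLOWED = (
    [(0x41, 0x5A), (0x61, 0x7A)]                              # ALPHA
    + [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]    # ucschar, BMP
    + [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]   # planes 1-13
    + [(0xE1000, 0xEFFFD)]                                    # plane 14
)


def subtract(ranges, holes):
    """Return the parts of `ranges` not covered by any interval in `holes`."""
    out = []
    for lo, hi in ranges:
        cur = lo  # first code point of this range not yet accounted for
        for a, b in sorted(holes):
            if b < cur or a > hi:
                continue  # hole does not overlap the remaining range
            if a > cur:
                out.append((cur, a - 1))  # gap before this hole
            cur = max(cur, b + 1)
            if cur > hi:
                break
        if cur <= hi:
            out.append((cur, hi))  # tail after the last overlapping hole
    return out


for lo, hi in subtract(PN_CHARS_BASE, IRI_ALLOWED):
    print(f"U+{lo:04X}..U+{hi:04X}")
```

Run as is, it reports U+FFF0..U+FFFD, the last two code points of planes 1 through 12 (U+1FFFE..U+1FFFF up to U+CFFFE..U+CFFFF), U+DFFFE..U+E0FFF and U+EFFFE..U+EFFFF: the ranges a trimmed PN_CHARS_BASE would drop.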

If we adopt this quickly, we might be able to set a record for the
shortest time at REC without an errata. Apart from competitive
interests, there's no real hurry. I was trying to get through the
test-related comments ASAP, and this one kind of bubbled up with them.

-- 
-ericP
Received on Wednesday, 27 March 2013 13:01:05 UTC
