
proposed email to Turtle and Trig implementors re: grammar change

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 12 Feb 2014 13:31:27 -0500
To: RDF WG <public-rdf-wg@w3.org>
Cc: Philippe Le Hegaret <plh@w3.org>
Message-ID: <20140212183125.GC19581@w3.org>
Per <https://www.w3.org/2011/rdf-wg/track/actions/345>, here's a
proposed message:

To: public-rdf-comments@w3.org
Cc: semantic-web@w3.org
Bcc: <each of the submitters of the implementation reports>

The RDF Working Group recently discovered an error in the grammars for
Turtle and TriG. They were intended to align with SPARQL, but a pair of
parentheses was accidentally dropped from the definition for long
strings resulting in an over-constraint on what's permitted after
embedded quotes. The text
  """ ""\" """
is legal in SPARQL but not in Turtle (or TriG). RDF 1.1 Turtle and
TriG can proceed to Recommendation on 25 Feb with this fixed if there
is consensus amongst the folks who submitted implementation reports
for those languages. Please let us know by 18 Feb if you intend to
implement the following grammar and parse the syntax tests below:

change
[24] STRING_LITERAL_LONG_SINGLE_QUOTE ::= "'''" (("'" | "''")? [^'\] | ECHAR | UCHAR)* "'''"
[25] STRING_LITERAL_LONG_QUOTE        ::= '"""' (('"' | '""')? [^"\] | ECHAR | UCHAR)* '"""'
to
[24] STRING_LITERAL_LONG_SINGLE_QUOTE ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
[25] STRING_LITERAL_LONG_QUOTE        ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'

and parse these (one-line) Turtle documents (with some arbitrary base URI):

<s> <p> ''' ''\' ''' .

<s> <p> """ ""\" """ .

<s> <p> """ ""\u0061 """ .

<s> <p> """""\"""" .

<s> <p> """""\u0061""" .
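
For implementors who want a quick sanity check, here is a minimal
sketch (not part of the test suite) that transliterates the old and
new productions into Python regular expressions and tries them on the
object literals of the five documents above. The names
long_string_regex, accepts and TEST_LITERALS are made up for this
illustration, and the ECHAR and UCHAR patterns are my transliteration
of the Turtle terminals. It checks only the string token, not a whole
Turtle document.

```python
import re

# Transliterations of the Turtle terminals (assumed, see note above).
ECHAR = r"""\\[tbnrf"'\\]"""
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"


def long_string_regex(q, fixed):
    """Build a regex for a long string delimited by three `q` characters.

    fixed=False mirrors the production with the dropped parentheses;
    fixed=True mirrors the restored (SPARQL-aligned) production.
    """
    delim = q * 3
    plain = rf"[^{q}\\]"            # any char except the quote and backslash
    quotes = rf"(?:{q}|{q}{q})?"    # optional one or two embedded quotes
    if fixed:
        # (quotes? (plain | ECHAR | UCHAR))*
        body = rf"(?:{quotes}(?:{plain}|{ECHAR}|{UCHAR}))*"
    else:
        # (quotes? plain | ECHAR | UCHAR)*
        body = rf"(?:{quotes}{plain}|{ECHAR}|{UCHAR})*"
    return re.compile(delim + body + delim)


def accepts(literal, fixed):
    """True if the whole literal matches the chosen production."""
    q = literal[0]  # the literal starts with its own quote character
    return long_string_regex(q, fixed).fullmatch(literal) is not None


# Object literals of the five one-line documents above.
TEST_LITERALS = [
    "''' ''\\' '''",
    '""" ""\\" """',
    '""" ""\\u0061 """',
    '"""""\\""""',
    '"""""\\u0061"""',
]

if __name__ == "__main__":
    for lit in TEST_LITERALS:
        print(lit, "old:", accepts(lit, False), "new:", accepts(lit, True))
```

Under this transliteration, each of the five literals is rejected by
the old production and accepted by the new one, while a literal such
as """x""y""" is accepted by both.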


Many thanks for your help and continued support of RDF 1.1.


* Eric Prud'hommeaux <eric@w3.org> [2014-02-12 10:05-0500]
> I blindly replied to Guus's agenda with an issue I noticed in the
> grammar. Andy and David requested that I move it to its own thread
> so I'm including the earlier conversation here. The tail of the old
> thread is <http://www.w3.org/mid/3025C528-82B8-47A4-9C83-34FAB4B76F45@3roundstones.com>. The summary is 12 lines down from here.
> 
> 
> * David Wood <david@3roundstones.com> [2014-02-12 08:16-0500]
> > On Feb 12, 2014, at 07:20, Guus Schreiber <guus.schreiber@vu.nl> wrote:
> > 
> > > 
> > > 
> > > On 12-02-14 13:01, Sandro Hawke wrote:
> > >> On February 12, 2014 6:10:20 AM EST, Eric Prud'hommeaux <eric@w3.org>
> > >> wrote:
> ...
> > >>> More interestingly, I noticed that we deviated from SPARQL's
> > >>> definition of strings 20 months ago when a re-gen of the HTML
> > >>> grammar stripped some ()s, going from:
> > >>> 
> > >>> [157s] STRING_LITERAL_LONG1 ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
> > >>> [158s] STRING_LITERAL_LONG2 ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
> > >>> to:
> > >>> [25]   STRING_LITERAL_LONG1 ::= "'''" (("'" | "''")? [^'\] | ECHAR | UCHAR)* "'''"
> > >>> [26]   STRING_LITERAL_LONG2 ::= '"""' (('"' | '""')? [^"\] | ECHAR | UCHAR)* '"""'
> > >>> <https://dvcs.w3.org/hg/rdf/raw-file/b40e79fe8bbc/rdf-turtle/turtle-bnf.html>
> > >>> 
> > >>> 
> > >>> 
> > >>> In the former language, <s> <p> """ "\u0061 """ . is legal and in the
> > >>> latter, an embedded quote must not be followed by ECHAR (e.g. \")
> > >>> or UCHAR (e.g. \u0061). Unfortunately, this change was pre-Trig so
> > >>> the issue exists there as well.
> > >>> 
> > >>> I looked for tests with long (triple-quoted) strings with one or
> > >>> two quotes followed by a backslash. We have none, but SPARQL does:
> > >>> data-r2/syntax-sparql1/syntax-lit-17.rq:3:SELECT * WHERE { :x :p '''Long''\\Literal with '\\ single quotes ''' }
> > >>> data-r2/syntax-sparql1/syntax-lit-20.rq:3:SELECT * WHERE { :x :p """Long""\\Literal with "\\ single quotes""" }
> > >>> 
> > >>> The closest we have is LITERAL_LONG2_with_1_squote.ttl:
> > >>> <http://a.example/s> <http://a.example/p> """x""y""" .
> > >>> but the nested ""s can be parsed by taking the longer of the
> > >>> alternatives in ('"' | '""').
> > >>> 
> > >>> What to do:
> > >>> 
> > >>> I propose the bold step of restoring the SPARQL grammar, noting
> > >>> that it doesn't change any of our test results.
> > >>> 
> > >> 
> > >> Argh.   Either that or we put it in the errata now.
> > > 
> > > If it is really an error we should fix it now. Given our explicit goal, known to the user community, to align Turtle as much as possible with SPARQL, it makes sense to view this as an error.
> > 
> > 
> > In the words of Eric Miller, "Eat crow when it is young and tender."  I suggest we apologize, fix it and move on.  I view this as more procedural than world-shattering.
> > 
> > It would be a shame to have this in the errata before REC.
> > 
> > Regards,
> > Dave
> > --
> > http://about.me/david_wood
> > 
> > 
> > 
> > > 
> > > Gus
> > > 
> > >> 
> > >> So this raises the ongoing-test-suite question.  We should add this
> > >> (non-normatively, like the errata) to the test suite as well...  what
> > >> exactly will be the process for that, post-REC?
> > >> 
> > >> - Sandro
> > >> 
> > >>> 
> > >>>> Guus
> > >>>> 
> > >>>> [1] https://www.w3.org/2011/rdf-wg/wiki/Main_Page#REC_drafts
> > 
> 
> 
> 
> -- 
> -ericP
> 
> office: +1.617.599.3509
> mobile: +33.6.80.80.35.59
> 
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.
> 
> There are subtle nuances encoded in font variation and clever layout
> which can only be seen by printing this message on high-clay paper.

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Wednesday, 12 February 2014 18:31:59 UTC
