W3C home > Mailing lists > Public > public-rdf-wg@w3.org > December 2012

Re: Turtle grammar -- LALR conflict free grammar

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 19 Dec 2012 08:28:16 -0500
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, RDF-WG <public-rdf-wg@w3.org>
Message-ID: <20121219132815.GA6582@w3.org>
* Gregg Kellogg <gregg@greggkellogg.net> [2012-12-19 02:07-0500]
> On Dec 18, 2012, at 6:55 PM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > I haven't tried the productions below because I think the SPARQL grammar (also Andy's) provides a simpler solution:
> > 
> > -[7]     predicateObjectList   ::= verb objectList (';' predicateObjectList?)*
> > +[7]     predicateObjectList   ::= verb objectList (';' (verb objectList)? )*
> > 
> > http://www.w3.org/2005/01/yacker/uploads/turtleAwesome?lang=perl&markup=html#prod-turtleAwesome-predicateObjectList
> > 
> > [[
> > <s> <p> <o> .
> > <s> <p> <o> ; <p> <o> .
> > <s> <p> <o> ; <p> <o> ; .
> > <s> <p> <o> ; ; <p> <o> ;  ; .
> > ]]
> > works per <http://w3.org/brief/MzA1>
> > This should also work in LL land. Greg, could you confirm?
> 
> That's the original version of this rule, and the one I use in my purely LL(1) parser. It is indeed simpler, the only reason to do the other is if you wanted a rule to disallow multiple semi-colons, which we decided not to do.

Yeah, I poked around at the lineage last night. Turtle had it's own flavor from way back:

http://www.dajobe.org/2004/01/turtle/2006-12-04/#predicateObjectList
[5] predicateObjectList ::= verb ws+ objectList ( ws* ';' ws* verb ws+ objectList )* (ws* ';')?

http://www.w3.org/TeamSubmission/turtle/#predicateObjectList
[7] predicateObjectList ::= verb objectList ( ';' verb objectList )* ( ';')?

http://www.w3.org/TR/2011/WD-turtle-20110809/#prod-turtle2-predicateObjectList
[7] predicateObjectList ::= verb objectList ( ";" verb objectList )* (";")?

http://www.w3.org/TR/2012/WD-turtle-20120710/#grammar-production-predicateObjectList
[7] predicateObjectList ::= verb objectList (';' predicateObjectList?)*

http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/index.html#grammar-production-predicateObjectList
[7] predicateObjectList ::= verb objectList (';' predicateObjectList?)*

SPARQL picked up the repeated ';'s around March 2007 (4 years before the Turtle Submission)
http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#rPropertyListNotEmpty
[33] PropertyListNotEmpty ::= Verb ObjectList ( ';' ( Verb ObjectList )? )*

Anyways, the ;;; change permits us to align the two specs and simplify not just our tasks.


> Gregg
> 
> > * Andy Seaborne <andy.seaborne@epimorphics.com> [2012-12-18 17:27+0000]
> >> tl:dr:
> >> The LC publication grammar can be implemented conflict-free in LALR.
> >> No change to the grammar is needed.
> >> 
> >> ------------------------------
> >> 
> >> I have made an attempt to mock up a Turtle parser using bison and
> >> flex in order to investigate:
> >> 
> >> [[ http://www.w3.org/2011/rdf-wg/meeting/2012-12-12#resolution_2
> >> 1 change from prior publication
> >> 2 that the WG wants "<><><>;;;." to be legal
> >> 3 that we will look for a grammar representation which produces no
> >> S/R conflicts in LL(1) or LALR(1)
> >> ]]
> >> 
> >> The LC text is the rule:
> >> 
> >> [7] predicateObjectList 	
> >>     ::= verb objectList (';' predicateObjectList?)*
> >> 
> >> and I split that into two pieces so that the (...)* is a separate
> >> rule and can be left (or right) recursive.  So I think the LC
> >> grammar is sufficient.  Efficient LALR grammars are known to be hard
> >> to read for most people because of the counter-intuitive reduction
> >> from right [1].
> >> 
> >> ---------------------
> >> predicateObjectList
> >>    : predicate objectList predicateObjectList2
> >>    ;
> >> 
> >> predicateObjectList2
> >>    :
> >>    | predicateObjectList2 SEMICOLON
> >>    | predicateObjectList2 SEMICOLON predicate objectList
> >>    ;
> >> ---------------------
> >> 
> >> It's not necessary to split the rule because directly as per LC [7]
> >> because this seems to work:
> >> 
> >> ---------------------
> >> predicateObjectList
> >>    : predicate objectList
> >>    | predicate objectList  semi
> >>    | predicate objectList  semi  predicateObjectList
> >>    ;
> >> 
> >> semi : SEMICOLON | semi SEMICOLON
> >> ---------------------
> >> but is right-recursive which Eric does not like.  Different writing
> >> of the grammar into LALR have different effects for the same
> >> language.
> >> 
> >> 
> >> The full mock up is available at
> >> 
> >> http://people.apache.org/~andy/c-ttl-3.zip
> >> 
> >> and it has no conflicts.  There are three versions (ttl0.y & ttl1.y
> >> & ttl2.y) - the extracts above are ttl1.y and ttl2.y.  ttl1 is
> >> harder to understand but possibly more efficient at scale, depending
> >> on the details of the LALR generator.
> >> 
> >> It is not a complete Turtle parser - it skimps on the tokens but
> >> does cover all the language features.  It even generates triples
> >> (except in collections because getting the rdf:first/rdf:rest right
> >> is distracting from the main point), to test the language actually
> >> parsed.
> >> 
> >> It even has the other key LALR feature - only one error message
> >> "syntax error" without details.  Any real LALR parser would
> >> incorporate additional rules to capture error conditions.
> >> 
> >> 	Andy
> >> 
> >> [1]
> >> http://en.wikipedia.org/wiki/LALR_parser#Implementation_issues
> >> 
> >> [2]
> >> See also: parser generators that it is claimed accept fairly EBNF
> >> directly: I have not tried any:
> >> 
> >> http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
> >> 
> > 
> > -- 
> > -ericP
> > 
> 

-- 
-ericP
Received on Wednesday, 19 December 2012 13:28:47 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:25:53 GMT