W3C home > Mailing lists > Public > public-rdf-wg@w3.org > December 2012

Re: Turtle grammar -- LALR conflict free grammar

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 18 Dec 2012 21:55:25 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <20121219025524.GA18481@w3.org>
I haven't tried the productions below because I think the SPARQL grammar (also Andy's) provides a simpler solution:

-[7]     predicateObjectList   ::= verb objectList (';' predicateObjectList?)*
+[7]     predicateObjectList   ::= verb objectList (';' (verb objectList)? )*

http://www.w3.org/2005/01/yacker/uploads/turtleAwesome?lang=perl&markup=html#prod-turtleAwesome-predicateObjectList

[[
<s> <p> <o> .
<s> <p> <o> ; <p> <o> .
<s> <p> <o> ; <p> <o> ; .
<s> <p> <o> ; ; <p> <o> ;  ; .
]]
works per <http://w3.org/brief/MzA1>
This should also work in LL land. Greg, could you confirm?

* Andy Seaborne <andy.seaborne@epimorphics.com> [2012-12-18 17:27+0000]
> tl:dr:
> The LC publication grammar can be implemented conflict-free in LALR.
> No change to the grammar is needed.
> 
> ------------------------------
> 
> I have made an attempt to mock up a Turtle parser using bison and
> flex in order to investigate:
> 
> [[ http://www.w3.org/2011/rdf-wg/meeting/2012-12-12#resolution_2
>  1 change from prior publication
>  2 that the WG wants "<><><>;;;." to be legal
>  3 that we will look for a grammar representation which produces no
> S/R conflicts in LL(1) or LALR(1)
> ]]
> 
> The LC text is the rule:
> 
> [7] predicateObjectList 	
>      ::= verb objectList (';' predicateObjectList?)*
> 
> and I split that into two pieces so that the (...)* is a separate
> rule and can be left (or right) recursive.  So I think the LC
> grammar is sufficient.  Efficient LALR grammars are known to be hard
> to read for most people because of the counter-intuitive reduction
> from right [1].
> 
> ---------------------
> predicateObjectList
>     : predicate objectList predicateObjectList2
>     ;
> 
> predicateObjectList2
>     :
>     | predicateObjectList2 SEMICOLON
>     | predicateObjectList2 SEMICOLON predicate objectList
>     ;
> ---------------------
> 
> It's not necessary to split the rule because directly as per LC [7]
> because this seems to work:
> 
> ---------------------
> predicateObjectList
>     : predicate objectList
>     | predicate objectList  semi
>     | predicate objectList  semi  predicateObjectList
>     ;
> 
> semi : SEMICOLON | semi SEMICOLON
> ---------------------
> but is right-recursive which Eric does not like.  Different writing
> of the grammar into LALR have different effects for the same
> language.
> 
> 
> The full mock up is available at
> 
> http://people.apache.org/~andy/c-ttl-3.zip
> 
> and it has no conflicts.  There are three versions (ttl0.y & ttl1.y
> & ttl2.y) - the extracts above are ttl1.y and ttl2.y.  ttl1 is
> harder to understand but possibly more efficient at scale, depending
> on the details of the LALR generator.
> 
> It is not a complete Turtle parser - it skimps on the tokens but
> does cover all the language features.  It even generates triples
> (except in collections because getting the rdf:first/rdf:rest right
> is distracting from the main point), to test the language actually
> parsed.
> 
> It even has the other key LALR feature - only one error message
> "syntax error" without details.  Any real LALR parser would
> incorporate additional rules to capture error conditions.
> 
> 	Andy
> 
> [1]
> http://en.wikipedia.org/wiki/LALR_parser#Implementation_issues
> 
> [2]
> See also: parser generators that it is claimed accept fairly EBNF
> directly: I have not tried any:
> 
> http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
> 

-- 
-ericP
Received on Wednesday, 19 December 2012 02:55:56 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:25:53 GMT