W3C home > Mailing lists > Public > public-rdf-wg@w3.org > December 2012

Turtle grammar -- LALR conflict free grammar

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 18 Dec 2012 17:27:04 +0000
Message-ID: <50D0A768.9020409@epimorphics.com>
To: RDF-WG <public-rdf-wg@w3.org>
The LC publication grammar can be implemented conflict-free in LALR.  No 
change to the grammar is needed.


I have made an attempt to mock up a Turtle parser using bison and flex 
in order to investigate:

[[ http://www.w3.org/2011/rdf-wg/meeting/2012-12-12#resolution_2
  1 change from prior publication
  2 that the WG wants "<><><>;;;." to be legal
  3 that we will look for a grammar representation which produces no S/R 
conflicts in LL(1) or LALR(1)

The LC text is the rule:

[7] predicateObjectList 	
      ::= verb objectList (';' predicateObjectList?)*

and I split that into two pieces so that the (...)* is a separate rule 
and can be left (or right) recursive.  So I think the LC grammar is 
sufficient.  Efficient LALR grammars are known to be hard to read for 
most people because of the counter-intuitive reduction from right [1].

     : predicate objectList predicateObjectList2

     | predicateObjectList2 SEMICOLON
     | predicateObjectList2 SEMICOLON predicate objectList

It's not necessary to split the rule because directly as per LC [7] 
because this seems to work:

     : predicate objectList
     | predicate objectList  semi
     | predicate objectList  semi  predicateObjectList

but is right-recursive which Eric does not like.  Different writing of 
the grammar into LALR have different effects for the same language.

The full mock up is available at


and it has no conflicts.  There are three versions (ttl0.y & ttl1.y & 
ttl2.y) - the extracts above are ttl1.y and ttl2.y.  ttl1 is harder to 
understand but possibly more efficient at scale, depending on the 
details of the LALR generator.

It is not a complete Turtle parser - it skimps on the tokens but does 
cover all the language features.  It even generates triples (except in 
collections because getting the rdf:first/rdf:rest right is distracting 
from the main point), to test the language actually parsed.

It even has the other key LALR feature - only one error message "syntax 
error" without details.  Any real LALR parser would incorporate 
additional rules to capture error conditions.



See also: parser generators that it is claimed accept fairly EBNF 
directly: I have not tried any:

Received on Tuesday, 18 December 2012 17:27:33 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 22:02:09 UTC