- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Tue, 18 Dec 2012 17:27:04 +0000
- To: RDF-WG <public-rdf-wg@w3.org>
tl:dr: The LC publication grammar can be implemented conflict-free in LALR. No change to the grammar is needed. ------------------------------ I have made an attempt to mock up a Turtle parser using bison and flex in order to investigate: [[ http://www.w3.org/2011/rdf-wg/meeting/2012-12-12#resolution_2 1 change from prior publication 2 that the WG wants "<><><>;;;." to be legal 3 that we will look for a grammar representation which produces no S/R conflicts in LL(1) or LALR(1) ]] The LC text is the rule: [7] predicateObjectList ::= verb objectList (';' predicateObjectList?)* and I split that into two pieces so that the (...)* is a separate rule and can be left (or right) recursive. So I think the LC grammar is sufficient. Efficient LALR grammars are known to be hard to read for most people because of the counter-intuitive reduction from right [1]. --------------------- predicateObjectList : predicate objectList predicateObjectList2 ; predicateObjectList2 : | predicateObjectList2 SEMICOLON | predicateObjectList2 SEMICOLON predicate objectList ; --------------------- It's not necessary to split the rule because directly as per LC [7] because this seems to work: --------------------- predicateObjectList : predicate objectList | predicate objectList semi | predicate objectList semi predicateObjectList ; semi : SEMICOLON | semi SEMICOLON --------------------- but is right-recursive which Eric does not like. Different writing of the grammar into LALR have different effects for the same language. The full mock up is available at http://people.apache.org/~andy/c-ttl-3.zip and it has no conflicts. There are three versions (ttl0.y & ttl1.y & ttl2.y) - the extracts above are ttl1.y and ttl2.y. ttl1 is harder to understand but possibly more efficient at scale, depending on the details of the LALR generator. It is not a complete Turtle parser - it skimps on the tokens but does cover all the language features. It even generates triples (except in collections because getting the rdf:first/rdf:rest right is distracting from the main point), to test the language actually parsed. It even has the other key LALR feature - only one error message "syntax error" without details. Any real LALR parser would incorporate additional rules to capture error conditions. Andy [1] http://en.wikipedia.org/wiki/LALR_parser#Implementation_issues [2] See also: parser generators that it is claimed accept fairly EBNF directly: I have not tried any: http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
Received on Tuesday, 18 December 2012 17:27:33 UTC