- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Tue, 18 Dec 2012 17:27:04 +0000
- To: RDF-WG <public-rdf-wg@w3.org>
tl:dr:
The LC publication grammar can be implemented conflict-free in LALR. No
change to the grammar is needed.
------------------------------
I have made an attempt to mock up a Turtle parser using bison and flex
in order to investigate:
[[ http://www.w3.org/2011/rdf-wg/meeting/2012-12-12#resolution_2
1 change from prior publication
2 that the WG wants "<><><>;;;." to be legal
3 that we will look for a grammar representation which produces no S/R
conflicts in LL(1) or LALR(1)
]]
The LC text is the rule:
[7] predicateObjectList
::= verb objectList (';' predicateObjectList?)*
and I split that into two pieces so that the (...)* is a separate rule
and can be left (or right) recursive. So I think the LC grammar is
sufficient. Efficient LALR grammars are known to be hard to read for
most people because of the counter-intuitive reduction from right [1].
---------------------
predicateObjectList
: predicate objectList predicateObjectList2
;
predicateObjectList2
:
| predicateObjectList2 SEMICOLON
| predicateObjectList2 SEMICOLON predicate objectList
;
---------------------
It's not necessary to split the rule because directly as per LC [7]
because this seems to work:
---------------------
predicateObjectList
: predicate objectList
| predicate objectList semi
| predicate objectList semi predicateObjectList
;
semi : SEMICOLON | semi SEMICOLON
---------------------
but is right-recursive which Eric does not like. Different writing of
the grammar into LALR have different effects for the same language.
The full mock up is available at
http://people.apache.org/~andy/c-ttl-3.zip
and it has no conflicts. There are three versions (ttl0.y & ttl1.y &
ttl2.y) - the extracts above are ttl1.y and ttl2.y. ttl1 is harder to
understand but possibly more efficient at scale, depending on the
details of the LALR generator.
It is not a complete Turtle parser - it skimps on the tokens but does
cover all the language features. It even generates triples (except in
collections because getting the rdf:first/rdf:rest right is distracting
from the main point), to test the language actually parsed.
It even has the other key LALR feature - only one error message "syntax
error" without details. Any real LALR parser would incorporate
additional rules to capture error conditions.
Andy
[1]
http://en.wikipedia.org/wiki/LALR_parser#Implementation_issues
[2]
See also: parser generators that it is claimed accept fairly EBNF
directly: I have not tried any:
http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
Received on Tuesday, 18 December 2012 17:27:33 UTC