Re: Turtle grammar -- LALR conflict free grammar

On Dec 18, 2012, at 6:55 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> I haven't tried the productions below because I think the SPARQL grammar (also Andy's) provides a simpler solution:
> 
> -[7]     predicateObjectList   ::= verb objectList (';' predicateObjectList?)*
> +[7]     predicateObjectList   ::= verb objectList (';' (verb objectList)? )*
> 
> http://www.w3.org/2005/01/yacker/uploads/turtleAwesome?lang=perl&markup=html#prod-turtleAwesome-predicateObjectList
> 
> [[
> <s> <p> <o> .
> <s> <p> <o> ; <p> <o> .
> <s> <p> <o> ; <p> <o> ; .
> <s> <p> <o> ; ; <p> <o> ;  ; .
> ]]
> works per <http://w3.org/brief/MzA1>
> This should also work in LL land. Greg, could you confirm?

That's the original version of this rule, and the one I use in my purely LL(1) parser. It is indeed simpler, the only reason to do the other is if you wanted a rule to disallow multiple semi-colons, which we decided not to do.

Gregg

> * Andy Seaborne <andy.seaborne@epimorphics.com> [2012-12-18 17:27+0000]
>> tl:dr:
>> The LC publication grammar can be implemented conflict-free in LALR.
>> No change to the grammar is needed.
>> 
>> ------------------------------
>> 
>> I have made an attempt to mock up a Turtle parser using bison and
>> flex in order to investigate:
>> 
>> [[ http://www.w3.org/2011/rdf-wg/meeting/2012-12-12#resolution_2
>> 1 change from prior publication
>> 2 that the WG wants "<><><>;;;." to be legal
>> 3 that we will look for a grammar representation which produces no
>> S/R conflicts in LL(1) or LALR(1)
>> ]]
>> 
>> The LC text is the rule:
>> 
>> [7] predicateObjectList 	
>>     ::= verb objectList (';' predicateObjectList?)*
>> 
>> and I split that into two pieces so that the (...)* is a separate
>> rule and can be left (or right) recursive.  So I think the LC
>> grammar is sufficient.  Efficient LALR grammars are known to be hard
>> to read for most people because of the counter-intuitive reduction
>> from right [1].
>> 
>> ---------------------
>> predicateObjectList
>>    : predicate objectList predicateObjectList2
>>    ;
>> 
>> predicateObjectList2
>>    :
>>    | predicateObjectList2 SEMICOLON
>>    | predicateObjectList2 SEMICOLON predicate objectList
>>    ;
>> ---------------------
>> 
>> It's not necessary to split the rule because directly as per LC [7]
>> because this seems to work:
>> 
>> ---------------------
>> predicateObjectList
>>    : predicate objectList
>>    | predicate objectList  semi
>>    | predicate objectList  semi  predicateObjectList
>>    ;
>> 
>> semi : SEMICOLON | semi SEMICOLON
>> ---------------------
>> but is right-recursive which Eric does not like.  Different writing
>> of the grammar into LALR have different effects for the same
>> language.
>> 
>> 
>> The full mock up is available at
>> 
>> http://people.apache.org/~andy/c-ttl-3.zip
>> 
>> and it has no conflicts.  There are three versions (ttl0.y & ttl1.y
>> & ttl2.y) - the extracts above are ttl1.y and ttl2.y.  ttl1 is
>> harder to understand but possibly more efficient at scale, depending
>> on the details of the LALR generator.
>> 
>> It is not a complete Turtle parser - it skimps on the tokens but
>> does cover all the language features.  It even generates triples
>> (except in collections because getting the rdf:first/rdf:rest right
>> is distracting from the main point), to test the language actually
>> parsed.
>> 
>> It even has the other key LALR feature - only one error message
>> "syntax error" without details.  Any real LALR parser would
>> incorporate additional rules to capture error conditions.
>> 
>> 	Andy
>> 
>> [1]
>> http://en.wikipedia.org/wiki/LALR_parser#Implementation_issues
>> 
>> [2]
>> See also: parser generators that it is claimed accept fairly EBNF
>> directly: I have not tried any:
>> 
>> http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
>> 
> 
> -- 
> -ericP
> 

Received on Wednesday, 19 December 2012 07:08:17 UTC