Re: predicateObjectList rule requires lookahead

On Dec 15, 2011, at 9:51 AM, "David Robillard" <d@drobilla.net> wrote:

> On Thu, 2011-12-15 at 10:08 -0500, Gregg Kellogg wrote:
>> I believe that grammar rule [7] predicateObjectList [1] is not LL(1) and requires lookahead to know which branch to take. For example:
> 
> Turtle has never been LL(1).
> 
> You need readahead for BooleanLiteral, since "true" or "false" could
> also be the start of a PrefixedName.

Using whitespace to separate tokens where necessary has always been part of Turtle. Assuming this, Turtle (and SPARQL) is LL(1) at the token level.

My parser [1] is LL(1).
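Here is a minimal Python sketch of the token-level view. It is hypothetical and not taken from the parser in [1]. The lexer is simplified: `TOKEN_RE`, `tokenize`, and `parse_verb` are illustrative names, and the ASCII-only name characters are an assumption, not Turtle's real terminals. The idea is that a maximal-munch lexer absorbs the character-level readahead, so the parser can choose the `verb` branch by peeking at exactly one token.

```python
import re

# Simplified lexer (an assumption, not Turtle's real terminal definitions):
# a token is either a maximal run of name characters or one punctuation mark.
TOKEN_RE = re.compile(r"\s*([A-Za-z0-9_\-:]+|[.;,\[\]()])")
KEYWORDS = {"a", "true", "false"}
PUNCT = set(".;,[]()")

def classify(lexeme):
    """Map a lexeme to a token kind."""
    if lexeme in KEYWORDS:
        return lexeme           # keywords are their own token kind
    if lexeme in PUNCT:
        return lexeme           # punctuation is its own token kind
    if ":" in lexeme:
        return "PNAME"          # prefixed name, e.g. ex:p or a:b
    raise SyntaxError(f"unrecognized token {lexeme!r}")

def tokenize(text):
    """Maximal munch: 'a:b' is one PNAME token, never 'a' followed by ':b'."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((classify(m.group(1)), m.group(1)))
        pos = m.end()
    return tokens

def parse_verb(tokens, i):
    """verb ::= predicate | 'a', chosen by peeking at exactly one token."""
    kind, lexeme = tokens[i]
    if kind == "a":
        return "rdf:type", i + 1    # 'a' abbreviates rdf:type
    if kind == "PNAME":
        return lexeme, i + 1
    raise SyntaxError(f"expected a verb, got {lexeme!r}")
```

Both positions in this thread hold at once under this design. The characters are still read ahead, but the lexer does that work before the parser sees any token.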

Gregg

1: http://github.com/rdf-turtle

> This is the worst case: a 6-character readahead.
> 
> Similarly,
> 
> [9] verb ::= predicate | 'a'
> 
> Requires a 2-character readahead (to check whether the 'a' is followed
> by whitespace, since 'a' can also start a predicate).
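Here is a minimal Python sketch of the character-level readahead that both the BooleanLiteral and `verb` cases need. `NAME_CHARS` and `keyword_at` are hypothetical names. The character set is a simplification that omits the non-ASCII characters and interior '.' that real prefixed names allow. A keyword matches only if the character right after it cannot continue a prefixed name. So `false` needs 5 + 1 = 6 characters of readahead, and `a` needs 1 + 1 = 2.

```python
import string

# Simplified, assumed set of characters that may continue a prefixed name.
NAME_CHARS = set(string.ascii_letters + string.digits + "_-:")

def keyword_at(text, pos, keyword):
    """Return True if `keyword` starts at `pos` and is not the start of a
    longer prefixed name. Reads len(keyword) + 1 characters at most."""
    end = pos + len(keyword)
    if text[pos:end] != keyword:
        return False
    # Peek one character past the keyword; end of input also terminates it.
    return end >= len(text) or text[end] not in NAME_CHARS
```

For example, `keyword_at(":s :p false .", 6, "false")` accepts the boolean. The same call rejects `false:x` and `falsey:x`, because there the keyword is only the start of a prefixed name.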
> 
> In general, qualified names and keywords are ambiguous while parsing.
> IMO either qualified names should have had quoting ("[foo:bar]",
> perhaps), or the special keywords ("a", "true", "false") should have had
> a unique prefix character, which would solve this problem and make the
> grammar extensible, perhaps even 'dynamically' via a @keyword directive.
> It's too late for that now, however.
> 
> I also had to use readahead in my parser to correctly handle quote
> characters in long string literals, since up to 2 consecutive quotes
> can appear without terminating the string; i.e., every time you
> encounter a quote you must read ahead 3 characters to determine
> whether this is the end of the string literal. I don't see how this
> could have been avoided, other than simply making single-quoted strings
> serve as long literals, but then quotes would always need escaping in
> a string literal.
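Here is a minimal Python sketch of that scan. It assumes the opening `"""` has already been consumed, and it handles only the double-quote form. `find_long_string_end` is a hypothetical helper. At each quote it peeks at three characters. Backslash escapes are skipped, so an escaped quote can never close the literal.

```python
def find_long_string_end(text, start):
    """Given `start` just past an opening triple quote, return the index of
    the closing triple quote. One or two quotes in a row do not end it."""
    i = start
    while i < len(text):
        if text[i] == "\\":
            i += 2              # skip an escape such as \" or \n
            continue
        if text.startswith('"""', i):   # read ahead 3 characters
            return i
        i += 1
    raise SyntaxError("unterminated long string literal")
```

Taking the first unescaped `"""` as the end agrees with the rule that one or two quotes inside the literal must be followed by a non-quote character.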
> 
> -dr

Received on Thursday, 15 December 2011 18:21:02 UTC