Re: predicateObjectList rule requires lookahead from David Robillard on 2011-12-15 (public-rdf-comments@w3.org from December 2011)

From: David Robillard <d@drobilla.net>
Date: Thu, 15 Dec 2011 14:14:28 -0500
To: Gregg Kellogg <gregg@kellogg-assoc.com>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>, Gavin Carothers <gavin@carothers.name>
Message-ID: <1323976468.1697.51.camel@verne.drobilla.net>

On Thu, 2011-12-15 at 13:20 -0500, Gregg Kellogg wrote:
> On Dec 15, 2011, at 9:51 AM, "David Robillard" <d@drobilla.net> wrote:
> 
> > On Thu, 2011-12-15 at 10:08 -0500, Gregg Kellogg wrote:
> >> I believe that grammar rule [7] predicateObjectList [1] is not LL(1) and requires look ahead to know what branch to go into. For example:
> > 
> > Turtle has never been LL(1).
> > 
> > You need readahead for BooleanLiteral, since "true" or "false" could
> > also be the start of a PrefixedName.
> 
> Using white space to separate tokens where necessary has always been part of Turtle. Assuming this, Turtle (and SPARQL) is LL(1).

I suppose you mean the parser must read a token at a time, and only
after reading an entire token decides which rule applies.  Fair enough:
my implementation needing readahead in this case does not imply that
Turtle is not theoretically LL(1).  My mistake.
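
To illustrate the token-at-a-time reading, here is a minimal sketch of
a longest-match tokenizer.  The two patterns are deliberately
simplified assumptions (the real Turtle terminals allow many more
characters), and next_token is a hypothetical helper, not part of any
parser mentioned in this thread.

```python
import re

# Simplified, assumed patterns; not the full Turtle terminal definitions.
PNAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]*")
BOOLEAN = re.compile(r"true|false")


def next_token(text, pos=0):
    """Return (kind, lexeme) for the longest token starting at pos."""
    candidates = []
    for kind, pattern in (("PNAME", PNAME), ("BOOLEAN", BOOLEAN)):
        match = pattern.match(text, pos)
        if match:
            candidates.append((kind, match.group()))
    if not candidates:
        raise ValueError(f"no token at position {pos}")
    # Longest match wins, so "true:foo" is one PNAME, not BOOLEAN + ":foo".
    return max(candidates, key=lambda c: len(c[1]))
```

With the whole token in hand, the grammar only needs that one token to
pick a rule; the character-level readahead happens inside the
tokenizer.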

(Forgive my ignorance of the common assumptions and conventions when
using parser generators.  I am assuming that my feedback, from having
hand-written a parser that maps very explicitly and directly to the
grammar, may be valuable.)

My issues admittedly stem from having originally implemented an earlier
version of the spec that, among other things, did not separate terminals
from non-terminal rules, and did not define what a "token" is at all.  I
guess only terminal rules define tokens and do *not* implicitly have
inserted whitespace (whereas non-terminal rules are combinations of
tokens which are inherently separated by whitespace).  I do not see this
defined in any document cited by the spec.

Should it be precisely defined what constitutes whitespace between
tokens?  There are many more Unicode whitespace characters than those
in the ws rule in the spec.
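
A small sketch of the mismatch follows.  It assumes the ws rule covers
exactly space, tab, carriage return, and line feed; that set is an
assumption about the draft and should be checked against the spec.
The helper names are hypothetical.

```python
# Assumed contents of the spec's ws rule: #x20, #x9, #xD, #xA.
TURTLE_WS = frozenset(" \t\r\n")


def is_turtle_ws(ch):
    """True if ch is whitespace under the assumed ws rule."""
    return ch in TURTLE_WS


def unicode_only_whitespace(text):
    """List characters Unicode treats as whitespace but the ws rule does not."""
    return [ch for ch in text if ch.isspace() and ch not in TURTLE_WS]
```

For example, a no-break space (U+00A0) or an em space (U+2003) between
two tokens counts as whitespace to str.isspace() but would not
separate tokens under the assumed rule.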

> My parser [1] is LL(1).

How do you deal with quotes in long string literals without readahead?
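
To make the question concrete, here is a rough sketch of a hand-written
scanner for a long string literal.  On seeing one quote it has to
inspect the next two characters before it knows whether the literal is
closing.  scan_long_string is a hypothetical helper, and escapes are
simply kept verbatim rather than decoded.

```python
def scan_long_string(text, start):
    # Scan a triple-quoted literal beginning at start.
    # Returns (raw_content, index_after_closing_quotes).
    if not text.startswith('"""', start):
        raise ValueError("expected opening triple quote")
    i = start + 3
    out = []
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Two characters of readahead: only three quotes in a row close.
            if text.startswith('"""', i):
                return "".join(out), i + 3
            out.append(ch)
            i += 1
        elif ch == "\\" and i + 1 < len(text):
            # Keep the escape sequence verbatim (no decoding in this sketch).
            out.append(text[i:i + 2])
            i += 2
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated long string literal")
```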

-dr

Received on Sunday, 18 December 2011 12:22:35 UTC