Re: predicateObjectList rule requires lookahead from David Robillard on 2011-12-15 (public-rdf-comments@w3.org from December 2011)

From: David Robillard <d@drobilla.net>
Date: Thu, 15 Dec 2011 12:48:52 -0500
To: Gregg Kellogg <gregg@kellogg-assoc.com>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>, Gavin Carothers <gavin@carothers.name>
Message-ID: <1323971332.1697.15.camel@verne.drobilla.net>

On Thu, 2011-12-15 at 10:08 -0500, Gregg Kellogg wrote:
> I believe that grammar rule [7] predicateObjectList [1] is not LL(1) and requires look ahead to know what branch to go into. For example:

Turtle has never been LL(1).

You need readahead for BooleanLiteral, since "true" or "false" could
also be the start of a PrefixedName.

This is the worst case: 6 characters of readahead ("false" plus the
character that follows it).
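
As a rough illustration of that check (a sketch only, not code from any
actual parser), the following Python assumes a simplified notion of
"name character"; the helper names is_name_char and match_boolean are
hypothetical, and the real Turtle character classes are larger.

```python
KEYWORDS = ("true", "false")


def is_name_char(ch):
    # Assumption: simplified stand-in for Turtle's name character classes.
    return ch.isalnum() or ch in "_-:"


def match_boolean(text, pos):
    """Return the boolean keyword at text[pos:], or None if it is not one."""
    for kw in KEYWORDS:
        if text.startswith(kw, pos):
            end = pos + len(kw)
            # One character past the keyword decides: "false" needs 5 + 1.
            nxt = text[end:end + 1]
            if nxt == "" or not is_name_char(nxt):
                return kw
    return None
```

With this sketch, match_boolean("false .", 0) yields "false", while
match_boolean("false:x", 0) yields None because the ':' shows that a
PrefixedName is being read.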

Similarly,

[9] verb ::= predicate | 'a'

requires a 2 character readahead (to check if the 'a' is followed by
whitespace), since 'a' can also start a predicate.
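
A minimal sketch of this 2 character check, in Python and purely
illustrative; the function name is_a_keyword is hypothetical, and the
test is the whitespace check described above rather than the full
grammar.

```python
def is_a_keyword(text, pos):
    """True if text[pos] is a bare 'a' followed by whitespace."""
    if text[pos:pos + 1] != "a":
        return False
    # Second character of readahead: whitespace means the keyword 'a',
    # anything else means 'a' is the start of a longer predicate name.
    nxt = text[pos + 1:pos + 2]
    return nxt.isspace()
```

Under these assumptions, the 'a' in ":s a :T ." is the keyword, while
the 'a' in "a:b" or "abc:d" is the start of a predicate.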

In general, qualified names and keywords are ambiguous while parsing.
IMO either qualified names should have had quoting ("[foo:bar]",
perhaps), or the special keywords ("a", "true", "false") should have had
a unique prefix character, which would solve this problem and make the
grammar extensible, perhaps even 'dynamically' via a @keyword directive.
It's too late for that now, however.

I also had to use readahead in my parser to correctly handle quote
characters in long string literals, since you can read up to 2 of them
without terminating the string. That is, every time you encounter a
quote you must read ahead 3 characters to determine whether this is the
end of the string literal.  I don't see how this could have been
avoided, other than simply making single-quoted strings be long
literals, but this would have meant quotes would always need escaping in
a string literal.
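
The long string case can be sketched as follows. This is an
illustrative Python fragment, not the code of any actual parser; the
function name read_long_string_body is hypothetical, escape pairs are
passed through unprocessed, and only double-quoted long literals are
handled.

```python
def read_long_string_body(text, pos):
    """Scan from just after an opening triple quote.

    Returns (body, end_pos), where end_pos is just past the closing quotes.
    """
    out = []
    while pos < len(text):
        # 3 characters of readahead: only three quotes in a row terminate.
        if text.startswith('"""', pos):
            return "".join(out), pos + 3
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text):
            # Assumption: keep the escape pair as-is instead of decoding it.
            out.append(text[pos:pos + 2])
            pos += 2
        else:
            # A lone quote (or two) is ordinary content.
            out.append(ch)
            pos += 1
    raise ValueError("unterminated long string literal")
```

One or two consecutive quotes inside the literal are appended to the
body like any other character; only the three character match ends the
scan.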

-dr

Received on Sunday, 18 December 2011 12:22:34 UTC