Re: predicateObjectList rule requires lookahead from David Robillard on 2011-12-15 (public-rdf-comments@w3.org from December 2011)

From: David Robillard <d@drobilla.net>
Date: Thu, 15 Dec 2011 14:30:11 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-comments@w3.org
Message-ID: <1323977411.1697.66.camel@verne.drobilla.net>

On Thu, 2011-12-15 at 18:44 +0000, Andy Seaborne wrote:
[...]
> You may be able to tokenize on single characters, and build a grammar 
> for the language based on that, but madness may be the result.  It 
> certainly isn't the grammar for the language in the spec.

Actually, except for the few cases mentioned, such a parser is simple,
fast, and quite elegant.  You can just fly through input characters,
always knowing precisely which rule you are working on.  Put another
way, you can parse the language using only peekchar() and readchar()
functions and a 1-character (rather than 1-token) buffer.  Such
languages are luxurious to parse manually; unfortunately, Turtle isn't
*quite* such a language.
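
To make the peekchar()/readchar() idea concrete, here is a minimal
sketch of the style.  It is not code from my implementation: the Reader
struct, the toy "collection of IRIs" grammar, and every function body
are assumptions made up for illustration.  The point is only that each
decision is made by looking at one buffered character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reader: an input string plus a 1-character buffer. */
typedef struct {
    const char* src;     /* unread input */
    int         buf;     /* the single buffered character */
    bool        has_buf; /* true if buf holds a character */
} Reader;

/* Look at the next character without consuming it. */
static int peekchar(Reader* r) {
    if (!r->has_buf) {
        r->buf     = *r->src ? (unsigned char)*r->src++ : EOF;
        r->has_buf = true;
    }
    return r->buf;
}

/* Consume and return the next character. */
static int readchar(Reader* r) {
    const int c = peekchar(r);
    r->has_buf  = false;
    return c;
}

static bool is_ws(int c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* ws* */
static void skip_ws(Reader* r) {
    while (is_ws(peekchar(r))) {
        readchar(r);
    }
}

/* Toy rule: iriref ::= '<' [^>]* '>'  (no escapes handled) */
static bool read_iriref(Reader* r, char* out, size_t size) {
    assert(size > 0);
    if (readchar(r) != '<') {
        return false;
    }
    size_t n = 0;
    for (int c; (c = readchar(r)) != '>';) {
        if (c == EOF || n + 1 >= size) {
            return false; /* unterminated or too long */
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return true;
}

/* Toy rule: collection ::= '(' ws* (iriref ws*)* ')'
   Returns the number of items, or -1 on a syntax error. */
static int read_collection(Reader* r) {
    char iri[64];
    int  count = 0;
    if (readchar(r) != '(') {
        return -1;
    }
    for (;;) {
        skip_ws(r);
        switch (peekchar(r)) { /* one character picks the rule */
        case ')':
            readchar(r);
            return count;
        case '<':
            if (!read_iriref(r, iri, sizeof iri)) {
                return -1;
            }
            ++count;
            break;
        default:
            return -1;
        }
    }
}
```

With a Reader initialised as { "( <a> <b><c> )", 0, false },
read_collection() returns 3, and the parser never needs to look more
than one character ahead.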

I set out to write a high-performance, dependency-free C
implementation[1], and for better or worse that's how it turned out.
The deviations from the grammar as written are things like:

// Spec: [6] triples ::= subject predicateObjectList
// Impl: [6] triples ::= subject ws+ predicateObjectList
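
As a rough illustration of what the "Impl" form of rule [6] means in a
character-level parser, here is a hedged sketch.  It is not taken from
serd: the cursor convention, the toy term rule, and the reduction of
predicateObjectList to "verb ws+ object" are all assumptions chosen to
keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cursor: *p points at the one lookahead character of a
   NUL-terminated buffer. */
static bool is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* ws+ : at least one whitespace character must be present. */
static bool read_ws_plus(const char** p) {
    if (!is_ws(**p)) {
        return false;
    }
    while (is_ws(**p)) {
        ++*p;
    }
    return true;
}

/* Toy term: '<' [^>]* '>'  (stands in for subject, verb and object) */
static bool read_term(const char** p) {
    if (**p != '<') {
        return false;
    }
    ++*p;
    while (**p != '\0' && **p != '>') {
        ++*p;
    }
    if (**p != '>') {
        return false; /* unterminated */
    }
    ++*p;
    return true;
}

/* Impl: triples ::= subject ws+ predicateObjectList
   with predicateObjectList reduced (as an assumption) to
   verb ws+ object. */
static bool read_triples(const char** p) {
    assert(p != NULL && *p != NULL);
    return read_term(p)     /* subject */
        && read_ws_plus(p)  /* the explicit ws+ the spec rule lacks */
        && read_term(p)     /* verb */
        && read_ws_plus(p)  /* assumed here as well, for the toy rule */
        && read_term(p);    /* object */
}
```

Under these assumptions, "<s> <p> <o>" is accepted while "<s><p> <o>"
is rejected at the first ws+, which is the kind of difference from the
spec's token-based rule that the ws+ introduces.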

As mentioned in my other reply, though, this stems from having
implemented an earlier revision of the spec which was very ambiguous
with respect to whitespace (it was a trial-and-error game to figure out
where the magic ws+ and ws* had to go).  I will try to rework it to take
advantage of the new grammar and see what results.

-dr

[1] http://drobilla.net/software/serd/

Received on Sunday, 18 December 2011 12:22:34 UTC