
Increased lookahead requirements in the Turtle draft

From: David Robillard <d@drobilla.net>
Date: Sun, 17 Feb 2013 17:43:53 -0500
Message-ID: <1361141033.16176.30.camel@verne.drobilla.net>
To: public-rdf-comments@w3.org

I recently got a bug report from a user who has encountered dots in
prefixed names in "Turtle" found in the wild, which my parser does not
yet support.  So I looked at the draft with a view to implementing this.

Unfortunately, it looks like a can of worms for a simple
recursive-descent parser.  The previous specification could be
implemented with 1 character of lookahead, but I don't think this one
can.
Since a PrefixedName can contain a dot, if the next character is a dot
while reading a PrefixedName, it is ambiguous whether the dot is part of
the PrefixedName or the end of a statement.  To determine this, you need
to check whether the character after the dot is a valid PrefixedName
character, and until this is known, neither the dot nor the character
after it can be 'eaten'.
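A minimal sketch of that decision, in Python for illustration only.  It
assumes a hypothetical, simplified ASCII-only character class
(`is_name_char`) in place of the grammar's PN_CHARS rules and ignores
escapes; `scan_local_name` is likewise a made-up name.  Because the
grammar allows a dot inside a local name but not at its start or end, the
sketch looks past the run of dots before deciding; in the usual
single-dot case that is exactly the two characters described above.

```python
from typing import Tuple


def is_name_char(c: str) -> bool:
    # Hypothetical, simplified stand-in for Turtle's PN_CHARS plus ':'.
    # Only ASCII letters, digits, '_', '-' and ':' are accepted; Unicode
    # ranges and %-/backslash escapes are deliberately left out.
    return len(c) == 1 and c.isascii() and (c.isalnum() or c in "_-:")


def scan_local_name(text: str, pos: int) -> Tuple[str, int]:
    """Scan a local name starting at pos; return (name, index just past it).

    A '.' may occur inside a local name but not at its start or end, so
    when the scanner meets a '.' it must look past it before deciding
    whether to consume it.
    """
    start = pos
    while pos < len(text):
        c = text[pos]
        if is_name_char(c):
            pos += 1
        elif c == "." and pos > start:
            # Look past the run of dots (normally just one) to the next char.
            end = pos
            while end < len(text) and text[end] == ".":
                end += 1
            if end < len(text) and is_name_char(text[end]):
                pos = end  # interior dots: they belong to the name
            else:
                break  # trailing dot: leave it for the statement terminator
        else:
            break
    return text[start:pos], pos
```

For example, `scan_local_name("ex:obj.", 3)` returns `("obj", 6)` and
leaves the final '.' unconsumed, while `scan_local_name("a.b.c .", 0)`
returns `("a.b.c", 5)`.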

The significance is that *1* character of "lookahead" isn't really
lookahead; you just need a peek().  Anything greater requires some kind
of real lookahead implementation, or at least some crafty case-specific
kludges to get around it.
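One way to get that real lookahead over a streaming input is a small
buffer of characters that have been read but not yet consumed.  The
sketch below is hypothetical: `LookaheadReader`, `peek(k)`, `get()`,
`is_simple_name_char` and `read_name` are illustrative names, not any
parser's API, and only ASCII letters and digits are treated as name
characters.

```python
import io
from collections import deque


class LookaheadReader:
    """Hypothetical character reader over a stream with bounded lookahead.

    peek(0) is the classic one-character peek; peek(1) and beyond need a
    buffer of characters already read from the stream but not yet consumed.
    """

    def __init__(self, stream: io.TextIOBase, max_lookahead: int = 2):
        self._stream = stream
        self._max = max_lookahead
        self._buf = deque()  # read from the stream, not yet consumed

    def _fill(self, n: int) -> None:
        # Read until the buffer holds n characters or the stream is exhausted.
        while len(self._buf) < n:
            c = self._stream.read(1)
            if not c:
                break
            self._buf.append(c)

    def peek(self, k: int = 0) -> str:
        """Return the character k positions ahead, or '' at end of input."""
        if k >= self._max:
            raise ValueError("lookahead limit exceeded")
        self._fill(k + 1)
        return self._buf[k] if k < len(self._buf) else ""

    def get(self) -> str:
        """Consume and return the next character, or '' at end of input."""
        self._fill(1)
        return self._buf.popleft() if self._buf else ""


def is_simple_name_char(c: str) -> bool:
    # Simplification: only ASCII letters and digits count as name characters.
    return len(c) == 1 and c.isascii() and c.isalnum()


def read_name(r: LookaheadReader) -> str:
    # Keeps a '.' only when a name character follows it.  With a two-character
    # limit, a run of dots such as "a..b" cannot be resolved this way.
    out = []
    while True:
        c = r.peek(0)
        if is_simple_name_char(c):
            out.append(r.get())
        elif c == "." and out and is_simple_name_char(r.peek(1)):
            out.append(r.get())
        else:
            return "".join(out)
```

Reading `"a.b ."` through this reader yields the name `a.b` and leaves the
space and the final '.' in the stream.  With `max_lookahead=1` the class
reduces to the familiar single-character peek; the deque only matters once
`peek(1)` is used.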

This is not necessarily a spec problem, and two-character lookahead is
not an onerous requirement in general, but compared to one character it
is.  I just thought it was worth mentioning that this is a considerable
new implementation requirement.  I will have to pay a price in
throughput for it as well.

It's clear, though, that dots in prefixed names are desirable.  Ideally,
tokens, including the delimiters (i.e. '.' and ';'), would be whitespace
delimited, so reading a PrefixedName would simply stop when whitespace
is encountered and this problem would not exist.  Perhaps this is not
realistic given existing practice, but it would certainly be nice.
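Under that hypothetical whitespace-delimited rule (not what the Turtle
draft specifies), a scanner never needs to examine anything beyond the
current character.  A minimal Python sketch, with illustrative function
names `scan_token_ws` and `tokenize_ws`:

```python
from typing import List, Tuple


def scan_token_ws(text: str, pos: int) -> Tuple[str, int]:
    # Hypothetical rule, not current Turtle: every token, including '.' and
    # ';', is whitespace-delimited, so a token runs to the next whitespace.
    start = pos
    while pos < len(text) and not text[pos].isspace():
        pos += 1  # only the current character is ever examined
    return text[start:pos], pos


def tokenize_ws(text: str) -> List[str]:
    # Skip whitespace between tokens and collect each token in turn.
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
        else:
            tok, pos = scan_token_ws(text, pos)
            tokens.append(tok)
    return tokens
```

Here `tokenize_ws("ex:s ex:p ex:o.b .")` gives `["ex:s", "ex:p", "ex:o.b", "."]`.
The flip side is that `ex:o.b.` would scan as a single token, which is why
the rule is hard to reconcile with existing data that writes the final dot
without a preceding space.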



Received on Sunday, 17 February 2013 22:44:21 UTC
