- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 15 Dec 2011 18:44:23 +0000
- To: public-rdf-comments@w3.org
On 15/12/11 18:20, Gregg Kellogg wrote:
> On Dec 15, 2011, at 9:51 AM, "David Robillard"<d@drobilla.net> wrote:
>
>> On Thu, 2011-12-15 at 10:08 -0500, Gregg Kellogg wrote:
>>> I believe that grammar rule [7] predicateObjectList [1] is not LL(1) and requires look ahead to know what branch to go into. For example:
>>
>> Turtle has never been LL(1).
>>
>> You need readahead for BooleanLiteral, since "true" or "false" could
>> also be the start of a PrefixedName.
Prefixed names vs. true/false is (usually) handled in tokenizing, not in the
grammar. Different technology: lex vs. yacc.
It might be better to have
<TRUE> := "true"
but, conventionally, string constants are assumed to be tokens, i.e. the
things in <...>.
By the time it gets to the "grammar" it's a token that's the <true>,
<false>, or <PNAME_NS> or <PNAME_LN>.
And it's why 123 is a number and not three numbers. The tokenizer is
greedy.
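A minimal sketch of that split, assuming a simplified ASCII-only name character set (the real Turtle PN_CHARS productions are broader and also restrict where '.' may appear). The function name classify_words and the INTEGER kind are hypothetical. The tokenizer reads the longest name-like run first and only then decides whether it is a keyword or a prefixed name, so "true" and "trueish:x" never compete inside the grammar:

```python
import re

# Simplified stand-in for Turtle's name characters (assumption: ASCII only).
WORD = re.compile(r"[A-Za-z0-9_\-]*(?::[A-Za-z0-9_\-]*)?")
KEYWORDS = {"true": "TRUE", "false": "FALSE", "a": "A"}


def classify_words(text):
    """Greedily match a whole name-like run, then classify the complete lexeme."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = WORD.match(text, pos)
        if not m or m.end() == pos:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        lexeme = m.group()
        pos = m.end()  # greedy: "123" is one token, not three
        if lexeme in KEYWORDS:
            kind = KEYWORDS[lexeme]
        elif lexeme.endswith(":"):
            kind = "PNAME_NS"
        elif ":" in lexeme:
            kind = "PNAME_LN"
        elif lexeme.isdigit():
            kind = "INTEGER"
        else:
            raise ValueError(f"bare word {lexeme!r} is not a token in this sketch")
        tokens.append((kind, lexeme))
    return tokens
```

For example, classify_words("true trueish:x ex: 123 a") yields TRUE, PNAME_LN, PNAME_NS, INTEGER and A tokens, one per whitespace-separated run.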
>
> Using white space to separate tokens where necessary has always been part of Turtle. Assuming this, Turtle (and SPARQL) is LL(1).
>
> My parser [1] is LL(1).
Ditto (JavaCC, no lookahead tricks), and I also have one that is a recursive descent parser.
There is an LL(1) grammar for the Turtle language; the spec gives a
grammar for the language but it's not the only possible grammar for the
language.
>
> Gregg
>
> 1: http://github.com/rdf-turtle
http://github.com/gkellogg/rdf-turtle
>
>> This is the worst case, 6 character readahead.
Yes, but usually there is a token/grammar split, and that is the tokenizer
building up its state before returning a classified token. Turtle is
quite simple as a set of tokens.
You may be able to tokenize on single characters, and build a grammar
for the language based on that, but madness may be the result. It
certainly isn't the grammar for the language in the spec.
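As a rough illustration of the grammar side, here is a recursive descent sketch that works on already-classified tokens and decides every branch by peeking at a single token kind. It assumes a hypothetical token stream of (kind, lexeme) pairs and covers only a simplified predicateObjectList (no blank nodes or collections); the class name Parser and the token kind names are illustrative, not taken from the spec:

```python
# Hypothetical token kinds; the parser only ever inspects the next token's kind.
PREDICATE_START = {"IRIREF", "PNAME_LN", "PNAME_NS"}
OBJECT_START = PREDICATE_START | {"TRUE", "FALSE", "INTEGER", "STRING"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the kind of the next token without consuming it (LL(1) lookahead)."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def take(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def verb(self):
        # verb ::= predicate | 'a'. Since 'a' arrives as its own token kind,
        # one token of lookahead picks the branch.
        kind = self.peek()
        if kind == "A":
            self.take()
            return "rdf:type"
        if kind in PREDICATE_START:
            return self.take()[1]
        raise SyntaxError(f"expected verb, got {kind}")

    def obj(self):
        kind = self.peek()
        if kind in OBJECT_START:
            return self.take()[1]
        raise SyntaxError(f"expected object, got {kind}")

    def predicate_object_list(self):
        """verb objectList (';' (verb objectList)?)*, simplified."""
        pairs = []
        while True:
            v = self.verb()
            pairs.append((v, self.obj()))
            while self.peek() == ",":
                self.take()
                pairs.append((v, self.obj()))
            if self.peek() != ";":
                return pairs
            self.take()
            # A trailing ';' is allowed: stop unless another verb follows.
            if self.peek() != "A" and self.peek() not in PREDICATE_START:
                return pairs
```

Feeding it the tokens for `a ex:Person ; ex:name "Ann", "Annie" ;` produces three (verb, object) pairs without ever looking more than one token ahead.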
Andy
>>
>> Similarly,
>>
>> [9] verb ::= predicate | 'a'
>>
>> Requires a 2 character readahead (to check if the 'a' is followed by
>> whitespace, since 'a' can start a predicate).
>>
>> In general, qualified names and keywords are ambiguous while parsing.
>> IMO either qualified names should have had quoting ("[foo:bar]",
>> perhaps), or the special keywords ("a", "true", "false") should have had
>> a unique prefix character, which would solve this problem and make the
>> grammar extensible, perhaps even 'dynamically' via a @keyword directive.
>> It's too late for that now, however.
>>
>> I also had to use it in my parser to correctly handle quote characters
>> in long string literals, since you can read up to 2 of them and have it
>> not terminate the string, i.e. every time you encounter a quote you must
>> read ahead 3 characters to determine if this is the end of the string
>> literal. I don't see how this could have been avoided, other than
>> simply making single quote strings be long literals, but this would have
>> meant quotes would always need escaping in a string literal.
>>
>> -dr
>>
>>
>
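For the long-literal case described above, a minimal sketch of the three-character check, assuming escapes are kept raw rather than decoded; the function name scan_long_string is hypothetical. Every quote triggers a lookahead, and only a run of three quotes closes the literal, which matches the Turtle rule that the body may contain one or two unescaped quotes as long as a non-quote character follows them:

```python
def scan_long_string(text, pos):
    """Scan a triple-double-quoted long literal starting at text[pos].

    Returns (raw_body, index_just_after_closing_delimiter).
    Escape sequences are skipped over but left undecoded in this sketch.
    """
    if not text.startswith('"""', pos):
        raise ValueError("not at a long string")
    i = pos + 3
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == '"' and text.startswith('"""', i):
            # Three-character lookahead: only three quotes in a row terminate.
            return text[pos + 3:i], i + 3
        i += 1
    raise ValueError("unterminated long string")
```

With the input `"""say ""hi"" now""" .` the embedded pairs of quotes are kept in the body and scanning stops just before ` .`.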
Received on Thursday, 15 December 2011 18:44:56 UTC