- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 15 Dec 2011 18:44:23 +0000
- To: public-rdf-comments@w3.org
On 15/12/11 18:20, Gregg Kellogg wrote: > On Dec 15, 2011, at 9:51 AM, "David Robillard"<d@drobilla.net> wrote: > >> On Thu, 2011-12-15 at 10:08 -0500, Gregg Kellogg wrote: >>> I believe that grammar rule [7] predicateObjectList [1] is not LL(1) and requires look ahead to know what branch to go into. For example: >> >> Turtle has never been LL(1). >> >> You need readahead for BooleanLiteral, since "true" or "false" could >> also be the start of a PrefixedName. Prefix names vs true/false is (usually) done in tokenizering, not the grammar. Different technology. lex / yacc. It might be better to have <TRUE> := "true" but, conventionally, strings constants are assumed to be tokens, the things in <...>. By the time it gets to the "grammar" it's a token that's the <true>, <false>, or <PNAME_NS> or <PNAME_LN>. And it's why 123 is a number and not three numbers. The tokenizer is greedy. > > Using white space to separate tokens where necessary has always been part of Turtle. Assuming this, Turtle (and SPARQL) is LL(1). > > My parser [1] is LL(1). Ditto (javacc, no lookahead tricks), and one also a recursive decent parser. There is a LL(1) grammar for the Turtle language; the spec gives a grammar for the language but it's not the only possible grammar for the language. > > Gregg > > 1: http://github.com/rdf-turtle http://github.com/gkellogg/rdf-turtle > >> This is the worst case, 6 character readahead. Yes, but usually it's a token/grammar split and that's the tokenizer building up it's state before returning a classified tokens. Turtle is quite simple as a set of tokens. You may be able to tokenize on single characters, and build a grammar for the language based on that, but madness may be the result. It certainly isn't the grammar for the language in the spec. Andy >> >> Similarly, >> >> [9] verb ::= predicate | 'a' >> >> Requires a 2 character readahead (to check if the 'a' is followed by >> whitespace since 'a' can start a predicate. >> >> In general, qualified names and keywords are ambiguous while parsing. >> IMO either qualified names should have had quoting ("[foo:bar]", >> perhaps), or the special keywords ("a", "true", "false") should have had >> a unique prefix character, which would solve this problem and make the >> grammar extensible, perhaps even 'dynamically' via a @keyword directive. >> It's too late for that now, however. >> >> I also had to use it in my parser to correctly handle quote characters >> in long string literals, since you can read up to 2 of them and have it >> not terminate the string, i.e. every time you encounter a quote you must >> read ahead 3 characters to determine if this is the end of the string >> literal. I don't see how this could have been avoided, other than >> simply making single quote strings be long literals, but this would have >> meant quotes would always need escaping in a string literal. >> >> -dr >> >> >
Received on Thursday, 15 December 2011 18:44:56 UTC