Re: rq23 grammar update

On Wed, 2005-08-24 at 16:23 +0100, Seaborne, Andy wrote:
> In response to comments on the grammar and escapes, I have updated the rq23 
> v1.470 grammar section.

Ah... great...

> The grammar is LL(1),

I don't see that stated in the document.

I think it's very valuable to let people know. It's the grammar
excepting the (suggested) TERMINALS that's LL(1), yes?

>  addressing Richard Newman's and Tim Berners-Lee's comments 
> on the grammar.
> Also addressed is Walid Maalej's comment on variable names and leading digits. 
> It does not make variable names full NCNAMEs because that would include "-" and "."

> Changes:
> 1/ Triples rule changes : this is the last thing that stopped it being LL(1)

When discussing this with yosi, I discovered that


was in the language of the LC grammar. Does this new grammar allow that?
I sorta prefer that it does not, but I think we owe the world a test
case to show that we made the change on purpose.


> 2/ There is an explicit rule for IRIRefOrFunction() in expressions to make it 
> clearer about this case (Dave's comment)
> 3/ IRI references are: '<' ([^<>]-[#00-#20])* '>'
>     that is, excludes some characters but is not a full IRI grammar.

Very well.
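
For anyone who wants to poke at rule 3, here is a rough Python sketch of that lexical rule. The names IRI_REF and is_iri_ref_token are made up for illustration, and I'm assuming #00-#20 means code points U+0000 through U+0020. It only checks the token shape; it says nothing about whether the contents are a valid IRI.

```python
import re

# '<' ([^<>]-[#00-#20])* '>'
# i.e. any characters except '<', '>' and U+0000..U+0020, wrapped in angle brackets.
# Assumption: #00-#20 denotes the code point range U+0000..U+0020.
IRI_REF = re.compile(r"<[^<>\x00-\x20]*>")


def is_iri_ref_token(text: str) -> bool:
    """Return True if text matches the lexical IRI reference rule.

    This is a token-level check only, not RFC 3987 validation.
    """
    return IRI_REF.fullmatch(text) is not None
```

A token containing a space or a nested angle bracket is rejected here, while plenty of strings that are not valid IRIs still pass, which is why the separate validity text matters.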

> There is also text in the grammar section to say that IRI must be valid so no 
> <a###b>.

Hmm... "Any IRI references in a SPARQL query string must be valid according
to RFC 3987 [RFC3987] and RFC 3986 [RFC3986]." So <a##b> makes it
not a SPARQL query string, rather than saying that it _is_ a sparql
query string with an error and hence this spec doesn't define anything
else about it, like its abstract form or what the corresponding results are.

I wonder what the protocol implications of that are. I think it means
servers have to check the spelling of URIs and must not return a 200 OK
in this case. Does anybody currently do that?

> 4/ Removed rule RDFTerm (again!) which was never used.
> 5/ Escapes: the grammar itself has rules for handling \t etc in strings but the 
> Unicode codepoint escapes (\u and \U) are not included in the grammar because it 
> would require enumerating everything twice, once for the plain character, once 
> for the \u form.
> \u and \U are allowed in variable names, qnames, strings and IRIs.

Hmm... that seems to say that we're using a notation that's very similar
to the XML 1.1 grammar notation, but with a few tweaks. The sections
on comments, keywords, whitespace and escapes are grammar
notation tweaks.
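
One way to read that tweak is as a pass over the query text before the grammar proper gets applied. A minimal Python sketch, assuming \u takes exactly four hex digits and \U exactly eight (the function name decode_codepoint_escapes is hypothetical, and this is not how any particular implementation does it):

```python
import re

# Assumption: \uXXXX has exactly 4 hex digits and \UXXXXXXXX exactly 8.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    """Replace \\u and \\U code point escapes with the characters they name.

    Simplification: an escaped backslash in front of 'u' is not treated
    specially, and a \\U value above 0x10FFFF raises ValueError from chr().
    """
    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, text)
```

Run before tokenizing, this makes an escaped space inside an IRI look just like a literal one, so the character exclusions in the IRI rule apply to it as well.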

> (A practical alternative would be to allow \u forms, not restrict the codepoint 
> space, and have text to cover things like "don't put \u0020 in an IRI").
> This grammar has no local lookahead and has been checked for LA requirements 
> with JavaCC. It has been fed to yacker (it's grammar "afs1"), 
> except that for (3) above the character class difference isn't supported, so it is a 
> slightly weaker '<' ([^<>])* '>' .  Yacker produces bison, yacc and Perl-based 
> parsers with no errors.

Please let's share that info with the world. Let's publish those bison,
yacc, and Perl-based parsers as non-normative linked files. And
turtle, if it's not much trouble.

Dan Connolly, W3C
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Monday, 29 August 2005 13:35:47 UTC