Re: rq23 grammar update from Dan Connolly on 2005-08-29 (public-rdf-dawg@w3.org from July to September 2005)

From: Dan Connolly <connolly@w3.org>
Date: Mon, 29 Aug 2005 08:35:44 -0500
To: andy.seaborne@hp.com
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <1125322544.16011.247.camel@dirk>
On Wed, 2005-08-24 at 16:23 +0100, Seaborne, Andy wrote:
> In response to comments on the grammar and escapes, I have updated the rq23 
> v1.470 grammar section.

Ah... great...

> The grammar is LL(1),

I don't see that stated in the document.
 http://www.w3.org/2001/sw/DataAccess/rq23/#grammar

I think it's very valuable to let people know. It's the grammar
excepting the (suggested) TERMINALS that's LL(1), yes?

>  addressing Richard Newman's and Tim Berners-Lee's comments 
> on the grammar.
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0055.html
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0067.html
> 
> Also addressed is Walid Maalej's comment on variable names and leading digits. 
> It does not make variable names full NCNames because that would include "-" and "."
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0038.html
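To make the variable-name point concrete, here is a minimal sketch of a token check that allows a leading digit but rejects "-" and "." (both legal inside an NCName). It is an assumption-laden approximation, not the rq23 rule: it is ASCII-only, and the names VARNAME and is_var are hypothetical.

```python
import re

# Hypothetical, ASCII-only approximation of the variable-name rule discussed
# above: letters, digits (including a leading digit) and "_", but no "-" or ".".
# The real grammar also admits non-ASCII name characters; this sketch does not.
VARNAME = re.compile(r"[A-Za-z0-9_]+")

def is_var(token: str) -> bool:
    """Return True if token has the form ?name under the sketch rule."""
    return len(token) > 1 and token[0] == "?" and VARNAME.fullmatch(token[1:]) is not None

for t in ["?x", "?1st", "?a-b", "?a.b"]:
    print(t, is_var(t))
```

The "-" and "." exclusions keep expressions such as ?a-?b and triple-terminating dots from being swallowed into a variable name.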


> Changes:
> 
> 1/ Triples rule changes: this was the last thing that stopped it being LL(1)

When discussing this with yosi, I discovered that

	SELECT ?x WHERE {.}

was in the language of the LC grammar. Does this new grammar allow that?
I sorta prefer that it does not, but I think we owe the world a test
case to show that we made the change on purpose.

Volunteers?

> 2/ There is an explicit rule for IRIRefOrFunction() in expressions to make it 
> clearer about this case (Dave's comment)
> 
> 3/ IRI references are: '<' ([^<>]-[#00-#20])* '>'
>     that is, it excludes some characters but is not a full IRI grammar.

Very well.
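A minimal sketch of what rule (3) accepts, assuming a straightforward translation of the character-class difference into a Python regex, alongside the weaker '<' ([^<>])* '>' form that yacker falls back to. The names IRI_REF_STRICT, IRI_REF_WEAK and matches are hypothetical. Note that the token rule alone does not reject <a###b>; that is left to the prose requirement that IRIs be valid per RFC 3987/3986.

```python
import re

# '<' ([^<>] - [#00-#20])* '>': anything except '<', '>' and U+0000..U+0020
# (control characters and space).
IRI_REF_STRICT = re.compile(r"<[^<>\x00-\x20]*>")

# The weaker form without the character-class difference: '<' ([^<>])* '>'
IRI_REF_WEAK = re.compile(r"<[^<>]*>")

def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the whole of text is a single IRI reference token."""
    return pattern.fullmatch(text) is not None

for s in ["<http://example.org/a>", "<a b>", "<a###b>"]:
    print(s, matches(IRI_REF_STRICT, s), matches(IRI_REF_WEAK, s))
```

The only observable difference between the two patterns is for spaces and control characters: <a b> passes the weak form but not the strict one.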

> There is also text in the grammar section to say that IRI must be valid so no 
> <a###b>.

Hmm... "Any IRI references in a SPARQL query string must be valid according
to RFC 3987 [RFC3987] and RFC 3986 [RFC3986]." So <a##b> makes it
not a SPARQL query string, rather than saying that it _is_ a SPARQL
query string with an error and hence this spec doesn't define anything
else about it, like its abstract form or what the corresponding results
are.

I wonder what the protocol implications of that are. I think it means
servers have to check the spelling of URIs and must not return a 200 OK
in this case. Does anybody currently do that?


> 4/ Removed rule RDFTerm (again!) which was never used.
> 
> 5/ Escapes: the grammar itself has rules for handling \t etc. in strings, but the 
> Unicode codepoint escapes (\u and \U) are not included in the grammar because it 
> would require enumerating everything twice, once for the plain character, once 
> for the \u form.
> 
> \u and \U are allowed in variable names, qnames, strings and IRIs.

Hmm... that seems to say that we're using a notation that's very similar
to the XML 1.1 grammar notation, but with a few tweaks. The sections
on comments, keywords, whitespace and escapes are grammar
notation tweaks.
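One way to read the escape note is as a pass that expands \u and \U before the grammar proper sees the text, so the grammar only needs to list each plain character once. The sketch below assumes exactly that reading; expand_unicode_escapes and ESCAPE are hypothetical names. It does not handle an escaped backslash followed by "u", and chr() raises ValueError for code points above U+10FFFF.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits), matched literally.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def expand_unicode_escapes(query: str) -> str:
    """Replace each \\u/\\U escape with the character it names (sketch only)."""
    def repl(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2)  # whichever alternative matched
        return chr(int(hex_digits, 16))
    return ESCAPE.sub(repl, query)

print(expand_unicode_escapes(r"SELECT ?\u0078 WHERE { }"))
```

With this reading, <a\u0020b> expands to <a b>, which is exactly the "don't put \u0020 in an IRI" case that a restricted code-point space or extra prose would have to rule out.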


> (A practical alternative would be to allow \u forms, not restrict the codepoint 
> space, and have text to cover things like "don't put \u0020 in an IRI").
> 
> This grammar has no local lookahead and has been checked for LA requirements 
> with JavaCC. It has also been fed to yacker (it is grammar "afs1"
> http://www.w3.org/2005/01/yacker/uploads/afs1/bnf?lang=perl ),
> except that, for (3) above, the character class difference isn't supported, so it is
> the slightly weaker '<' ([^<>])* '>'. Yacker produces bison, yacc and Perl-based 
> parsers with no errors.

Please let's share that info with the world. Let's publish those bison,
yacc, and perl-based parsers as non-normative linked files. And
turtle, if it's not much trouble.



-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Monday, 29 August 2005 13:35:47 UTC