Re: rq23 grammar update from Seaborne, Andy on 2005-08-30 (public-rdf-dawg@w3.org from July to September 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 30 Aug 2005 10:34:08 +0100
To: Dan Connolly <connolly@w3.org>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <43142810.3090802@hp.com>
Dan Connolly wrote:
> On Wed, 2005-08-24 at 16:23 +0100, Seaborne, Andy wrote:
> 
>>In response to comments on the grammar and escapes, I have updated the rq23 
>>v1.470 grammar section.
> 
> 
> Ah... great...
> 
> 
>>The grammar is LL(1),
> 
> 
> I don't see that stated in the document.
>  http://www.w3.org/2001/sw/DataAccess/rq23/#grammar
> 
> I think it's very valuable to let people know. It's the grammar
> excepting the (suggested) TERMINALS that's LL(1), yes?

Yes (modulo one terminal that isn't UPPERCASE, but that's on my fix list).

Changed:
QuotedIRIRef => IRI_REF

I've added some text:

"""
The SPARQL grammar is LL(1), when the rules with uppercased names are used as 
terminals.
"""


> 
> 
>> addressing Richard Newman's and Tim Berners-Lee's comments 
>>on the grammar.
>>
>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0055.html
>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0067.html
>>
>>Also addressed is Walid Maalej's comment on variable names and leading digits. 
>>It does not make variable names full NCNAMEs because that would include "-" and "."
>>
>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0038.html
> 
> 
> 
>>Changes:
>>
>>1/ Triples rule changes : this is the last thing that stopped it being LL(1)
> 
> 
> When discussing this with yosi, I discovered that
> 
>  SELECT ?x WHERE {.}
> 
> was in the language of the LC grammar. Does this new grammar allow that?
> I sorta prefer that it does not, but I think we owe the world a test
> case to show that we made the change on purpose.
> 
> Volunteers?
> 
> 
>>2/ There is an explicit rule for IRIRefOrFunction() in expressions to make it 
>>clearer about this case (Dave's comment)
>>
>>3/ IRI references are: '<' ([^<>]-[#00-#20])* '>'
>>    that is, excludes some characters but is not a full IRI grammar.
> 
> 
> Very well.
> 
> 
>>There is also text in the grammar section to say that IRI must be valid so no 
>><a###b>.
> 
> 
> Hmm... "Any IRI references in a SPARQL query string must valid according
> to RFC 3987 [RFC3987] and RFC 3986 [RFC3986]." So <a##b> makes it
> not a SPARQL query string, rather than saying that it _is_ a sparql
> query string with an error and hence this spec doesn't define anything
> else about it, like its abstract form or what the corresponding results
> are.
> 
> I wonder what the protocol implications of that are. I think it means
> servers have to check the spelling of URIs and must not return a 200 OK
> in this case. Does anybody currently do that?

I resolve all IRIs and do checking over and above syntax parsing.
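
To illustrate the split between the lexical rule in (3) and a separate validity check, here is a minimal Python sketch. The regular expression is my reading of '<' ([^<>]-[#00-#20])* '>'; the function names and the single "#" check are hypothetical and stand in for a real RFC 3986/3987 check, which covers far more than this.

```python
import re

# Lexical rule from point 3: '<' ([^<>]-[#00-#20])* '>'
# i.e. any characters except '<', '>' and the code points U+0000..U+0020.
IRI_REF = re.compile(r"<[^<>\x00-\x20]*>")


def is_iri_ref_token(text: str) -> bool:
    """True if the whole text matches the IRI_REF terminal."""
    return IRI_REF.fullmatch(text) is not None


def has_at_most_one_hash(iri: str) -> bool:
    """Hypothetical post-parse check (one of many a real validator needs):
    RFC 3986 does not allow '#' inside a fragment, so a second '#' is invalid."""
    return iri.count("#") <= 1


token = "<a###b>"
print(is_iri_ref_token(token))            # True: the terminal accepts it
print(has_at_most_one_hash(token[1:-1]))  # False: rejected by the extra check
```

So <a###b> gets through the tokenizer and is only rejected by the checking done over and above syntax parsing.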

> 
> 
> 
>>4/ Removed rule RDFTerm (again!) which was never used.
>>
>>5/ Escapes: the grammar itself has rules for handling \t etc in strings but the 
>>Unicode codepoint escapes (\u and \U) are not included in the grammar because it 
>>would require enumerating everything twice, once for the plain character, once 
>>for the \u form.
>>
>>\u and \U are allowed in variable names, qnames, strings and IRIs.
> 
> 
> Hmm... that seems to say that we're using a notation that's very similar
> to the XML 1.1 grammar notation, but with a few tweaks. The sections
> on comments, keywords, whitespace and escapes are grammar
> notation tweaks.

Yes - the XML 1.1 notation section is referenced.

The tweaks for string escapes ("\t" etc.) are common in programming languages and 
are in the grammar itself.

The \u and \U escapes aren't included because implementations can process them 
either before or after parsing, depending on whether your parser/tokenizer can 
cope with more than 8-bit chars.  I gather some can't yet.
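
As a rough sketch of the "before parsing" alternative, in Python: a pre-pass that replaces \uXXXX and \UXXXXXXXX with the character they name, so the parser proper never sees them. The function name is hypothetical, and the sketch assumes every such sequence is an escape; it does not deal with an escaped backslash in front of the "u".

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits.
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(query: str) -> str:
    """Replace \\u and \\U escapes in the query string before tokenizing.

    Assumption: every such sequence is an escape; a backslash that is itself
    escaped is not handled here.
    """
    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _CODEPOINT_ESCAPE.sub(replace, query)


print(decode_codepoint_escapes(r"SELECT ?\u0078 WHERE { ?\u0078 ?p ?o }"))
# SELECT ?x WHERE { ?x ?p ?o }
```

Doing the same substitution after parsing means applying it to each token instead, which is the route for a tokenizer that only handles 8-bit input.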

> 
> 
> 
>>(A practical alternative would be to allow \u forms, not restrict the codepoint 
>>space, and have text to cover things like "don't put \u0020 in an IRI").
>>
>>This grammar has no local lookahead and has been checked for LA requirements 
>>with JavaCC. It has also been fed to yacker (it's grammar "afs1",
>>http://www.w3.org/2005/01/yacker/uploads/afs1/bnf?lang=perl ),
>>except that the character class difference from (3) above isn't supported, so 
>>it is a slightly weaker '<' ([^<>])* '>' .  Yacker produces bison, yacc and 
>>Perl-based parsers with no errors.
> 
> 
> Please let's share that info with the world. Let's publish those bison,
> yacc, and perl-based parsers as non-normative linked files.

+1 to publishing all the material we have, plus the JavaCC file.

I've started a section:
"Grammars and Parsers for SPARQL"
on the
   http://esw.w3.org/topic/SparqlImplementations
page because there are other grammars in the works as well (e.g. Ivan's for rdflib).

 > And turtle, if it's not much trouble.
And an additional +1 to Turtle.

 Andy
Received on Tuesday, 30 August 2005 09:35:15 UTC