rq23 grammar update from Seaborne, Andy on 2005-08-24 (public-rdf-dawg@w3.org from July to September 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 24 Aug 2005 16:23:28 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <430C90F0.8030504@hp.com>

In response to comments on the grammar and escapes, I have updated the rq23 
v1.470 grammar section.

The grammar is LL(1), addressing Richard Newman's and Tim Berners-Lee's comments 
on the grammar.

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0055.html
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0067.html

Also addressed is Walid Maalej's comment on variable names and leading digits. 
It does not make variable names full NCNAMEs because that would include "-" and "."

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0038.html

Changes:

1/ Triples rule changes : this was the last thing that stopped the grammar being LL(1)

2/ There is an explicit rule for IRIRefOrFunction() in expressions to make 
this case clearer (Dave's comment)

3/ IRI references are: '<' ([^<>]-[#00-#20])* '>'
    that is, it excludes some characters but is not a full IRI grammar.

There is also text in the grammar section to say that IRIs must be valid, so no 
<a###b>.

4/ Removed rule RDFTerm (again!) which was never used.

5/ Escapes: the grammar itself has rules for handling \t etc in strings but the 
Unicode codepoint escapes (\u and \U) are not included in the grammar because it 
would require enumerating everything twice, once for the plain character, once 
for the \u form.

\u and \U are allowed in variable names, qnames, strings and IRIs.

(A practical alternative would be to allow \u forms, not restrict the codepoint 
space, and have text to cover things like "don't put \u0020 in an IRI").
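
As an illustration only (this is not part of rq23), the codepoint escapes in 
point 5 could be handled as a pass that rewrites \u and \U forms into the 
characters they name, so the grammar rules never see them. The sketch below is 
Python; the names ESCAPE and decode_codepoint_escapes are hypothetical, and it 
assumes exactly 4 hex digits after \u and 8 after \U. It applies to the whole 
input rather than only to variable names, qnames, strings and IRIs, and it does 
not treat an escaped backslash specially, so it is a simplification.

```python
import re

# Assumption: \u is followed by exactly 4 hex digits, \U by exactly 8.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX forms with the character they name.

    Other escapes such as \\t are left untouched for the grammar to handle.
    chr() raises ValueError for values above 0x10FFFF.
    """
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    # A variable name written with an escape for "x".
    print(decode_codepoint_escapes(r"?\u0078"))
    # The "\u0020 in an IRI" case: decoding puts a space inside the IRI
    # reference, which is the kind of thing explanatory text would rule out.
    print(repr(decode_codepoint_escapes(r"<a\u0020b>")))
```

Decoding first and checking the result afterwards corresponds to the 
"practical alternative" above: the escape form is always accepted, and any 
restriction applies to the decoded character.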

This grammar has no local lookahead and has been checked for LA requirements 
with JavaCC. It has also been fed to yacker (it is grammar "afs1"):
http://www.w3.org/2005/01/yacker/uploads/afs1/bnf?lang=perl
The yacker version differs for (3) above: the character class difference isn't 
supported there, so it uses the slightly weaker '<' ([^<>])* '>' .  Yacker 
produces bison, yacc and Perl-based parsers with no errors.
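
To make the difference between the pattern in (3) and the weaker yacker form 
concrete, here is a small Python sketch. The two regular expressions are a 
transliteration of the patterns quoted in this message into Python's re 
syntax; the names STRICT_IRI_REF, WEAK_IRI_REF and accepts are hypothetical 
and do not come from rq23 or yacker.

```python
import re

# (3): '<' ([^<>]-[#00-#20])* '>'
# Anything except '<', '>' and the code points U+0000..U+0020.
STRICT_IRI_REF = re.compile(r"<[^<>\x00-\x20]*>")

# Weaker form used for yacker: '<' ([^<>])* '>'
WEAK_IRI_REF = re.compile(r"<[^<>]*>")


def accepts(pattern: re.Pattern, token: str) -> bool:
    """Return True if the whole token matches the given pattern."""
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["<http://example.org/a>", "<a b>", "<a###b>", "<a<b>"]:
        print(token, accepts(STRICT_IRI_REF, token), accepts(WEAK_IRI_REF, token))
```

Both patterns accept <a###b>, which is why the grammar section also needs the 
text saying that IRIs must be valid. Only the weaker pattern accepts an IRI 
reference containing a space.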


This update is so anyone interested can review it.  Some tidying up would be a 
good idea.

 Andy

Received on Wednesday, 24 August 2005 15:24:54 UTC