rq23 grammar update

In response to comments on the grammar and escapes, I have updated the rq23 
v1.470 grammar section.

The grammar is LL(1), addressing Richard Newman's and Tim Berners-Lee's comments 
on the grammar.


Also addressed is Walid Maalej's comment on variable names and leading digits. 
It does not make variable names full NCNAMEs, because that would allow "-" and 
"." in them.



1/ Triples rule changes: this was the last thing that stopped it being LL(1).

2/ There is an explicit rule for IRIRefOrFunction() in expressions to make this 
case clearer (Dave's comment).

3/ IRI references are: '<' ([^<>]-[#00-#20])* '>'
    that is, the rule excludes some characters but is not a full IRI grammar.

There is also text in the grammar section to say that IRI must be valid so no 

4/ Removed rule RDFTerm (again!) which was never used.

5/ Escapes: the grammar itself has rules for handling \t etc. in strings, but the 
Unicode codepoint escapes (\u and \U) are not included in the grammar because that 
would require enumerating everything twice, once for the plain character and once 
for the \u form.

\u and \U are allowed in variable names, qnames, strings and IRIs.

(A practical alternative would be to allow \u forms, not restrict the codepoint 
space, and have text to cover things like "don't put \u0020 in an IRI").
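
To illustrate treating the codepoint escapes outside the grammar, here is a 
minimal Python sketch of a pre-pass that expands \u and \U before tokenizing. 
It is only one possible reading, not the rq23 text: the function name is 
hypothetical, and it assumes \u takes exactly 4 hex digits and \U exactly 8.

```python
import re

# Assumption: \u is followed by exactly 4 hex digits, \U by exactly 8.
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text):
    """Replace each \\uXXXX or \\UXXXXXXXX with the character it names.

    Hypothetical helper. It does not treat an escaped backslash specially,
    and chr() raises ValueError for values above U+10FFFF.
    """
    def _expand(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _CODEPOINT_ESCAPE.sub(_expand, text)


if __name__ == "__main__":
    # A variable name, an IRI and a string, each using an escape.
    for raw in (r"?\u0061bc", r"<http://example/\u00E9>", r'"\U00000041"'):
        print(raw, "->", decode_codepoint_escapes(raw))
```

With a pre-pass like this the grammar only ever sees plain characters, so 
nothing has to be enumerated twice. The cost is that an escape such as \u0020 
becomes an ordinary space before the IRI rule is applied.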

This grammar has no local lookahead and has been checked for LA requirements 
with JavaCC. It has also been fed to yacker (it's grammar "afs1"), except that 
the character class difference in (3) above isn't supported there, so that 
version uses a slightly weaker '<' ([^<>])* '>' .  Yacker produces bison, yacc 
and Perl-based parsers with no errors.
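
The difference between the two IRI reference forms can be seen with a small 
Python sketch. The regular expressions are transcriptions of the two rules 
quoted in this message into Python syntax; the names IRI_REF, WEAK_IRI_REF and 
is_iri_ref are made up for the example.

```python
import re

# '<' ([^<>]-[#00-#20])* '>' : no '<', no '>', and nothing in #00-#20.
IRI_REF = re.compile(r"<[^<>\x00-\x20]*>")

# The weaker form without the character class difference: '<' ([^<>])* '>'
WEAK_IRI_REF = re.compile(r"<[^<>]*>")


def is_iri_ref(token, pattern=IRI_REF):
    """Return True if the whole token matches the given IRI reference rule."""
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("<http://example/a>", "<http://example/a b>", "<a<b>"):
        print(repr(token), is_iri_ref(token), is_iri_ref(token, WEAK_IRI_REF))
```

Only the token containing a space separates the two rules: the stricter rule 
rejects it and the weaker one accepts it. Neither checks that the content is a 
valid IRI.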

This update is so anyone interested can review it.  Some tidying up would be a 
good idea.


Received on Wednesday, 24 August 2005 15:24:54 UTC