SPARQL 1.1 grammar updated

Summary::

I've updated the SPARQL grammar snapshot [0] in docs/sparql-grammar-all.html and placed the EBNF into Yacker [2].  The Yacker input is checked into SPARQL CVS as well [1].

== Details

There is now a single SPARQL 1.1 parser workflow covering both query and update.  It starts in javacc (jjdoc, the documentation tool that comes with javacc, generates near-perfect EBNF that a script can clean up), generates HTML styled like the SPARQL documents, and puts the EBNF into Yacker.  Some of the process is in ARQ SVN at the moment [3], because that means I can run the syntax checks on the grammar.  I can extract the workflow from ARQ and put a copy into W3C space if that helps.

== Grammar comments

1:

There is one grammar hack: the rule for the construct template (also used in update for MODIFY, DELETE, INSERT and the data forms).  It has a local lookahead of 2.  This is to make the Java parser non-recursive.

The SPARQL 1.0 rule could be used (it is strict LL(1)), but I left the hack in to remind us to check this out sometime.  The recursion issue is more important for update because of the likely length of requests (e.g. INSERT DATA).  One solution is to write the rule properly; Java people will then just have to live with the fact that the language does not cope well with deep recursion (it is a tail recursion, and Java does not have tail-recursion elimination).
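
To illustrate the recursion issue, here is a minimal sketch in Python rather than Java (Python, like Java, does not eliminate tail calls).  It is not the javacc-generated parser: the token list and all function names are hypothetical, and the rule shape `Triples ::= Triple ( '.' Triples? )?` is assumed for illustration.  It contrasts a right-recursive rule with an equivalent loop.

```python
def count_triples_recursive(tokens, pos=0, count=0):
    """Right-recursive form: Triples ::= Triple ( '.' Triples? )?"""
    if pos == len(tokens):
        return count  # the optional trailing Triples is absent
    if tokens[pos] != "t":
        raise SyntaxError(f"expected a triple at token {pos}")
    pos, count = pos + 1, count + 1
    if pos < len(tokens) and tokens[pos] == ".":
        # A tail call, but it still costs one stack frame per triple.
        return count_triples_recursive(tokens, pos + 1, count)
    return count


def count_triples_iterative(tokens):
    """Loop form: Triples ::= Triple ( '.' Triple )* '.'?"""
    pos = count = 0
    while pos < len(tokens):
        if tokens[pos] != "t":
            raise SyntaxError(f"expected a triple at token {pos}")
        pos, count = pos + 1, count + 1
        if pos < len(tokens) and tokens[pos] == ".":
            pos += 1
        else:
            break
    return count


def make_request(n):
    """Stand-in for a long INSERT DATA body: 't . t . ... t .' with n triples."""
    return ["t", "."] * n


# Both forms agree on a short request.
small = make_request(3)
assert count_triples_recursive(small) == count_triples_iterative(small) == 3

# On a long request only the loop survives.
large = make_request(100_000)
try:
    count_triples_recursive(large)
    recursive_ok = True
except RecursionError:
    recursive_ok = False
```

The loop form accepts the same token sequences here but uses constant stack depth, which is the property the lookahead hack is after.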

2:

I put in a Top() as a common entry point for LALR-style parsers.  There are two further entry points, for query and update (QueryUnit and UpdateUnit), which generate "unused" warnings because they are not used from Top().  I had to rewrite around Top() to make it acceptable to an LL parser: the form in 'grammar' is ambiguous to an LL parser because both productions can begin with BASE.
 
I checked, with javacc, the grammar for LALR-ness as well and it passes.
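
The LL problem with a common entry point can be sketched as follows, in Python and with hypothetical names throughout.  It assumes the unfactored form is roughly `Top ::= Prologue Query | Prologue Update`, so that an LL(1) parser seeing a leading BASE cannot choose between the alternatives, and that the fix is to parse the shared prologue once and choose afterwards.  The keyword sets are an illustrative subset, and the prologue handling is simplified.

```python
PROLOGUE_FIRST = {"BASE", "PREFIX"}
QUERY_FIRST = {"SELECT", "CONSTRUCT", "DESCRIBE", "ASK"}
UPDATE_FIRST = {"INSERT", "DELETE", "MODIFY"}  # illustrative subset only


def first_sets_conflict(alternatives):
    """True if two alternatives can start with the same token (LL(1) conflict)."""
    seen = set()
    for first in alternatives:
        if seen & first:
            return True
        seen |= first
    return False


# Unfactored: each alternative may start with the prologue tokens.
unfactored = [PROLOGUE_FIRST | QUERY_FIRST, PROLOGUE_FIRST | UPDATE_FIRST]
# Factored: Top ::= Prologue ( Query | Update ); the choice comes after the prologue.
factored = [QUERY_FIRST, UPDATE_FIRST]


def parse_top(tokens):
    """Consume the prologue, then dispatch on a single token of lookahead."""
    pos = 0
    while pos < len(tokens) and tokens[pos] in PROLOGUE_FIRST:
        # BASE <iri> takes two tokens, PREFIX name: <iri> takes three.
        pos += 2 if tokens[pos] == "BASE" else 3
    if pos >= len(tokens):
        raise SyntaxError("expected a query or update after the prologue")
    if tokens[pos] in QUERY_FIRST:
        return "query"
    if tokens[pos] in UPDATE_FIRST:
        return "update"
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")
```

For example, `parse_top(["BASE", "<http://example/>", "SELECT", "*"])` returns `"query"` without ever having to guess which alternative the BASE belongs to.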

== Yacker

I uploaded it into Yacker as grammar "SPARQL_11" [2] as follows:
+ Display the HTML in a browser
+ Cut and paste it as text into Yacker
+ Fix the text around ECHAR, changing \ to \\ (Yacker input requires \ to be escaped in the EBNF)
+ Add @terminals just before IRI_REF
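
The last two fix-ups could be scripted along these lines.  This is only a sketch: the function name and the sample rules are hypothetical, it assumes each rule sits on one line beginning with its name, and it escapes only the ECHAR line, which may be narrower than the manual fix.

```python
def prepare_for_yacker(ebnf_text):
    """Escape backslashes in the ECHAR rule and mark where terminals start."""
    out = []
    for line in ebnf_text.splitlines():
        rule_name = line.split("::=")[0].strip()
        if rule_name == "IRI_REF":
            # Terminal rules are assumed to start at IRI_REF.
            out.append("@terminals")
        if rule_name == "ECHAR":
            line = line.replace("\\", "\\\\")
        out.append(line)
    return "\n".join(out)


# Simplified, made-up rule bodies; only the rule names come from the grammar.
sample = "\n".join([
    "Top ::= Prologue ( Query | Update )",
    "IRI_REF ::= '<' [a-z]* '>'",
    r"ECHAR ::= '\' [tbnrf]",
])
prepared = prepare_for_yacker(sample)
```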

The Perl grammar from Yacker is OK: it gives two warnings, for the alternative entry points.
----
Useless non-terminals:

    UpdateUnit, declared line 1252
    QueryUnit, declared line 588

Useless rules:

    QueryUnit -> Prologue Query
    UpdateUnit -> Prologue Update
----
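
Warnings of this kind come down to reachability: a non-terminal that cannot be reached from the start symbol is of no use to the generated parser.  A small sketch in Python, using a hypothetical grammar fragment (the rule bodies for Top, Prologue, Query and Update are assumed, not copied from the grammar):

```python
def unreachable(grammar, start):
    """Return the non-terminals that cannot be reached from the start symbol."""
    reached, todo = set(), [start]
    while todo:
        symbol = todo.pop()
        if symbol in reached or symbol not in grammar:
            continue  # already visited, or a terminal
        reached.add(symbol)
        for alternative in grammar[symbol]:
            todo.extend(alternative)
    return sorted(set(grammar) - reached)


# Each non-terminal maps to a list of alternatives; each alternative is a list of symbols.
grammar = {
    "Top": [["Prologue", "Query"], ["Prologue", "Update"]],
    "QueryUnit": [["Prologue", "Query"]],
    "UpdateUnit": [["Prologue", "Update"]],
    "Prologue": [[]],
    "Query": [["SELECT"]],
    "Update": [["INSERT"]],
}

print(unreachable(grammar, "Top"))  # ['QueryUnit', 'UpdateUnit']
```

Starting from QueryUnit or UpdateUnit instead would report the other entry points as unreachable, which is why the warnings are harmless for a grammar with several entry points.
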
The C and C++ grammars pass bison, with warnings for the non-terminals that are unused from Top.  gcc reports errors in the output from bison (a missing declaration); I don't understand them.  The Python output gives a Python error.

Running the Perl parser: apparently Perl does not like:

@_O_QNIL_E_Or_QGT_LPAREN_E_S_QExpression_E_S_QGT_COMMA_E_S_QExpression_E_Star_S_QGT_RPAREN_E_C

I also tried terse names and got the same result (this might be a web caching issue).

Checking the SPARQL_Update_FPWD grammar that Simon did: it does not seem to generate a C parser (it fails in bison, but I can't see why), nor a Python one.  Perl gives a lot of warnings about "Unused terminals" and "Useless non-terminals"; they look ignorable, as they are due to the grammar being update without query.

    Andy

[0] http://www.w3.org/2009/sparql/docs/sparql-grammar-all.html
[1] http://www.w3.org/2009/sparql/docs/sparql-grammar-all.ebnf
[2] http://www.w3.org/2005/01/yacker/uploads/SPARQL_11?lang=perl
[3] http://jena.svn.sourceforge.net/viewvc/jena/ARQ/trunk/Grammar/

Received on Tuesday, 20 October 2009 10:23:48 UTC