Re: Report on SPARQL Grammar implementation experience using ANTLR. from Seaborne, Andy on 2005-03-09 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 09 Mar 2005 18:17:49 +0000
To: "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>
CC: "'public-rdf-dawg@w3.org'" <public-rdf-dawg@w3.org>, "Bebee, Bradley R." <BRADLEY.R.BEBEE@saic.com>, "Personick, Michael R." <MICHAEL.R.PERSONICK@saic.com>, "'maripuri_sandeep@bah.com'" <maripuri_sandeep@bah.com>
Message-ID: <422F3DCD.8010709@hp.com>

Thompson, Bryan B. wrote:
> Hello,
> 
> I wanted to report on our implementation experience for the SPARQL
> grammar using the ANTLR parser generator based on the Working Draft
> of the SPARQL query language[1][2].

Excellent news!

>  This implementation does not
> handle SPARQL semantics.  As such, the feedback is mainly relevant
> to the utility of the current working draft of the SPARQL grammar
> to implementors.
> 
> Overall, the SPARQL grammar was relatively easy to realize using
> ANTLR.  The main points of confusion were the productions for QNAME
> and QNAME_NS, especially as used in the PrefixDecl production - the
> productions as given could not be made to work without significant
> refactoring.

Hmm - the productions come fairly directly from a javacc grammar, but it may be 
that this origin is obscuring what they are meant to express. What refactoring 
did you do?  It might help me be clearer.  I have just been tidying this area up.

> Also, the Perl5 regex production was not given in the
> grammar.

The regex expression changed:
http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0227.html

and it's easier to parse: it has become
     REGEXP(?x, "foo") or REGEXP(?x, "foo", "i")
So it is no longer unconstrained; it is now an expression and some strings.
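
As a rough illustration of why the new form is easy to handle, here is a minimal 
Python sketch of a recognizer for it. This is not the grammar itself: the name 
parse_regexp_call is made up for the example, the first argument is restricted 
to a plain variable (the proposal allows an expression there), and string 
escapes are handled only in the simplest way.

```python
import re

# A double-quoted string; backslash escapes are skipped over, not decoded.
_STRING = r'"((?:[^"\\]|\\.)*)"'

# Assumption for this sketch: the first argument is only a variable like ?x.
_REGEXP_CALL = re.compile(
    r'\s*REGEXP\s*\(\s*(\?[A-Za-z_][A-Za-z0-9_]*)\s*,\s*' + _STRING +
    r'(?:\s*,\s*' + _STRING + r')?\s*\)\s*$'
)


def parse_regexp_call(text):
    """Return (variable, pattern, flags), or None if text is not in the form.

    flags is the empty string when the optional third argument is absent.
    """
    m = _REGEXP_CALL.match(text)
    if m is None:
        return None
    variable, pattern, flags = m.group(1), m.group(2), m.group(3)
    return variable, pattern, flags or ""


print(parse_regexp_call('REGEXP(?x, "foo")'))
print(parse_regexp_call('REGEXP(?x, "foo", "i")'))
```

The point is only that the arguments are now ordinary tokens (a variable or 
expression plus quoted strings), so no special lexer mode is needed for the 
pattern.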

> 
> The implementation contains a test harness for the parser and another
> for the lexer.  The parser accepts several of the DAWG Test Cases[3],
> but it has not been vetted against all of the current test cases. (The
> lexer test harness is currently non-functional owing to a problem
> which has not yet been isolated either with the ANTLR lexer or with
> how it is being invoked from the test harness.)

I'm developing a bunch of tests of syntax as part of the changes the WG voted on 
yesterday.  The syntax tests are just queries executed against an empty model.
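
A sketch of what such a syntax-only test run could look like, in Python. 
Everything here is hypothetical: run_syntax_tests and toy_parse are names made 
up for the example, toy_parse is a stand-in for a real parser (it checks only 
the leading keyword and brace balance), and the query strings are illustrative 
rather than taken from the test suite or tied to a particular draft.

```python
def run_syntax_tests(parse, cases):
    """Run each query through `parse` and compare with the expected outcome.

    `parse` is any callable that raises ValueError on a syntax error.
    `cases` maps a test name to (query_text, should_parse).
    Returns a dict of test name -> True if the outcome matched expectations.
    """
    results = {}
    for name, (query, should_parse) in cases.items():
        try:
            parse(query)
            parsed = True
        except ValueError:
            parsed = False
        results[name] = (parsed == should_parse)
    return results


def toy_parse(query):
    """Hypothetical stand-in for a real parser: checks only that the query
    starts with SELECT and that its braces balance."""
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("expected SELECT")
    depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
    if depth != 0:
        raise ValueError("unbalanced '{'")


# Illustrative cases: one query expected to parse, one expected to fail.
CASES = {
    "syntax-ok": ("SELECT ?x WHERE { ?x ?p ?o }", True),
    "syntax-bad": ("SELECT ?x WHERE { ?x ?p ?o", False),
}

print(run_syntax_tests(toy_parse, CASES))
```

Since no data is involved, a test passes when the parser accepts or rejects the 
query as expected; swapping toy_parse for a real parser entry point is the only 
change a harness like this would need.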

I'm most of the way through changing the grammar to reflect message 0227 (linked 
above).  If the javacc master would be of help, it is in the ARQ module of CVS 
in the Jena repository on SourceForge, directory /Grammar.

> 
> It would be useful from an implementation perspective if a bundle was
> developed containing the test cases and sufficient metadata such that
> a harness could easily be written or adapted to run the test cases
> against a given implementation.
> 
> We plan to track subsequent working drafts and update this realization
> of a SPARQL parser and lexer.  As of this writing, the parser produces
> an AST using the default AST generation rules, and hence is not yet
> suitable for query evaluation.  We are considering an RDF algebra as
> a translation target that would make it possible to explore query
> optimization and query evaluation.

That would be cool.  It would be particularly interesting to optimize based on a 
profile of the data and the query asked.

 Andy

> 
> Thanks,
> 
> -bryan
> 
> [1] http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050217
> [2] http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-sparql/
> [3] http://www.w3.org/2001/sw/DataAccess/tests/
>

Received on Wednesday, 9 March 2005 19:22:17 UTC