RE: Report on SPARQL Grammar implementation experience using ANTLR.


The best answer is probably to look at the ANTLR grammar[1] in the CVS
tree for this effort.  There are @todo markers whenever I found something
that I was not sure was correct, that I thought I had to change, or that
I was not sure how to implement.

I'd be happy to discuss these with you, probably offlist.  It is quite
possible that I have misunderstood the intention of the grammar, which
we could isolate by appropriate test cases.

I didn't realize that the javacc grammar was the master.  In any case,
I wanted to see how useful the SPARQL grammar productions were without
resorting to something "modeled in code".  I thought that an independent
implementation of the grammar might be best in order to help iron out
any issues with its expression in the working draft.




-----Original Message-----
From: Seaborne, Andy
To: Thompson, Bryan B.
Cc: ''; Bebee, Bradley R.; Personick, Michael R.;
Sent: 3/9/2005 1:17 PM
Subject: Re: Report on SPARQL Grammar implementation experience using ANTLR.

Thompson, Bryan B. wrote:
> Hello,
> I wanted to report on our implementation experience for the SPARQL
> grammar using the ANTLR parser generator based on the Working Draft
> of the SPARQL query language[1][2].

Excellent news!

>  This implementation does not
> handle SPARQL semantics.  As such, the feedback is mainly relevant
> to the utility of the current working draft of the SPARQL grammar
> to implementors.
> Overall, the SPARQL grammar was relatively easy to realize using
> ANTLR.  The main points of confusion were the productions for QNAME
> and QNAME_NS, especially as used in the PrefixDecl production - the
> productions as given could not be made to work without significant
> refactoring.

Hmm - the productions come fairly directly from a javacc grammar but it
may be that it is distorting the clear expression - what refactoring did
you do?  It might help me be clearer.  I have just been clearing this
area up.

 > Also, the Perl5 regex production was not given in the
 > grammar.

The regex expression changed:

and it's easier to parse: it has become
     REGEXP(?x, "foo") or REGEXP(?x, "foo", "i")
So it is no longer unconstrained, and instead is an expression and some
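
To illustrate why the call form quoted above is easy to recognize, here is a
minimal Python sketch. It is not the working group's grammar: the function
name parse_regexp_call is hypothetical, the first argument is assumed to be a
plain ?variable (the draft may allow a general expression there), and the
string-literal handling is simplified to double-quoted strings with backslash
escapes.

```python
import re

# Hypothetical recognizer for the two call shapes quoted in this message.
# Assumptions: first argument is a ?variable, arguments two and three are
# double-quoted strings, and the third (flags) argument is optional.
_REGEXP_CALL = re.compile(
    r'''\s*REGEXP\s*\(\s*
        \?(?P<var>[A-Za-z_][A-Za-z0-9_]*)\s*,\s*
        "(?P<pattern>(?:[^"\\]|\\.)*)"
        (?:\s*,\s*"(?P<flags>(?:[^"\\]|\\.)*)")?
        \s*\)\s*''',
    re.VERBOSE,
)


def parse_regexp_call(text):
    """Return (variable, pattern, flags); flags is None when omitted."""
    match = _REGEXP_CALL.fullmatch(text)
    if match is None:
        raise ValueError("not a recognized REGEXP call: %r" % text)
    return match.group("var"), match.group("pattern"), match.group("flags")
```

Under these assumptions, parse_regexp_call('REGEXP(?x, "foo", "i")') yields
the variable name, the pattern string, and the flags string, while the
two-argument form yields None for the flags.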

> The implementation contains a test harness for the parser and another
> for the lexer.  The parser accepts several of the DAWG Test Cases[3],
> but it has not been vetted against all of the current test cases. (The
> lexer test harness is currently non-functional owing to a problem
> which has not yet been isolated either with the ANTLR lexer or with
> how it is being invoked from the test harness.)

I'm developing a bunch of tests of syntax as part of the changes the WG
voted on yesterday.  The syntax tests are just queries executed against
an empty

I'm most of the way through changing the grammar to reflect 0227.  If the
master would be of help, it is in the ARQ module of CVS in the Jena
on SourceForge, directory /Grammar.
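
As a rough illustration of how syntax tests like those mentioned above could
be driven, here is a minimal Python sketch of a harness. Everything in it is
an assumption made for illustration: SyntaxCase, run_syntax_cases and
toy_parse are hypothetical names, the case layout is not the DAWG test
format, and toy_parse is only a stand-in for a real parser (for example one
generated from the ANTLR or javacc grammar).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SyntaxCase:
    """One syntax test: a query plus whether a parser should accept it."""
    name: str
    query: str
    should_parse: bool


def run_syntax_cases(parse: Callable[[str], object],
                     cases: Iterable[SyntaxCase]) -> List[str]:
    """Run each case through `parse`; return names of cases that disagree."""
    failures = []
    for case in cases:
        try:
            parse(case.query)
            parsed = True
        except Exception:
            # Any exception from the parser is treated as a syntax rejection.
            parsed = False
        if parsed != case.should_parse:
            failures.append(case.name)
    return failures


def toy_parse(query: str) -> None:
    """Stand-in parser: accepts only text that starts with SELECT."""
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("not a SELECT query")
```

A real harness would replace toy_parse with a call into the parser under test
and load the cases from whatever metadata accompanies the test suite.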

> It would be useful from an implementation perspective if a bundle was
> developed containing the test cases and sufficient metadata such that
> a harness could easily be written or adapted to run the test cases
> against a given implementation.
> We plan to track subsequent working drafts and update this realization
> of a SPARQL parser and lexer.  As of this writing, the parser produces
> an AST using the default AST generation rules, and hence is not yet
> suitable for query evaluation.  We are considering an RDF algebra as
> a translation target that would make it possible to explore query
> optimization and query evaluation.

That would be cool.  It would be particularly interesting to optimize
based on a profile of the data and the query asked.


> Thanks,
> -bryan
> [1]
> [2]
> [3]

Received on Wednesday, 9 March 2005 18:41:44 UTC