Re: syntax tests update from Lee Feigenbaum on 2007-01-18 (public-rdf-dawg@w3.org from January to March 2007)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Thu, 18 Jan 2007 14:21:02 -0500
To: Jeen Broekstra <j.broekstra@tue.nl>, dawg mailing list <public-rdf-dawg@w3.org>
Message-ID: <OFB9E552EC.2DA6478C-ON85257267.0069062B-85257267.006A4C63@us.ibm.com>
Thanks, Jeen! My results and comments inline below.

Jeen Broekstra wrote on 01/18/2007 10:35:29 AM:

> I have replaced the old SyntaxFull test set with the tests from
> SyntaxDev.
> 
> super-manifest:
> 
> http://www.w3.org/2001/sw/DataAccess/tests/data-r2/manifest-syntax.ttl

I don't have code yet that can read these.

> Syntax test sets (181 tests in total):
> 
> http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql1/
> http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql2/
> http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql3/
> 
> None of these tests are currently DAWG-approved, of course.
> 
> Only syntax-sparql3 contains negative syntax tests. They are marked as
> such in the manifest, and can also be recognized by the filename, which
> is prefixed 'syn-bad-'.
> 
> I would ask everyone with a SPARQL parser to try out these tests and
> report possible problems.

As I said previously, Glitter throws exceptions during parsing when it 
encounters a function that it does not recognize. That causes a handful of 
tests to fail, which I've tried to highlight here.
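
A minimal sketch of how a harness might score these tests, using only the
'syn-bad-' filename convention Jeen describes above. The `parse` callable,
the function names, and the `.rq` extension in the usage note are all
hypothetical; this is not Glitter's or Sesame's actual code.

```python
import os

NEGATIVE_PREFIX = "syn-bad-"  # filename convention for negative syntax tests

def is_negative_test(filename):
    """True if the file name marks the query as a negative syntax test."""
    return os.path.basename(filename).startswith(NEGATIVE_PREFIX)

def passes_syntax_test(filename, query_text, parse):
    """Positive tests pass when `parse` succeeds; negative tests pass when it raises.

    `parse` is a hypothetical callable standing in for a real SPARQL parser.
    """
    try:
        parse(query_text)
        parsed = True
    except Exception:
        # Assumption: any exception counts as a rejection. A real harness
        # would probably want to tell syntax errors apart from
        # "unknown function" errors raised at parse time.
        parsed = False
    return parsed != is_negative_test(filename)
```

With a parser that accepts everything, `passes_syntax_test("syn-bad-08.rq", "...", parse)`
returns False, which is the kind of failure reported for syntax-sparql3 below.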

syntax-sparql1 - I fail 4 tests, all because of unknown functions
syntax-sparql2 - I fail 6 tests; 4 are because of unknown functions. The 
other two are:

syntax-esc-04
syntax-esc-05

...both of which have liberal use of \u escapes. It won't surprise me at 
all if these tests are fine and this is a parser bug that I have.
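
For anyone chasing a similar bug, here is a rough sketch of one way to handle
\u escapes: decode the \uXXXX and \UXXXXXXXX forms in a pre-pass before
tokenizing. The function name is made up for illustration, and whether a
pre-pass is the right place to do this for these particular tests is an
assumption, not something established in this thread.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits.
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def decode_codepoint_escapes(query):
    """Replace each codepoint escape with the character it names."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _CODEPOINT_ESCAPE.sub(replace, query)
```

For example, `decode_codepoint_escapes(r"SELECT \u003Fx")` yields `SELECT ?x`,
since U+003F is the question mark.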

syntax-sparql3 - I fail these 6 tests:

syn-bad-{8,9,10,11,12,13} - these (negative) tests assert that multiple 
periods ('.') in a row should fail; my parser allows them.

Was there a change at some point in the grammar that affected the validity 
of extra periods in a row?

> I have also run the set through Sesame's SPARQL parser of course. I get
> a number of errors (24) and failures (18), most of which have to do with
> our implementation (we currently do not yet support functions and
> ordering, and the parser throws exceptions on queries containing those
> features).
> 
> I also came across this interesting failure. The following parser test
> (syntax-sparql1/syntax-forms-02) fails:
> 
>  PREFIX : <http://example.org/ns#>
>  SELECT * WHERE { ( [] [] ) }
> 
> To be honest I have no idea how to read this query, I would appreciate
> insights.

Looks like an RDF collection with two blank node elements to me. Let's see 
how it works in the grammar...

It matches [39] Collection first; then two [40] GraphNode productions. 
Each of those matches a [41] VarOrTerm which matches [44] GraphTerm which 
matches [65] BlankNode which matches [84] ANON which consumes '[' WS* ']'.
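
The walk through the grammar above can be sketched as a tiny recognizer. It
assumes Collection requires one or more GraphNodes, and it only handles the
case where every GraphNode bottoms out in ANON; the function name is
hypothetical and this is nowhere near a full SPARQL parser.

```python
import re

WS = re.compile(r"\s*")        # optional whitespace between tokens
ANON = re.compile(r"\[\s*\]")  # ANON: '[' WS* ']'

def count_anon_collection(text):
    """Return the number of ANON elements in '( ANON+ )', or None if no match."""
    pos = WS.match(text, 0).end()
    if not text.startswith("(", pos):
        return None
    pos += 1
    count = 0
    while True:
        pos = WS.match(text, pos).end()
        node = ANON.match(text, pos)
        if node is None:
            break
        pos = node.end()
        count += 1
    # Assumption: at least one element, then the closing parenthesis.
    if count == 0 or not text.startswith(")", pos):
        return None
    return count
```

Applied to the group body from syntax-forms-02, `count_anon_collection("( [] [] )")`
returns 2, i.e. a collection with two blank node elements.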


> Also: a fair number of the errors Sesame's parser throws have to do with
> the queries in the syntax-sparql2 set, which use relative URIs in
> queries (e.g. <a>, <b>, <p1>, etc.). A relative URI has to be resolved
> against a base URI - which is normally provided using a BASE clause.
> However, the queries in this test set do not have such a clause. Andy
> has pointed out to me that according to RFC3986 (URI) in such cases the
> base URI should be provided by the 'embedding entity', i.e. the location
> of the file that contains the query. Sesame's query parser has no
> feature for this however: it only accepts a query string as an argument,
> a base URI for resolving any relative references inside that query cannot
> be provided separately. I guess that this is a shortcoming in our
> current parser that we should deal with in Sesame.
> 
> However, correct resolution in this fashion is a feature of file
> processing, not query parsing, IMHO, and the test set is designed to
> test query parsing, not file processing. So I would suggest that we
> modify these test cases to have a base URI inside the query. This avoids
> having implementations fail tests on this problem. Thoughts?

I think this is similar to the undefined functions case, though I'm not 
sure what we should do about it. They're not parse errors per se; they're 
evaluation errors that our engines are catching at parse time. I need to 
think a bit about what I think is the best way to handle these in 
implementation reporting...
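
For reference, the resolution Jeen describes (taking the base URI from the
location of the query file when the query has no BASE clause) can be
sketched with Python's standard urljoin. The function name and the idea of
passing the base URI as a separate argument are assumptions for
illustration; this is not Sesame's API.

```python
from urllib.parse import urljoin

def resolve_against_base(relative_ref, base_uri):
    """Resolve a URI reference from a query against an externally supplied base."""
    return urljoin(base_uri, relative_ref)

# Assumed base: the directory the syntax-sparql2 queries are served from.
BASE = "http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql2/"

resolved = [resolve_against_base(ref, BASE) for ref in ("a", "b", "p1")]
```

Absolute URIs pass through unchanged, so a parser that accepted an optional
base argument along these lines would still behave the same on queries that
use only absolute URIs.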

> Next up: evaluation tests (which are, after all, the more interesting
> test cases ;)).

yeah! :-)

Lee

> 
> Jeen
> -- 
> Dr. Jeen Broekstra                                          Den Dolech 2
> Information Systems Group                                        HG 7.76
> Department of Mathematics and Computer Science              P.O. Box 513
> Technische Universiteit Eindhoven                      5600 MB Eindhoven
> tel. +31 (0)40 247 36 86                                 The Netherlands
>
Received on Thursday, 18 January 2007 19:21:24 UTC