Re: syntax tests update

-------- Original Message --------
 > From: Lee Feigenbaum <>
 > Date: 18 January 2007 19:21
 >
 > Thanks, Jeen! My results and comments inline below.
 >
 > Jeen Broekstra wrote on 01/18/2007 10:35:29 AM:
 >
 > > I have replaced the old SyntaxFull test set with the tests from
 > > SyntaxDev.
 > >
 > > super-manifest:
 > >
 > > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/manifest-syntax.ttl
 >
 > I don't have code yet that can read these.
 >
 > > Syntax test sets (181 tests in total):
 > >
 > > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql1/
 > > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql2/
 > > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql3/
 > >
 > > None of these tests are currently DAWG-approved, of course.
 > >
 > > Only syntax-sparql3 contains negative syntax tests. They are marked as
 > > such in the manifest, and can also be recognized by the filename,
 > > which is prefixed 'syn-bad-'.
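
The filename convention can be checked mechanically. A minimal sketch in Python, assuming only the 'syn-bad-' prefix described above; the helper name and the example paths are hypothetical, and the manifest remains the authoritative marking:

```python
import posixpath


def is_negative_syntax_test(path: str) -> bool:
    """Return True if the file name (not the directory) starts with 'syn-bad-'."""
    return posixpath.basename(path).startswith("syn-bad-")


# Hypothetical paths, for illustration only.
print(is_negative_syntax_test("syntax-sparql3/syn-bad-example.rq"))
print(is_negative_syntax_test("syntax-sparql1/syntax-forms-02.rq"))
```

A test harness would more likely read the type from the manifest and use the prefix only as a sanity check.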
 > >
 > > I would ask everyone with a SPARQL parser to try out these tests and
 > > report possible problems.
 >
 > As I said previously, Glitter throws exceptions during parsing when it
 > encounters a function that it does not recognize. That causes a handful
 > of tests to fail, which I've tried to highlight here.
 >
 > syntax-sparql1 - I fail 4 tests, all because of unknown functions
 > syntax-sparql2 - I fail 6 tests; 4 are because of unknown functions.
 > The other two are:
 >
 > syntax-esc-04
 > syntax-esc-05
 >
 > ...both of which have liberal use of \u escapes. It won't surprise me
 > at all if these tests are fine and this is a parser bug that I have.
 >
 > syntax-sparql3 - I fail these 6 tests;
 >
 > syn-bad-{8,9,10,11,12,13} - these (negative) tests assert that multiple
 > periods ('.') in a row should fail; my parser allows them.
 >
 > Was there a change at some point in the grammar that affected the
 > validity of extra periods in a row?

No - not in any published version.  It's never been intentionally legal.  One 
development version did get it wrong but that was a long time ago.

 >
 > > I have also run the set through Sesame's SPARQL parser of course. I
 > > get a number of errors (24) and failures (18), most of which have to
 > > do with our implementation (we currently do not yet support functions
 > > and ordering, and the parser throws exceptions on queries containing
 > > those features).
 > >
 > > I also came across this interesting failure. The following parser test
 > > (syntax-sparql1/syntax-forms-02) fails:
 > >
 > >  PREFIX : <http://example.org/ns#>
 > >  SELECT * WHERE { ( [] [] ) }
 > >
 > > To be honest I have no idea how to read this query, I would appreciate
 > > insights.
 >
 > Looks like an RDF collection with two blank node elements to me. Let's
 > see how it works in the grammar...
 >
 > It matches [39] Collection first; then two [40] GraphNode productions.
 > Each of those matches a [41] VarOrTerm which matches [44] GraphTerm
 > which matches [65] BlankNode which matches [84] ANON which consumes '['
 > WS* ']' .
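
The derivation above can be sketched as a tiny recogniser. This is an illustration in Python, not a SPARQL parser: it handles only a Collection whose elements are ANON blank nodes, the function name is hypothetical, and WS is assumed to be space, tab, CR and LF:

```python
import re
from typing import List

# One token: ANON ('[' WS* ']'), '(' or ')', with optional leading whitespace.
# WS is assumed here to be space, tab, CR and LF.
_TOKEN = re.compile(r"\s*(\[[ \t\r\n]*\]|\(|\))")


def parse_anon_collection(text: str) -> List[str]:
    """Recognise '(' ANON+ ')' and return one 'ANON' entry per element."""
    text = text.strip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tok = match.group(1)
        tokens.append("ANON" if tok.startswith("[") else tok)
        pos = match.end()
    # A Collection needs at least one element; this sketch rejects '()'.
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("not a collection of the form '(' ANON+ ')'")
    elements = tokens[1:-1]
    if any(tok != "ANON" for tok in elements):
        raise ValueError("nested collections are outside this sketch")
    return elements


print(parse_anon_collection("( [] [] )"))  # ['ANON', 'ANON']
```

Each 'ANON' entry stands for one fresh blank node, so the pattern in syntax-forms-02 is a two-element collection.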
 >
 >
 > > Also: a fair number of the errors Sesame's parser throws have to do
 > > with the queries in the syntax-sparql2 set, which use relative URIs in
 > > queries (e.g. <a>, <b>, <p1>, etc.). A relative URI has to be resolved
 > > against a base URI - which is normally provided using a BASE clause.
 > > However, the queries in this test set do not have such a clause. Andy
 > > has pointed out to me that according to RFC3986 (URI) in such cases
 > > the base URI should be provided by the 'embedding entity', i.e. the
 > > location of the file that contains the query. Sesame's query parser
 > > has no feature for this however: it only accepts a query string as an
 > > argument, a base URI for resolving any relative references inside
 > > that query cannot be provided separately. I guess that this is a
 > > shortcoming in our current parser that we should deal with in Sesame.
 > >
 > > However, correct resolution in this fashion is a feature of file
 > > processing, not query parsing, IMHO, and the test set is designed to
 > > test query parsing, not file processing. So I would suggest that we
 > > modify these test cases to have a base URI inside the query. This
 > > avoids having implementations fail tests on this problem. Thoughts?
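
For a parser that only accepts a query string, one workaround is for the test harness to put the base URI into the query text before parsing. A minimal sketch in Python, assuming the harness knows where the query file came from; the function name and the example base URI are hypothetical, and the check for an existing BASE does not skip leading comments:

```python
import re

# BASE is a case-insensitive keyword and, if present, comes first in the query.
_BASE_DECL = re.compile(r"\s*BASE\b", re.IGNORECASE)


def with_base(query: str, base_uri: str) -> str:
    """Prepend a BASE clause unless the query already starts with one."""
    if _BASE_DECL.match(query):
        return query
    return f"BASE <{base_uri}>\n{query}"


print(with_base("SELECT * WHERE { <a> <b> <p1> }", "http://example.org/tests/"))
```

This leaves the test files untouched and keeps the resolution step in the harness rather than in the parser.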

Relative URIs are in the grammar via the production

[66] 	Q_IRI_REF  ::= 	'<' ([^<>'{}|^`]-[#x00-#x20])* '>'

and
http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#iriRefs

That does not mean that the parser is required to resolve them at that point, 
but it seems reasonable to me for a syntax test to expect the parser to accept 
any legal URI.
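
To illustrate that the production accepts relative references without resolving anything, here is a direct transcription of [66] into a Python regular expression. It is a sketch: the names are hypothetical and it checks token shape only, not URI validity:

```python
import re

# '<' ([^<>'{}|^`]-[#x00-#x20])* '>' : any characters except the listed
# punctuation and the control/space range #x00-#x20, between angle brackets.
Q_IRI_REF = re.compile(r"<[^<>'{}|^`\x00-\x20]*>")


def is_q_iri_ref(token: str) -> bool:
    """Return True if the whole token matches the Q_IRI_REF production."""
    return Q_IRI_REF.fullmatch(token) is not None


print(is_q_iri_ref("<a>"))                       # True
print(is_q_iri_ref("<http://example.org/ns#>"))  # True
print(is_q_iri_ref("<a b>"))                     # False
```

Both <a> and an absolute IRI pass at the token level; resolution against a base is a separate, later step.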

The text in rq25 mentions base URIs and how to treat them:

http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#QSynIRI

and it was the result of discussion and debate in the working group, not just 
text some editor put in the doc.

The more useful use of relative URIs is in the FROM clause:

FROM <data.ttl>

meaning, roughly, read from the same directory as the query.  Ditto GRAPH 
<data.ttl>.  This is used in some tests so that the location of the data file 
is adjacent to the query but otherwise independent of location (i.e. the tests 
work wherever you unpack them).
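
A sketch of that resolution step in Python, using the standard library's urljoin and taking the location of the query file as the base. The function name and the example.org locations are made up for illustration:

```python
from urllib.parse import urljoin


def resolve_against_query(query_location: str, reference: str) -> str:
    """Resolve a (possibly relative) reference against the query file's URI."""
    return urljoin(query_location, reference)


# Hypothetical location of an unpacked test query.
print(resolve_against_query("http://example.org/tests/dir/query.rq", "data.ttl"))
# http://example.org/tests/dir/data.ttl
```

An absolute reference is returned unchanged, so the same call works for queries that already name their data by full URI.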

 >
 > I think this is similar to the undefined functions case, though I'm not
 > sure what we should do about it. They're not parse errors per se;
 > they're evaluation errors that our engines are catching at parse time. I
 > need to think a bit about what I think is the best way to handle these
 > in implementation reporting...
 >
 > > Next up: evaluation tests (which are, after all, the more interesting
 > > test cases ;)).
 >
 > yeah! :-)

:-)

	Andy

 >
 > Lee
 >
 > >
 > > Jeen

Received on Friday, 19 January 2007 12:08:22 UTC