Re: syntax tests update from Jeen Broekstra on 2007-01-19 (public-rdf-dawg@w3.org from January to March 2007)

From: Jeen Broekstra <j.broekstra@tue.nl>
Date: Fri, 19 Jan 2007 13:24:16 +0100
To: andy.seaborne@hp.com
CC: Lee Feigenbaum <feigenbl@us.ibm.com>, dawg mailing list <public-rdf-dawg@w3.org>
Message-ID: <45B0B870.3060300@tue.nl>
Seaborne, Andy wrote:
> 
> 
> 
> -------- Original Message --------
>> From: Lee Feigenbaum <>
>> Date: 18 January 2007 19:21
>>
>> Thanks, Jeen! My results and comments inline below.
>>
>> Jeen Broekstra wrote on 01/18/2007 10:35:29 AM:
>>
>> > I have replaced the old SyntaxFull test set with the tests from
>> > SyntaxDev.
>> >
>> > super-manifest:
>> >
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/manifest-syntax.ttl
>>
>> I don't have code yet that can read these.
>>
>> > Syntax test sets (181 tests in total):
>> >
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql1/
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql2/
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql3/
>> >
>> > None of these tests are currently DAWG-approved, of course.
>> >
>> > Only syntax-sparql3 contains negative syntax tests. They are marked as
>> > such in the manifest, and can also be recognized by the filename,
>> > which is prefixed 'syn-bad-'.
>> >
>> > I would ask everyone with a SPARQL parser to try out these tests and
>> > report possible problems.
>>
>> As I said previously, Glitter throws exceptions during parsing when it
>> encounters a function that it does not recognize. That causes a handful
>> of tests to fail, which I've tried to highlight here.
>>
>> syntax-sparql1 - I fail 4 tests, all because of unknown functions
>> syntax-sparql2 - I fail 6 tests; 4 are because of unknown functions.
>> The other two are:
>>
>> syntax-esc-04
>> syntax-esc-05
>>
>> ...both of which have liberal use of \u escapes. It won't surprise me
>> at all if these tests are fine and this is a parser bug that I have.
>>
>> syntax-sparql3 - I fail these 6 tests;
>>
>> syn-bad-{8,9,10,11,12,13} - these (negative) tests assert that multiple
>> periods ('.') in a row should fail; my parser allows them.
>>
>> Was there a change at some point in the grammar that affected the
>> validity of extra periods in a row?
> 
> No - not in any published version.  It's never been intentionally
> legal.  One development version did get it wrong but that was a long
> time ago.
> 
>>
>> > I have also run the set through Sesame's SPARQL parser of course. I
>> > get a number of errors (24) and failures (18), most of which have to
>> > do with our implementation (we currently do not yet support functions
>> > and ordering, and the parser throws exceptions on queries containing
>> > those features).
>> >
>> > I also came across this interesting failure. The following parser test
>> > (syntax-sparql1/syntax-forms-02) fails:
>> >
>> >  PREFIX : <http://example.org/ns#>
>> >  SELECT * WHERE { ( [] [] ) }
>> >
>> > To be honest I have no idea how to read this query, I would appreciate
>> > insights.
>>
>> Looks like an RDF collection with two blank node elements to me. Let's
>> see how it works in the grammar...
>>
>> It matches [39] Collection first; then two [40] GraphNode productions.
>> Each of those matches a [41] VarOrTerm which matches [44] GraphTerm
>> which matches [65] BlankNode which matches [84] ANON which consumes '['
>> WS* ']' .
>>
>>
>> > Also: a fair number of the errors Sesame's parser throws have to do
>> > with the queries in the syntax-sparql2 set, which use relative URIs in
>> > queries (e.g. <a>, <b>, <p1>, etc.). A relative URI has to be resolved
>> > against a base URI - which is normally provided using a BASE clause.
>> > However, the queries in this test set do not have such a clause. Andy
>> > has pointed out to me that according to RFC3986 (URI) in such cases
>> > the base URI should be provided by the 'embedding entity', i.e. the
>> > location of the file that contains the query. Sesame's query parser
>> > has no feature for this however: it only accepts a query string as an
>> > argument, a base URI for resolving any relative referencing inside
>> > that query cannot be provided separately. I guess that this is a
>> > shortcoming in our current parser that we should deal with in Sesame.
>> >
>> > However, correct resolution in this fashion is a feature of file
>> > processing, not query parsing, IMHO, and the test set is designed to
>> > test query parsing, not file processing. So I would suggest that we
>> > modify these test cases to have a base URI inside the query. This
>> > avoids having implementations fail tests on this problem. Thoughts?
> 
> Relative URIs are in the grammar via the production
> 
> [66]     Q_IRI_REF  ::=     '<' ([^<>'{}|^`]-[#x00-#x20])* '>'
> 
> and
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#iriRefs
> 
> That does not mean that the parser is required to resolve them at that
> point but it seems reasonable to me that a test expects the parser to
> accept any legal URI as a syntax test.

Yes, but my point is that they are not in and of themselves legal URIs.
It requires additional information in the form of a base URI (which is
not part of the query itself) to make them into legal URIs.
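
To illustrate the point, here is a minimal sketch in Python using the
standard library's urljoin. The base URI is a made-up stand-in for the
location of a query file, not an actual test location.

```python
from urllib.parse import urljoin

# Hypothetical location of the file that contains the query.
base = "http://example.org/tests/query.rq"

# With a base URI, the relative references become absolute URIs.
with_base_a = urljoin(base, "a")            # "http://example.org/tests/a"
with_base_data = urljoin(base, "data.ttl")  # "http://example.org/tests/data.ttl"

# Without a base URI there is nothing to resolve against:
# the reference simply stays relative.
without_base = urljoin("", "a")             # "a"

print(with_base_a)
print(with_base_data)
print(without_base)
```

The same string <a> therefore names different resources depending on
information that is not in the query text.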

> The text in rq25 mentions base URIs and how to treat them:
> 
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#QSynIRI
> 
> and it was the result of discussion and debate in the working group, not
> just text some editor put in the doc.
> 
> The more useful use of relative URIs is in the FROM clause:
> 
> FROM <data.ttl>
> 
> meaning maybe read from the same directory.  Ditto GRAPH <data.ttl>. 
> This is used in some tests so that the location of the data file is
> adjacent to the query but otherwise independent of location (i.e. the
> tests work wherever you unpack them).

Yes, but all of this assumes that the query is delivered to the parser
through a file. "The same directory" only has meaning in that context.
But that is not the only way to deliver a query to the parser; in fact, I
suspect that in practice it will not be a very prevalent way of
communicating queries at all. So this is not about query parsing itself
but about processing a file which contains a query. In my opinion, those
are separate issues.
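
The separation I have in mind could be sketched roughly as follows, in
Python. Everything here is hypothetical: resolve_iri_refs stands in for a
string-based parser entry point that takes an optional base URI, and
resolve_iri_refs_in_file is a thin file-processing wrapper that supplies
the file's own location as that base. The regular expression is only a
loose approximation of an IRI reference, not the grammar production.

```python
import re
from pathlib import Path
from urllib.parse import urljoin, urlsplit

# Loose approximation of an IRI reference in angle brackets (not the full grammar).
IRI_REF = re.compile(r'<([^<>"{}|^`\\\s]*)>')


def resolve_iri_refs(query, base_uri=None):
    """Query-parsing side: resolve the IRI references found in a query string.

    Raises ValueError for a relative reference when the caller gave no base URI.
    """
    resolved = []
    for ref in IRI_REF.findall(query):
        if urlsplit(ref).scheme:
            resolved.append(ref)  # already absolute, keep as-is
        elif base_uri is None:
            raise ValueError(f"relative IRI <{ref}> needs a base URI")
        else:
            resolved.append(urljoin(base_uri, ref))
    return resolved


def resolve_iri_refs_in_file(path):
    """File-processing side: the file's location acts as the 'embedding entity'."""
    path = Path(path).resolve()
    query = path.read_text(encoding="utf-8")
    return resolve_iri_refs(query, base_uri=path.as_uri())
```

A caller holding only a query string would call resolve_iri_refs directly
and has to supply a base URI itself (or get an error for <a>), whereas a
test harness reading query files would go through the wrapper.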

But anyway, it's a minor point and of course you are right that the
parser _should_ be able to handle it at the end of the day. I can leave
these tests as they are for now.

Jeen
-- 
Dr. Jeen Broekstra                                          Den Dolech 2
Information Systems Group                                        HG 7.76
Department of Mathematics and Computer Science              P.O. Box 513
Technische Universiteit Eindhoven                      5600 MB Eindhoven
tel. +31 (0)40 247 36 86                                 The Netherlands
Received on Friday, 19 January 2007 12:28:20 UTC