- From: Jeen Broekstra <j.broekstra@tue.nl>
- Date: Fri, 19 Jan 2007 13:24:16 +0100
- To: andy.seaborne@hp.com
- CC: Lee Feigenbaum <feigenbl@us.ibm.com>, dawg mailing list <public-rdf-dawg@w3.org>
Seaborne, Andy wrote:
>
>
>
> -------- Original Message --------
>> From: Lee Feigenbaum <>
>> Date: 18 January 2007 19:21
>>
>> Thanks, Jeen! My results and comments inline below.
>>
>> Jeen Broekstra wrote on 01/18/2007 10:35:29 AM:
>>
>> > I have replaced the old SyntaxFull test set with the tests from
>> > SyntaxDev.
>> >
>> > super-manifest:
>> >
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/manifest-syntax.ttl
>>
>> I don't have code yet that can read these.
>>
>> > Syntax test sets (181 tests in total):
>> >
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql1/
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql2/
>> > http://www.w3.org/2001/sw/DataAccess/tests/data-r2/syntax-sparql3/
>> >
>> > None of these tests are currently DAWG-approved, of course.
>> >
>> > Only syntax-sparql3 contains negative syntax tests. They are marked as
>> > such in the manifest, and can also be recognized by the filename,
>> > which is prefixed 'syn-bad-'.
>> >
>> > I would ask everyone with a SPARQL parser to try out these tests and
>> > report possible problems.
>>
>> As I said previously, Glitter throws exceptions during parsing when it
>> encounters a function that it does not recognize. That causes a handful
>> of tests to fail, which I've tried to highlight here.
>>
>> syntax-sparql1 - I fail 4 tests, all because of unknown functions
>> syntax-sparql2 - I fail 6 tests; 4 are because of unknown functions.
>> The other two are:
>>
>> syntax-esc-04
>> syntax-esc-05
>>
>> ...both of which have liberal use of \u escapes. It won't surprise me
>> at all if these tests are fine and this is a parser bug that I have.
>>
>> syntax-sparql3 - I fail these 6 tests:
>>
>> syn-bad-{8,9,10,11,12,13} - these (negative) tests assert that multiple
>> periods ('.') in a row should fail; my parser allows them.
>>
>> Was there a change at some point in the grammar that affected the
>> validity of extra periods in a row?
>
> No - not in any published version. It's never been intentionally
> legal. One development version did get it wrong but that was a long
> time ago.
>
>>
>> > I have also run the set through Sesame's SPARQL parser of course. I
>> > get a number of errors (24) and failures (18), most of which have to
>> > do with our implementation (we currently do not yet support functions
>> > and ordering, and the parser throws exceptions on queries containing
>> > those features).
>> >
>> > I also came across this interesting failure. The following parser test
>> > (syntax-sparql1/syntax-forms-02) fails:
>> >
>> > PREFIX : <http://example.org/ns#>
>> > SELECT * WHERE { ( [] [] ) }
>> >
>> > To be honest I have no idea how to read this query, I would appreciate
>> > insights.
>>
>> Looks like an RDF collection with two blank node elements to me. Let's
>> see how it works in the grammar...
>>
>> It matches [39] Collection first; then two [40] GraphNode productions.
>> Each of those matches a [41] VarOrTerm which matches [44] GraphTerm
>> which matches [65] BlankNode which matches [84] ANON which consumes '['
>> WS* ']' .
>>
>>
>> > Also: a fair number of the errors Sesame's parser throws have to do
>> > with the queries in the syntax-sparql2 set, which use relative URIs in
>> > queries (e.g. <a>, <b>, <p1>, etc.). A relative URI has to be resolved
>> > against a base URI - which is normally provided using a BASE clause.
>> > However, the queries in this test set do not have such a clause. Andy
>> > has pointed out to me that according to RFC3986 (URI) in such cases
>> > the base URI should be provided by the 'embedding entity', i.e. the
>> > location of the file that contains the query. Sesame's query parser
>> > has no feature for this however: it only accepts a query string as an
>> > argument, a base URI for resolving any relative referencing inside
>> > that query cannot be provided separately.
>> > I guess that this is a shortcoming in our current parser that we
>> > should deal with in Sesame.
>> >
>> > However, correct resolution in this fashion is a feature of file
>> > processing, not query parsing, IMHO, and the test set is designed to
>> > test query parsing, not file processing. So I would suggest that we
>> > modify these test cases to have a base URI inside the query. This
>> > avoids having implementations fail tests on this problem. Thoughts?
>
> Relative URIs are in the grammar via the production
>
> [66] Q_IRI_REF ::= '<' ([^<>'{}|^`]-[#x00-#x20])* '>'
>
> and
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#iriRefs
>
> That does not mean that the parser is required to resolve them at that
> point but it seems reasonable to me that a test expect the parser to
> accept any legal URI as a syntax test.

Yes but my point is that they are not in and of themselves legal URIs.
It requires additional information in the form of a base URI (which is
not part of the query itself) to make them into legal URIs.

> The text in rq25 mentions base URIs and how to treat them:
>
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#QSynIRI
>
> and it was the result of discussion and debate in the working group, not
> just text some editor put in the doc.
>
> The more useful use of relative URIs is in the FROM clause:
>
> FROM <data.ttl>
>
> meaning maybe read from the same directory. Ditto GRAPH <data.ttl>.
> This is used in some tests so that the location of the data file is
> adjacent to the query but otherwise independent of location (i.e. the
> tests work wherever you unpack them).

Yes, but all of this assumes that the query is delivered to the parser
through a file. "The same directory" only has meaning in that context.
But that is not the only way to deliver a query to the parser, in fact
in practice I suspect this will not be a very prevalent way of
communicating queries at all.
So this is not about query parsing itself but about processing a file
which contains a query. In my opinion, those are separate issues.

But anyway, it's a minor point and of course you are right that the
parser _should_ be able to handle it at the end of the day. I can leave
these tests as they are for now.

Jeen
--
Dr. Jeen Broekstra                                 Den Dolech 2
Information Systems Group                          HG 7.76
Department of Mathematics and Computer Science     P.O. Box 513
Technische Universiteit Eindhoven                  5600 MB Eindhoven
tel. +31 (0)40 247 36 86                           The Netherlands
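
As an illustration of the base URI point discussed in this message, here is a minimal Python sketch of RFC 3986 reference resolution using the standard library's urljoin. It is not how Sesame or any other parser mentioned here does it; the helper name resolve_iri and the query file location under example.org are hypothetical, chosen only to stand in for the 'embedding entity'.

```python
from urllib.parse import urljoin


def resolve_iri(relative_iri: str, base_iri: str = "") -> str:
    """Resolve an IRI reference against a base IRI (RFC 3986 style).

    With an empty base, urljoin returns the reference unchanged, which
    mirrors a parser that only receives a query string and no base URI.
    """
    return urljoin(base_iri, relative_iri)


# Hypothetical location of a query file; not a real test URL.
QUERY_FILE = "http://example.org/tests/syntax-sparql2/query.rq"

# A relative IRI such as <a> only becomes absolute once a base is known.
print(resolve_iri("a", QUERY_FILE))

# FROM <data.ttl> resolves to a file adjacent to the query file.
print(resolve_iri("data.ttl", QUERY_FILE))

# Without any base URI the reference stays relative.
print(resolve_iri("a"))
```

A parser entry point that accepted an optional base argument alongside the query string could apply the same resolution, falling back to a BASE clause inside the query when one is present.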
Received on Friday, 19 January 2007 12:28:20 UTC