- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Sun, 1 Jun 2003 21:50:08 +0100
- To: "Libby Miller" <Libby.Miller@bristol.ac.uk>, "Andy Seaborne" <Andy_Seaborne@hplb.hpl.hp.com>
- Cc: <www-archive@w3.org>
Hi Libby, Andy,

I just wrote a SquishQL parser [1] and hooked it up to my query engine so that I can run queries (e.g. [2]) over the Web. In writing the parser, I came up with a number of comments about SquishQL and RDQL. I'm CCing www-archive instead of www-rdf-rules since I'd like to clear some of these things up before posting some further RDF query ideas I have there.

I managed to implement SquishQL very quickly indeed from scratch, and the test cases given on the grammar page were very useful in testing, so that was very encouraging. The SquishQL grammar [3] is, however, kinda hard to follow--and even wrong--in places. The RDQL grammar [4] is similarly afflicted, and worse in some ways since the <> delimited productions aren't further defined anywhere (except in the Jena API, I'm guessing), which makes it rather difficult to implement!

SquishQL Bugs:

* I've read the list of SquishQL issues at [5], and the grammar page is badly out of date relative to it: for example, the issues list now says that "SELECT *" and "anon nodes" are supported, but these changes are not reflected in the grammar.
* The definitions of TextLiteral and Identifier are rather dodgy. For example, the meaning of "letter" seems to be quite different in each one. I interpret TextLiteral as anything matching the regexp "'[^'\\]*(?:\\.[^'\\]*)*'", and Identifier as anything matching the regexp '[A-Za-z][A-Za-z0-9]*'.
* Test 23 shows that URIs can't contain ")". That's kinda been resolved in RDQL by using "<" and ">" to delimit URIs. I think that all terms in SquishQL/RDQL should follow the style set out in NTriples (cf. RDQL bugs below).
* The definitions of "integer" and "floating point number" are basically non-existent: I had to guess when implementing them (and just went with '(?:[-+]?[0-9]+)(?:\.[0-9]+)?(?:e[-+]?[0-9]+)?').
* The concept of a QName isn't introduced at all in the grammar.
* (minor point) "," in UriList is inconsistently quoted--use apostrophes.
* (minor point) The "inverted commas" mentioned on the grammar page are properly called apostrophes... :-)

SquishQL testset bugs (this refers to the tests on the SquishQL grammar page):

* pm::DeliverableSpec in test 4 is an invalid qname.
* "=" is used as a string operator in test 11, whereas the grammar specifies it as a number operator only. The SquishQL issues list seems to say that this is an open issue, but nonetheless, the test data is inconsistent with the grammar as it stands. Perhaps a note could be added to the grammar?

RDQL bugs:

* <> productions are not further explained (see above).
* "Anon nodes" and QNames are apparently wrapped in "<" and ">" the same as URIs. That seems rather odd: why not use the same definitions as are used for NTriples/N3, i.e. _:bNode <uri> q:name ?univar "literal"?
* In my query engine, I return the triples matched as well as bindings. It seems to me that a SELECT "triples" sort of addition to the grammar might be a nice idea, but then perhaps this is out of scope for the sort of things that RDQL was designed to do. Of course, one can always reconstruct the triples matched by feeding the binding results back into the query triples. Then again, I think that the reason that I return triples as well as bindings in queries is that this way you get the bNodes back properly. It seems to me that SquishQL and RDQL (and thus, I presume, their implementations) are not set up well for dealing with bNodes at the moment, which is odd because it's an important issue.
* There are some odd small changes from SquishQL that I'm not sure I understand--e.g. the introduction of commas to separate items in squishql:ForList/rdql:PrefixDecl (as a compromise, I'd say that these should probably be optional in both languages).
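The token patterns quoted under "SquishQL Bugs" above can be tried out with a short Python sketch. The three regular expressions are copied from the message; the constant names and the is_token helper are illustrative assumptions, not part of either grammar or of the parser at [1].

```python
import re

# Patterns as quoted in the message. The names are hypothetical labels.
TEXT_LITERAL = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'")
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
NUMBER = re.compile(r"(?:[-+]?[0-9]+)(?:\.[0-9]+)?(?:e[-+]?[0-9]+)?")


def is_token(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        (TEXT_LITERAL, r"'it\'s'"),   # backslash-escaped apostrophe
        (TEXT_LITERAL, "'open"),      # no closing apostrophe
        (IDENTIFIER, "ex1"),
        (IDENTIFIER, "1ex"),          # must start with a letter
        (NUMBER, "-1.5e10"),
        (NUMBER, "1E5"),              # the guessed pattern only allows lowercase e
    ]
    for pattern, text in samples:
        print(repr(text), is_token(pattern, text))
```

As written, the guessed number pattern rejects an uppercase exponent marker and forms such as "1." or ".5", and the Identifier pattern rejects underscores; whether the grammars intend that is exactly what the missing definitions leave open.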
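The remark above about feeding binding results back into the query triples can be sketched as follows. This is a minimal illustration, not the code of the query engine described in the message: the function names are made up, the patterns are those of query [2] in SquishQL's (predicate subject object) order, and the "_:b1" label for ?editor is an assumed placeholder.

```python
# Triple patterns from query [2], in SquishQL order: (predicate, subject, object).
PATTERNS = [
    ("ex:editor", "http://www.w3.org/TR/rdf-syntax-grammar", "?editor"),
    ("ex:fullName", "?editor", "?name"),
    ("ex:homePage", "?editor", "?homepage"),
]


def substitute(pattern, binding):
    """Replace each bound ?variable in one pattern; leave other terms as they are."""
    return tuple(binding.get(term, term) for term in pattern)


def reconstruct(patterns, bindings):
    """Return one list of substituted triples per row of bindings."""
    return [[substitute(p, row) for p in patterns] for row in bindings]


# One result row using the values printed in the message's output.
row = {"?name": '"Dave Beckett"', "?homepage": "<http://purl.org/net/dajobe/>"}

# Only the SELECTed variables are available: ?editor stays unresolved.
selected_only = reconstruct(PATTERNS, [row])

# With a binding for ?editor as well ("_:b1" is a hypothetical bNode label).
with_editor = reconstruct(PATTERNS, [{**row, "?editor": "_:b1"}])

if __name__ == "__main__":
    for triple in selected_only[0]:
        print(triple)
    for triple in with_editor[0]:
        print(triple)
```

When the result rows carry only the selected variables, any variable left out of the SELECT list remains a variable in the reconstructed triples, which is one way of seeing why returning the matched triples directly is attractive for bNodes.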
Actually, the fact that both SquishQL and RDQL exist signals a bit of a warning to me: I know that RDQL was derived from SquishQL and that all the old code still runs, but having two extraordinarily similar implementations of SQL-ish syntaxes for RDF query is rather confusing. It'd be nice, if RDQL is deemed superior, to have more "use RDQL" style notes in the SquishQL stuff, or vice versa. For example, I'm not sure right now whether I should scrap my SquishQL parser and go with an RDQL parser instead or not. Or perhaps I need both? Guidance would be much appreciated!

And then there are discussions as to whether SQL-ish syntaxes for RDF query are a good idea at all. Notation3 gets along well with mixing constraints and triples together in formulae, but then it's a different kind of system. Personally, I think that RDQL is a good direction, but obviously the RDF query community needs to do a lot of work. I should send a followup to www-rdf-rules.

Overall, my issues with the grammar are fairly minimal, and you've both done a lot of good work on the RDF query front, so thanks! Hopefully a standard syntax (or few) will emerge, and some decent test cases will shortly follow...

Cheers,

[1] http://infomesh.net/2003/squishql/
    Announcement: http://lists.w3.org/Archives/Public/www-rdf-rules/2003Jun/0001

[2] SELECT ?name, ?homepage
    FROM http://www.w3.org/TR/rdf-syntax-grammar/example07.rdf
    WHERE
      (ex:editor http://www.w3.org/TR/rdf-syntax-grammar ?editor)
      (ex:fullName ?editor ?name)
      (ex:homePage ?editor ?homepage)
    USING ex FOR http://example.org/stuff/1.0/

    Output:
    $ ./rdfquery.py squish_test.txt
    ?name: "Dave Beckett"
    ?homepage: <http://purl.org/net/dajobe/>

[3] http://swordfish.rdfweb.org/rdfquery/squish-bnf.html
[4] http://www.hpl.hp.com/semweb/rdql.htm
[5] http://ilrt.org/discovery/2001/07/squishql-issues/

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/
Received on Sunday, 1 June 2003 16:50:15 UTC