SquishQL/RDQL Comments

From: Sean B. Palmer <sean@mysterylights.com>
Date: Sun, 1 Jun 2003 21:50:08 +0100
Message-ID: <00c901c3287f$659ab060$63540150@localhost>
To: "Libby Miller" <Libby.Miller@bristol.ac.uk>, "Andy Seaborne" <Andy_Seaborne@hplb.hpl.hp.com>
Cc: <www-archive@w3.org>

Hi Libby, Andy,

I just wrote a SquishQL parser [1] and hooked it up to my query engine
so that I can run queries (e.g. [2]) over the Web. In writing the
parser, I came up with a number of comments about SquishQL and RDQL.
I'm CCing www-archive instead of www-rdf-rules since I'd like to clear
some of these things up before posting some further RDF query ideas I
have there.

I managed to implement SquishQL very quickly indeed from scratch, and
the test cases given on the grammar page were very useful in testing,
so that was very encouraging.

The SquishQL grammar [3] is, however, kinda hard to follow--and even
wrong--in places. The RDQL grammar [4] is similarly afflicted, and
worse in some ways since the <> delimited productions aren't further
defined anywhere (except in the Jena API, I'm guessing) which makes it
rather difficult to implement!

SquishQL Bugs:
* I've read the list of SquishQL issues at [5], and the grammar page
is very out of date with it: for example, the issues list now says
that "SELECT *" and "anon nodes" are supported, but these changes are
not implemented in the grammar.
* The definitions of TextLiteral and Identifier are rather dodgy. For
example, the meaning of "letter" seems to be quite different in each
one. I interpret TextLiteral as anything matching the regexp
"'[^'\\]*(?:\\.[^'\\]*)*'", and Identifier as anything matching the
regexp '[A-Za-z][A-Za-z0-9]*'.
* Test 23 shows that URIs can't contain ")". That's kinda been
resolved in RDQL by using "<" and ">" to delimit URIs. I think that
all terms in SquishQL/RDQL should follow the style set out in NTriples
(cf. RDQL bugs below).
* The definitions of "integer" and "floating point number" are
basically non-existent: I had to guess when implementing them (and
just went with '(?:[-+]?[0-9]+)(?:\.[0-9]+)?(?:e[-+]?[0-9]+)?').
* The concept of a QName isn't introduced at all in the grammar.
* (minor point) "," in UriList is inconsistently quoted--use
apostrophes.
* (minor point) The "inverted commas" mentioned on the grammar page
are properly called apostrophes... :-)
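
As a rough illustration of the three guessed patterns above, here is a
minimal Python sketch. The regexps are copied from this message; the
names TEXT_LITERAL, IDENTIFIER, NUMBER and classify() are assumptions
made up for the example and do not come from either grammar or from
any parser.

```python
import re

# Patterns copied from the message; the constant names are illustrative only.
TEXT_LITERAL = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'")
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
NUMBER = re.compile(r"(?:[-+]?[0-9]+)(?:\.[0-9]+)?(?:e[-+]?[0-9]+)?")

def classify(token):
    """Return the name of the first pattern matching the whole token, else None."""
    candidates = (
        ("number", NUMBER),
        ("literal", TEXT_LITERAL),
        ("identifier", IDENTIFIER),
    )
    for name, pattern in candidates:
        if pattern.fullmatch(token):
            return name
    return None

if __name__ == "__main__":
    for token in ["'it\\'s'", "-1.5e+3", "name1", "1abc", "1E5"]:
        print(repr(token), classify(token))
```

Under these guesses an uppercase exponent such as 1E5 is rejected, and
an identifier may not start with a digit, which is the sort of detail
the grammar would need to state explicitly.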

SquishQL testset bugs (this refers to the tests on the SquishQL
grammar page):
* pm::DeliverableSpec in test 4 is an invalid qname.
* "=" is used as a string operator in test 11, whereas the grammar
specifies it as a number operator only. The SquishQL issues list seems
to say that this is an open issue, but nonetheless, the test data is
inconsistent with the grammar as it stands. Perhaps a note could be
added to the grammar?

RDQL bugs:
* <> productions are not further explained (see above).
* "Anon nodes" and QNames are apparently wrapped in "<" and ">" the
same as URIs. That seems rather odd: why not use the same definitions
as are used for NTriples/N3, i.e. _:bNode <uri> q:name ?univar
"literal"?
* In my query engine, I return the triples matched as well as
bindings. It seems to me that a SELECT "triples" sort of addition to
the grammar might be a nice idea, but then perhaps this is out of
scope for the sort of things that RDQL was designed to do. Of course,
one can always reconstruct the triples matched by feeding the binding
results back into the query triples. Then again, I think that the
reason that I return triples as well as bindings in queries is that
this way you get the bNodes back properly. It seems to me that
SquishQL and RDQL (and thus, I presume, their implementations) are not
set up well for dealing with bNodes at the moment, which is odd
because it's an important issue.
* There are some odd small changes from SquishQL that I'm not sure I
understand--e.g. the introduction of commas to separate
squishql:ForList/rdql:PrefixDecl (as a compromise, I'd say that these
should probably be optional in both languages).
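
The point above about feeding binding results back into the query
triples can be sketched roughly as follows. This is a minimal Python
illustration, not the code of any actual query engine: substitute()
and reconstruct() are hypothetical names, and patterns are assumed to
be plain tuples of strings in the (predicate, subject, object) order
that SquishQL uses.

```python
def substitute(pattern, binding):
    """Replace each ?variable in a triple pattern with its bound value.

    Variables without a binding are left in place unchanged.
    """
    return tuple(
        binding.get(term, term) if term.startswith("?") else term
        for term in pattern
    )

def reconstruct(patterns, bindings):
    """Yield the triples obtained by applying each binding row to each pattern."""
    for binding in bindings:
        for pattern in patterns:
            yield substitute(pattern, binding)

if __name__ == "__main__":
    # Patterns taken from query [2]; the binding row mirrors its output.
    patterns = [
        ("ex:editor", "http://www.w3.org/TR/rdf-syntax-grammar", "?editor"),
        ("ex:fullName", "?editor", "?name"),
        ("ex:homePage", "?editor", "?homepage"),
    ]
    bindings = [
        {"?name": '"Dave Beckett"', "?homepage": "<http://purl.org/net/dajobe/>"},
    ]
    for triple in reconstruct(patterns, bindings):
        print(triple)
```

Because ?editor is not among the selected variables, it stays
unresolved in every reconstructed triple, which is the case where
returning the matched triples directly would be more useful.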

Actually, the fact that both SquishQL and RDQL exist signals a bit of
a warning to me: I know that RDQL was derived from SquishQL and that
all the old code still runs, but having two extraordinarily similar
implementations of SQL-ish syntaxes for RDF query is rather confusing.
It'd be nice, if RDQL is deemed superior, to have more "use RDQL"
style notes in the SquishQL stuff, or vice versa. For example, I'm not
sure right now whether I should scrap my SquishQL parser and go with
an RDQL parser instead or not. Or perhaps I need both? Guidance would
be much appreciated!

And then there are discussions as to whether SQL-ish syntaxes for RDF
query are a good idea at all. Notation3 gets along well with mixing
constraints and triples together in formulae, but then it's a
different kind of system. Personally, I think that RDQL is a good
direction, but obviously the RDF query community needs to do a lot of
work. I should send a followup to www-rdf-rules.

Overall, my issues with the grammar are fairly minimal, and you've
both done a lot of good work on the RDF query front, so thanks!
Hopefully a standard syntax (or few) will emerge, and some decent test
cases will shortly follow...

Cheers,

[1] http://infomesh.net/2003/squishql/
Announcement:
http://lists.w3.org/Archives/Public/www-rdf-rules/2003Jun/0001
[2] SELECT ?name, ?homepage FROM
   http://www.w3.org/TR/rdf-syntax-grammar/example07.rdf
WHERE
   (ex:editor http://www.w3.org/TR/rdf-syntax-grammar ?editor)
   (ex:fullName ?editor ?name)
   (ex:homePage ?editor ?homepage)
USING
   ex FOR http://example.org/stuff/1.0/

Output:
$ ./rdfquery.py squish_test.txt
?name: "Dave Beckett"
?homepage: <http://purl.org/net/dajobe/>

[3] http://swordfish.rdfweb.org/rdfquery/squish-bnf.html
[4] http://www.hpl.hp.com/semweb/rdql.htm
[5] http://ilrt.org/discovery/2001/07/squishql-issues/

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/
Received on Sunday, 1 June 2003 16:50:15 GMT
