Comments on SPARQL (based on a SPARQL engine implementation in Python)

Dear all,

I had an RDQL implementation on top of RDFLib that included a number of features 
similar to SPARQL[1], so I spent some time turning it into a SPARQL engine. It is not a 
100% complete implementation of SPARQL: it does not include a parser for the query 
language, only the Python engine, and it probably has bugs. Nevertheless, it is usable 
(meaning that I use it, for example:-). The description, with further links to the code 
itself, is in [2].

There are some problems I ran into during the implementation that are worth noting here 
(although some of these problems are, undoubtedly, due to my misunderstanding of the 
draft's intention...).

--------------

Constraining Values.

The draft refers to the possibility of using application methods/functions as constraints. 
And that is good. However, the question arises: which bound variables does the function 
have access to? In my implementation I separated a 'per-pattern' constraint and a 'global' 
constraint. Per-pattern constraint functions are invoked for a single triple, and get the 
three (bound) resource references of that triple as arguments. 'Global' constraint 
functions are invoked at the end of the full pattern matching process, and they have 
access to all the bindings (eg, in the case of Python, in the form of a dictionary of the 
form {"?x" : TheBoundResourceForX,... }).

Clearly, the global constraint can, functionally, replace the per-pattern constraint, but 
using per-pattern constraints may make the implementation much more efficient 
(essentially, it may cut what I called the expansion tree early on in the process). It is 
also clear that a query language parser can separate out per-pattern constraints for the 
kind of examples that are in the draft (?a < 10, etc), so it may be enough if the 
underlying engine offers this differentiation. But a parser cannot cover the general user 
method case. The issue is whether, for this reason, we want to make that differentiation 
in SPARQL or not. In any case, this question *must be specified* in the document, imho.
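
To make the distinction concrete, here is a minimal sketch in plain Python. It is not the 
code described in [2]: the graph is just a list of tuples, and the names match(), unify() 
and is_var() are hypothetical, chosen for illustration only. A per-pattern constraint is 
called with the three resources of the triple being matched, so a failing candidate is 
dropped before the later patterns are expanded; the global constraint only sees the 
complete binding dictionaries at the end.

```python
def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, bindings):
    """Return the extended bindings if the triple fits the pattern, else None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def match(graph, patterns, global_constraint=None):
    """patterns is a list of (triple_pattern, per_pattern_constraint_or_None)."""
    results = [{}]
    for pattern, check in patterns:
        expanded = []
        for bindings in results:
            for triple in graph:
                new = unify(pattern, triple, bindings)
                # Per-pattern constraint: sees only this triple's three resources,
                # and prunes the candidate before later patterns are tried.
                if new is not None and (check is None or check(*triple)):
                    expanded.append(new)
        results = expanded
    if global_constraint is not None:
        # Global constraint: sees the full dictionary of bindings.
        results = [b for b in results if global_constraint(b)]
    return results


graph = [
    ("alice", "age", 9),
    ("bob", "age", 12),
    ("alice", "knows", "bob"),
    ("bob", "knows", "alice"),
]
patterns = [
    (("?x", "age", "?a"), lambda s, p, o: o < 10),   # the draft's ?a < 10
    (("?x", "knows", "?y"), None),
]
result = match(graph, patterns, global_constraint=lambda b: b["?x"] != b["?y"])
```

In this sketch the binding for "bob" is discarded while matching the first pattern, so the 
second pattern is never expanded for it; with only a global constraint it would be carried 
through to the end.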

---------------

Nested Patterns. It is not clear in the draft how 'deep' nesting can go. I did only the 
simple case, ie, only one level of nesting is handled:

(?a,?b,c), {(?q,?w,?r),(?s,t,?u)}

It is not clear whether a nesting of the kind:

(?a,?b,c), {(?q,?w,?r),{(?s,t,?u),(?q,k,?o)}}

is also allowed or not. Actually, if it is, it has to be defined what it really *means*. 
(My initial thought is that it means an alternation of 'or'-s and 'and'-s, ie, it means

(?a,?b,c) and (?q,?w,?r)
or
(?a,?b,c) and (?s,t,?u) and (?q,k,?o)

etc, recursively.) If this is adopted, it has to be described.
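
The alternating reading above can be sketched as a small recursive expansion. This is 
only an illustration of that one interpretation, not something the draft defines: triple 
patterns are written as Python tuples, a nested list stands for a {...} group, and the 
function name expand() is hypothetical. The meaning of a group flips between 'and' and 
'or' at each nesting level.

```python
from itertools import product


def expand(group, conjunctive=True):
    """Expand a nested pattern into a list of alternatives.

    Each alternative is a flat list of triple patterns that must all match.
    """
    # A tuple is a single triple pattern; a list is a nested group whose
    # meaning is the opposite ('and' <-> 'or') of the enclosing group.
    expanded = [
        [[member]] if isinstance(member, tuple) else expand(member, not conjunctive)
        for member in group
    ]
    if conjunctive:
        # 'and': take one alternative from every member and join them.
        return [sum(choice, []) for choice in product(*expanded)]
    # 'or': pool the alternatives of all members.
    return [alternative for alternatives in expanded for alternative in alternatives]


query = [
    ("?a", "?b", "c"),
    [("?q", "?w", "?r"), [("?s", "t", "?u"), ("?q", "k", "?o")]],
]
alternatives = expand(query)
```

For the nested example, this yields two alternatives: the first pattern together with 
(?q,?w,?r), and the first pattern together with both (?s,t,?u) and (?q,k,?o).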

-----------------

Optional Patterns. It was not clear to me *why* there might be more than one optional 
pattern, whereas there is only *one* where pattern. Why the asymmetry?

I actually wonder whether it would not be better to define the combination of query 
results in general (one could imagine the sum of two queries being the concatenation of 
the result lists) and let the individual queries have a simpler structure instead. Just a 
thought.

-----------------

Query patterns. Not surprisingly, this part is a bit vague (and that is *no* criticism of 
the editors, it is just the natural status of things). My understanding of the CONSTRUCT * 
is based on an old note of Guha et al.[3] (thanks to DanBri, who drew my attention to it). 
Is this the right interpretation?

I found it a bit difficult to mentally connect the CONSTRUCT stuff with the rest of the 
document. It stands a bit apart from the rest. My abstraction in Python was, instead, 
that a query (select, where, optional, etc) in fact returns a query result object, and 
then select, construct, etc, are just methods on that object. I wonder whether a similar 
notion might not work better when describing the intentions.
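
A rough sketch of that abstraction, under stated assumptions: the class name QueryResult, 
its method signatures, and the template format are all hypothetical and are not taken from 
[2] or from the draft. The result object simply holds the list of binding dictionaries; 
select and construct are two views of it, and the 'sum' of two results suggested in the 
Optional Patterns remarks is plain concatenation.

```python
def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


class QueryResult:
    """Holds the bindings produced by pattern matching (assumed shape)."""

    def __init__(self, bindings):
        self.bindings = list(bindings)   # list of dicts such as {"?x": resource}

    def select(self, *variables):
        """Return one tuple per binding, with None for unbound variables."""
        return [tuple(b.get(v) for v in variables) for b in self.bindings]

    def construct(self, template):
        """Instantiate the template triples once per binding.

        Triples that would contain an unbound variable are skipped.
        """
        triples = set()
        for b in self.bindings:
            for triple in template:
                instance = tuple(b.get(t) if is_var(t) else t for t in triple)
                if None not in instance:
                    triples.add(instance)
        return triples

    def __add__(self, other):
        """'Sum' of two results: concatenation of the binding lists."""
        return QueryResult(self.bindings + other.bindings)


first = QueryResult([{"?x": "alice", "?y": "bob"}])
second = QueryResult([{"?x": "carol", "?y": "dave"}])
both = first + second
rows = both.select("?x", "?y")
new_graph = both.construct([("?y", "knownBy", "?x")])
```

Described this way, CONSTRUCT needs no machinery of its own: it is one more way of reading 
the same bindings that SELECT reads.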

-----------------

Finally, the missing bits. Both in SPARQL[1] and in the requirements document[4] I was 
desperately looking for a way to manage collections and containers. SPARQL does *not* give 
me a way to ask whether '?x' is part of the collection 'C', or of the Seq 'S'. To 
handle any of these cases one has either to introduce some form of non-finite query or 
to make some special forms for these, specifically. But none of this is documented. For 
example, if I want to use SPARQL to query the RDF graph of a specific OWL ontology, and I 
want to find out whether a specific class 'C' is part of the 'unionOf' describing the 
class 'D', I hit this problem....
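
To illustrate why a fixed set of triple patterns is not enough here: an RDF collection is 
encoded as a chain of rdf:first/rdf:rest triples ending in rdf:nil, so membership needs a 
walk of unknown length. The sketch below does that walk by hand in Python. The graph is a 
plain list of tuples with prefixed names as strings, and the helper names objects(), 
list_members() and in_union_of() are hypothetical.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def list_members(graph, head):
    """Walk an rdf:first/rdf:rest chain and collect its members."""
    members, seen = [], set()
    node = head
    # The chain has no fixed length, which is what a finite pattern cannot express.
    while node != RDF_NIL and node not in seen:   # 'seen' guards against cycles
        seen.add(node)
        members.extend(objects(graph, node, RDF_FIRST))
        rest = objects(graph, node, RDF_REST)
        if not rest:
            break
        node = rest[0]
    return members


def in_union_of(graph, cls, union_class):
    """Is cls listed in the owl:unionOf collection describing union_class?"""
    return any(
        cls in list_members(graph, head)
        for head in objects(graph, union_class, "owl:unionOf")
    )


graph = [
    ("D", "owl:unionOf", "_:l1"),
    ("_:l1", RDF_FIRST, "A"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "C"),
    ("_:l2", RDF_REST, RDF_NIL),
]
c_in_d = in_union_of(graph, "C", "D")
```

An rdf:Seq would need a similar special case, since its members hang off rdf:_1, rdf:_2, 
and so on, rather than off a single predicate.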

-----------------

I hope these remarks are helpful.

Ivan

P.S. Disclaimer: though I am part of the W3C Team, this SPARQL implementation has not been 
done as part of a 'formal' W3C project, ie, it does not reflect some sort of a W3C 
opinion! Rather, it is a spin-off of some other things I did using RDF.

[1]http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
[2]http://www.ivan-herman.net/Python/sparqlDesc.html
[3]http://www.w3.org/TandS/QL/QL98/pp/enabling.html
[4]http://www.w3.org/TR/2004/WD-rdf-dawg-uc-20041012/




-- 

Ivan Herman
W3C Communications Team, Head of Offices
C/o W3C Benelux Office at CWI, Kruislaan 413
1098SJ Amsterdam, The Netherlands
tel: +31-20-5924163; mobile: +31-641044153;
URL: http://www.w3.org/People/all?pictures=yes#ivan

Received on Monday, 25 October 2004 05:09:11 UTC