major technical: no subqueries from Fred Zemke on 2006-01-12 (public-rdf-dawg-comments@w3.org from January 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Thu, 12 Jan 2006 13:42:31 -0800
To: public-rdf-dawg-comments@w3.org
Message-ID: <43C6CD47.7000100@oracle.com>

Section 10.3.2 "Accessing graphs in the RDF dataset"
observes that it is possible to extract subgraphs of the
input graphs using elementary CONSTRUCT queries.  Once a user does
this, he may presumably direct the output to some storage medium,
assign an IRI to it and then run a query against that extract.
Or with the right operating system interface, he might be able to
"pipe" the output of a CONSTRUCT into the FROM clause of another
SPARQL query. 
It would be useful to avoid the need for explicitly storing or
piping the result before performing further queries on it.  One way to
do this would be to extend the FROM clause to permit a CONSTRUCT
query as either the default graph or a named graph, for
example SELECT * FROM ( CONSTRUCT ... ) ...

This is of course analogous to subqueries and in-line views in SQL. 
The originators of SQL mistakenly believed that they did not need
subqueries, so subqueries were not part of the original design.

In the case of SPARQL, perhaps it is true that any query that could be
written with a CONSTRUCT in the FROM clause could be rewritten to avoid it.
However, experience in SQL and other languages shows that it is still a good
idea to permit composability wherever it makes sense semantically,
and to leave it to the implementation to find the optimization.
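For simple cases the rewrite is mechanical. The Python sketch below is not a SPARQL engine. It assumes a graph is a set of string triples, treats terms starting with "?" as variables, and uses prefixed names in place of full IRIs. The helpers `unify`, `match`, `construct` and `select`, the predicate `ex:contact`, and the sample data are all hypothetical. The sketch evaluates a query over a nested CONSTRUCT, then a hand-flattened equivalent, and checks that the two agree.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical model: (subject, predicate, object) strings; "?x" is a variable.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(graph: Set[Triple], patterns: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every solution of a basic graph pattern that extends `binding`."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(graph, patterns[1:], extended)


def construct(graph: Set[Triple], template: List[Triple], where: List[Triple]) -> Set[Triple]:
    """Instantiate `template` once per solution; assumes template vars occur in `where`."""
    return {
        (s.get(t[0], t[0]), s.get(t[1], t[1]), s.get(t[2], t[2]))
        for s in match(graph, where, {})
        for t in template
    }


def select(graph: Set[Triple], variables: List[str], where: List[Triple]) -> List[Tuple[str, ...]]:
    """Project each solution onto `variables`, sorted for stable output."""
    return sorted(tuple(s[v] for v in variables) for s in match(graph, where, {}))


DATA = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}

# Nested form: the outer query sees only the graph built by the CONSTRUCT.
nested = select(
    construct(DATA, [("?p", "ex:contact", "?q")], [("?p", "foaf:knows", "?q")]),
    ["?a", "?c"],
    [("?a", "ex:contact", "?b"), ("?b", "ex:contact", "?c")],
)

# Hand-flattened form: each ex:contact pattern is replaced by the CONSTRUCT's
# WHERE pattern, so no intermediate graph is materialized.
flattened = select(
    DATA,
    ["?a", "?c"],
    [("?a", "foaf:knows", "?b"), ("?b", "foaf:knows", "?c")],
)

print(nested == flattened)  # -> True
```

The substitution is straightforward here because the template is a single triple built from a one-triple WHERE clause. General templates need a more careful rewrite. For example, variables that occur only in the WHERE clause must be renamed, and duplicates must be eliminated, because a constructed graph is a set.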

One scenario in which users will want a CONSTRUCT nested in a FROM clause
is as follows: a user has access to a vast and time-varying input
graph, containing a lot of data that is not of interest to the user.
The user has learned from experience how to extract the portion relevant
to his interests using a CONSTRUCT.  Then the user wishes to refine
his view of the graph further.  For this purpose, he wants simply to
cut and paste a CONSTRUCT query that he has already debugged into his
ad hoc queries.
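The scenario can be modeled as function composition. In this minimal Python sketch, CONSTRUCT is a function that returns a new graph and SELECT is a function that accepts any graph. A debugged extract can therefore be fed straight into an ad hoc query without being stored or given an IRI. The helper names, the `ex:contact` predicate, and the data are hypothetical, and the model (string triples, "?" variables, prefixed names instead of IRIs) is an assumption rather than SPARQL itself.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical model: (subject, predicate, object) strings; "?x" is a variable.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(graph: Set[Triple], patterns: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every solution of a basic graph pattern that extends `binding`."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(graph, patterns[1:], extended)


def construct(graph: Set[Triple], template: List[Triple], where: List[Triple]) -> Set[Triple]:
    """Instantiate `template` once per solution; assumes template vars occur in `where`."""
    return {
        (s.get(t[0], t[0]), s.get(t[1], t[1]), s.get(t[2], t[2]))
        for s in match(graph, where, {})
        for t in template
    }


def select(graph: Set[Triple], variables: List[str], where: List[Triple]) -> List[Tuple[str, ...]]:
    """Project each solution onto `variables`, sorted for stable output."""
    return sorted(tuple(s[v] for v in variables) for s in match(graph, where, {}))


# A (tiny) input graph containing data the user does not care about.
DATA = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}

# The user's debugged CONSTRUCT: keep only foaf:knows, reshaped as ex:contact.
extract = construct(DATA, [("?p", "ex:contact", "?q")], [("?p", "foaf:knows", "?q")])

# The ad hoc refinement runs directly on the in-memory extract, the analogue
# of SELECT ?a ?c FROM ( CONSTRUCT ... ) WHERE { ?a ex:contact ?b . ?b ex:contact ?c }.
print(select(extract, ["?a", "?c"], [("?a", "ex:contact", "?b"), ("?b", "ex:contact", "?c")]))
# -> [('ex:alice', 'ex:carol')]
```

Because the extract is just a value, replacing the CONSTRUCT or the outer query does not require touching the other. This property is what makes cut-and-paste reuse attractive.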

I also advocate another kind of subquery: allowing an ASK query to be used
as a boolean expression.  This would provide an alternative way to formulate
non-existence queries.  For example, the query in section 11.2.3.1 that
finds people with no dc:date, currently written as:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?name
WHERE { ?x foaf:givenName ?name .
        OPTIONAL { ?x dc:date ?date } .
        FILTER (!bound(?date)) }

could be expressed, with the same PREFIX declarations, as:

SELECT ?name
WHERE { ?x foaf:givenName ?name .
        FILTER ( ! ( ASK WHERE { ?x dc:date ?date } ) ) }

I think that some might find the formulation using ASK more intuitive.
(I know, some might disagree.)
Another argument in favor of nested ASK is that it lends itself to
building queries incrementally, from separately debugged pieces.
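A small Python sketch can check that the two formulations of the section 11.2.3.1 query agree. It makes two assumptions that the proposal does not spell out. First, a variable such as ?x inside the nested ASK is bound from the outer row, so the ASK is correlated. Second, OPTIONAL is modeled as a simplified left join. The function names and the sample data are hypothetical, and the model uses string triples with prefixed names rather than full IRIs.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical model: (subject, predicate, object) strings; "?x" is a variable.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(graph: Set[Triple], patterns: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every solution of a basic graph pattern that extends `binding`."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(graph, patterns[1:], extended)


DATA = {
    ("ex:alice", "foaf:givenName", "Alice"),
    ("ex:alice", "dc:date", "2005-11-06"),
    ("ex:bob", "foaf:givenName", "Bob"),
}
MAIN = [("?x", "foaf:givenName", "?name")]
DATE = [("?x", "dc:date", "?date")]


def names_via_optional(graph: Set[Triple]) -> List[str]:
    """OPTIONAL { ?x dc:date ?date } followed by FILTER (!bound(?date))."""
    names = []
    for row in match(graph, MAIN, {}):
        # Left join: keep the row unextended only if the optional part has no match.
        for solution in list(match(graph, DATE, row)) or [row]:
            if "?date" not in solution:  # !bound(?date)
                names.append(solution["?name"])
    return sorted(names)


def names_via_ask(graph: Set[Triple]) -> List[str]:
    """The proposed FILTER ( ! ( ASK ... ) ) form, with ?x taken from the outer row."""
    return sorted(
        row["?name"]
        for row in match(graph, MAIN, {})
        # ASK is true iff the inner pattern has at least one solution.
        if next(match(graph, DATE, row), None) is None
    )


print(names_via_optional(DATA), names_via_ask(DATA))  # -> ['Bob'] ['Bob']
```

The ASK version never binds ?date in the outer solution, so there is no need to filter on an unbound variable. This is the property that makes it easier to develop the inner pattern as a separate piece.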

Fred Zemke

Received on Thursday, 12 January 2006 21:42:39 UTC