- From: Jeen Broekstra <jeen.broekstra@aduna.biz>
- Date: Fri, 13 Jan 2006 11:46:35 +0100
- To: Dan Connolly <connolly@w3.org>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Dan Connolly wrote:
> This seems like a reasonably coherent argument for a new requirement,
> complete with rationale and use case.

First of all, I must say that although I am quite in favor of accepting
forms of subquerying into the language, and even think that it will be
necessary, the timing seems to be bad for this.

As a parallel, we have added various forms of subqueries to the SeRQL
language at a later point as well. We did recognize the use cases for
them from the very beginning but decided to start out small and work
from there. This has proven to be a happy decision for us.

As a matter of fact, subqueries in SeRQL were added by an intern who was
doing his MSc project with us. It took him about 2-3 months to get it
right (but of course, he had little prior experience). Our own estimate
for an experienced programmer to add such features would be on the order
of 2 person-weeks. Of course, this is only one data point in a single
implementation of a query engine and does not take things like query
optimization into account.

From a language design perspective, I see no great obstacles to allowing
subqueries in SPARQL, but it must be recognized that the added
implementation burden is significant. To me, the more logical route
would be to recognize this as a useful feature and to postpone it, for
now.

I would also like to point out that there are more forms of subquerying
than are sketched in the user's comments (for example, things like ANY
and ALL modifiers, or the IN set membership operator) and I feel that if
we decide to put this on the critical path, we should take a good look
at all of these. Which is another good reason to postpone for now, IMHO.

> From:
> Fred Zemke <fred.zemke@oracle.com>
>
> Section 10.3.2 "Accessing graphs in the RDF dataset"
> observes that it is possible to extract subgraphs of the
> input graphs using elementary CONSTRUCT queries.
> Once a user does
> this, he may presumably direct the output to some storage medium,
> assign an IRI to it and then run a query against that extract.
> Or with the right operating system interface, he might be able to
> "pipe" the output of a CONSTRUCT into the FROM clause of another
> SPARQL query. It would be useful to avoid the need for explicitly
> storing or piping the result before performing further queries on it.
> One way to do this would be to extend the FROM clause to permit a
> CONSTRUCT query as either the default graph or a named graph, for
> example SELECT * FROM ( CONSTRUCT ... ) ...
>
> This is of course analogous to subqueries and in-line views in SQL. The
> originators of SQL mistakenly believed that they did not need
> subqueries, so subqueries were not part of the original design.
>
> In the case of SPARQL, perhaps it is true that any query that could be
> written with a CONSTRUCT in the FROM clause could be rewritten to
> avoid it.
> However, experience in SQL and other languages show that it is still a good
> idea to permit composability wherever it makes sense semantically,
> and leave it to the implementation to find the optimization.

Our experience with Sesame/SeRQL indicates that even though subqueries
were not part of the original design, adding them later on was no great
burden from a design perspective. I expect that a similar path for
SPARQL will not pose grave dangers.

While the use case is compelling and I am quite convinced in general
that subqueries are useful, perhaps even necessary, I think that at this
stage we should restrict ourselves to a simple language to encourage
early adoption rather than aiming for an all-singing-all-dancing spec
that is significantly harder to write a conforming processor for.

[snip]

> I also advocate another kind of subquery: allow an ASK as a boolean
> expression. This will provide an alternative way to formulate
> non-existence queries.
> For example, the query to find people with no
> dc:date in section 11.2.3.1, currently written as:
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1>
> PREFIX dc: <http://purl.org/dc/element/1.1>
> SELECT ?name
> WHERE { ?x foaf:givenName ?name .
>         OPTIONAL { ?x dc:date ?date } .
>         FILTER (!bound(?date)) }
>
> could be expressed:
>
> SELECT ?name
> WHERE { ?x foaf:givenName ?name .
>         FILTER ( ! ( ASK ?date WHERE { ?x dc:date ?date } ) ) }
>
> I think that some might find the formulation using ASK more intuitive.
> (I know, some might disagree.)

I would like to point out that this is actually equivalent to having an
EXISTS() operator in the language. In SeRQL this would be expressed like
so:

  SELECT name
  FROM {x} foaf:givenName {name}
  WHERE NOT EXISTS (SELECT date
                    FROM {x} dc:date {date})

I have previously heard voices against having such functions, on the
argument that they require a closed-world assumption to function.
Personally I've never found that argument very compelling (I'm no
logician but IMHO the semantics can be easily rewritten to allow a
K-like operator - heck, just call it KNOWN instead of EXISTS), but it is
the same thing.

Jeen
--
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877
Received on Friday, 13 January 2006 10:47:22 UTC