- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 27 Oct 2006 14:40:40 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Forwarded to the working group list from the comments list as related to our current discussions. Andy -------- Original Message -------- Subject: SPARQL OPTIONAL vs UNION treatment of sub-graph matching Resent-Date: Fri, 27 Oct 2006 10:53:41 +0000 Resent-From: public-rdf-dawg-comments@w3.org Date: Thu, 26 Oct 2006 19:23:43 -0400 From: Patrick Shironoshita <Patrick.Shironoshita@infotechsoft.com> To: <public-rdf-dawg-comments@w3.org> CC: 'Mansur Kabuka' <kabuka@infotechsoft.com> As part of our work in implementing ontology-based querying, we have developed an algebra for SPARQL which we have recently submitted for publication. During the development of this query algebra, we have found that the treatment of graph matching in OPTIONAL and UNION graph patterns is, in our opinion, inconsistent, in particular with respect of the issue of sub-graph matching - that is, in the issue of whether solutions to a graph pattern can be sub-graphs of other solutions. Consider, for example, the following OPTIONAL query (from the SPARQL spec): Data: @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .. _:a rdf:type foaf:Person . _:a foaf:name "Alice" . _:a foaf:mbox <mailto:alice@example.com> . _:a foaf:mbox <mailto:alice@work.example> . _:b rdf:type foaf:Person . _:b foaf:name "Bob" . OPTIONAL Query: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?mbox WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } } OPTIONAL query Result: name mbox "Alice" <mailto:alice@example.com> "Alice" <mailto:alice@work.example> "Bob" (unbound) Now, consider, over the same data set, the following UNION query: UNION Query: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?mbox WHERE { { ?x foaf:name ?name } UNION { ?x foaf:name ?name . ?x foaf:mbox ?mbox } } UNION Query Result, as per (our understanding of the) current SPARQL working draft: name mbox "Alice" (unbound) "Bob" (unbound) "Alice" <mailto:alice@example.com> "Alice" <mailto:alice@work.example> where the first two rows match the first part of the UNION pattern, and the second two rows match the second part. As can be seen, the results from the OPTIONAL and UNION queries are different only in that the UNION query allows a sub-graph of another solution, while OPTIONAL explicitly does not. While there is no requirement in SPARQL that the two queries presented above produce the same results, we argue that the implementation of query processors and optimizers for SPARQL would be made simpler if either OPTIONAL or UNION is redefined so that both the queries above yield the same result - and, therefore, so that OPTIONAL can be defined in terms of UNION as follows: P1 OPTIONAL P2 = P1 UNION { P1 . P2} where both P1 and P2 are graph patterns, and where P1 . P2 is the conjunction of P1 and P2. In our opinion, the better solution would be to require that UNION graph patterns state the requirement that sub-graphs of other solutions for the graph pattern should be eliminated (much in the same way as OPTIONAL states this requirement). As can be seen from above, the row that matches "Alice" to ?name and leaves ?mbox unbound is at the very least useless, and potentially dangerous (implying, as it does, that there is no mailbox for "Alice"). Alternatively, both UNION and OPTIONAL could keep sub-graphs of other graphs that match the pattern. In this second case, we would strongly suggest that either a solution modifier or a graph pattern modifier be introduced so that sub-graphs get eliminated from the final result at the query writer's option. Patrick Shironoshita Ph: +1(305)670-5111 ext. 11 Fax: +1(305)670-7722 www.infotechsoft.com
Received on Friday, 27 October 2006 13:41:05 UTC