[Fwd: SPARQL OPTIONAL vs UNION treatment of sub-graph matching]

Forwarded to the working group list from the comments list as related to our 
current discussions.

	Andy

-------- Original Message --------
Subject: 	SPARQL OPTIONAL vs UNION treatment of sub-graph matching
Resent-Date: 	Fri, 27 Oct 2006 10:53:41 +0000
Resent-From: 	public-rdf-dawg-comments@w3.org
Date: 	Thu, 26 Oct 2006 19:23:43 -0400
From: 	Patrick Shironoshita <Patrick.Shironoshita@infotechsoft.com>
To: 	<public-rdf-dawg-comments@w3.org>
CC: 	'Mansur Kabuka' <kabuka@infotechsoft.com>

As part of our work in implementing ontology-based querying, we have
developed an algebra for SPARQL which we have recently submitted for
publication. During the development of this query algebra, we have found
that the treatment of graph matching in OPTIONAL and UNION graph
patterns is, in our opinion, inconsistent, in particular with respect of
the issue of sub-graph matching - that is, in the issue of whether
solutions to a graph pattern can be sub-graphs of other solutions.

Consider, for example, the following OPTIONAL query (from the SPARQL spec):

Data:

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ..

_:a  rdf:type        foaf:Person .
_:a  foaf:name       "Alice" .
_:a  foaf:mbox       <mailto:alice@example.com> .
_:a  foaf:mbox       <mailto:alice@work.example> .

_:b  rdf:type        foaf:Person .
_:b  foaf:name       "Bob" .

OPTIONAL Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  { ?x foaf:name  ?name .
          OPTIONAL { ?x  foaf:mbox  ?mbox }
        }

OPTIONAL query Result:
name	mbox
"Alice"	<mailto:alice@example.com>
"Alice"	<mailto:alice@work.example>
"Bob"	(unbound)


Now, consider, over the same data set, the following UNION query:

UNION Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  {    { ?x foaf:name  ?name }
             UNION

             { ?x foaf:name  ?name . ?x  foaf:mbox  ?mbox }
        }

UNION Query Result, as per (our understanding of the) current SPARQL
working draft:


name 	mbox
"Alice" 	(unbound)
"Bob" 	(unbound)
"Alice" 	<mailto:alice@example.com>
"Alice" 	<mailto:alice@work.example>

where the first two rows match the first part of the UNION pattern, and
the second two rows match the second part.

As can be seen, the results from the OPTIONAL and UNION queries are
different only in that the UNION query allows a sub-graph of another
solution, while OPTIONAL explicitly does not. While there is no
requirement in SPARQL that the two queries presented above produce the
same results, we argue that the implementation of query processors and
optimizers for SPARQL would be made simpler if either OPTIONAL or UNION
is redefined so that both the queries above yield the same result - and,
therefore, so that OPTIONAL can be defined in terms of UNION as follows:

P1 OPTIONAL P2  = P1 UNION { P1 . P2}

where both P1 and P2 are graph patterns, and where P1 . P2 is the
conjunction of P1 and P2.

In our opinion, the better solution would be to require that UNION graph
patterns state the requirement that sub-graphs of other solutions for
the graph pattern should be eliminated (much in the same way as OPTIONAL
states this requirement). As can be seen from above, the row that
matches "Alice" to ?name and leaves ?mbox unbound is at the very least
useless, and potentially dangerous (implying, as it does, that there is
no mailbox for "Alice").

Alternatively, both UNION and OPTIONAL could keep sub-graphs of other
graphs that match the pattern. In this second case, we would strongly
suggest that either a solution modifier or a graph pattern modifier be
introduced so that sub-graphs get eliminated from the final result at
the query writer's option.

Patrick Shironoshita
Ph: +1(305)670-5111 ext. 11
Fax: +1(305)670-7722
www.infotechsoft.com

Received on Friday, 27 October 2006 13:41:05 UTC