- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Wed, 03 Mar 2010 01:16:28 -0500
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
This email is in reference to
http://www.w3.org/2009/sparql/track/actions/201 . It summarizes the
discussion of blank nodes in DELETE and concludes with what I see as two
reasonable proposals. Skip to the end if all you are interested in are
the proposals.
== The Problem ==
The question under consideration is: What are the semantics of blank
nodes that appear within the template part of a DELETE statement?
== Background ==
=== Blank nodes in query patterns ===
In SPARQL 1.0 query patterns, a blank node acts as a non-distinguished
variable. It binds to graph terms just like variables, but it can't be
projected, filtered, sorted, etc. Blank nodes with a given label (e.g.
_:b123) can only be used within a single basic graph pattern (BGP) (to
do otherwise results in an invalid query string).
=== Blank nodes in CONSTRUCT templates ===
Blank nodes in the template of CONSTRUCT queries emit blank nodes into
the result RDF. The CONSTRUCT template preserves blank node label
coreferences within the RDF generated from a single solution, but emits
new blank nodes for each solution to which the template is applied.
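As a rough illustration of that per-solution behavior, here is a small
Python sketch. It is not taken from any spec text: the tuple
representation of triples, the "_:gen" label scheme, and the function
name instantiate_construct are all assumptions made for the example.

```python
import itertools

_fresh = itertools.count()  # source of new blank node labels (illustrative)

def instantiate_construct(template, solutions):
    """Apply each solution to the template, minting new blank nodes per solution."""
    out = []
    for solution in solutions:
        minted = {}  # template label -> fresh blank node, reset for every solution
        for triple in template:
            ground = []
            for term in triple:
                if term.startswith("?"):
                    term = solution.get(term)  # None when the variable is unbound
                elif term.startswith("_:"):
                    if term not in minted:
                        minted[term] = f"_:gen{next(_fresh)}"
                    term = minted[term]
                ground.append(term)
            if None not in ground:  # triples with unbound variables are dropped
                out.append(tuple(ground))
    return out

template = [("_:b", ":name", "?n"), ("_:b", ":type", ":Person")]
solutions = [{"?n": '"Ann"'}, {"?n": '"Bob"'}]
triples = instantiate_construct(template, solutions)
```

Within one solution both template triples share the same generated
node; the second solution gets a different one.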
=== Blank nodes in INSERT templates ===
Blank nodes in INSERT statement templates behave as in CONSTRUCT
templates: the SPARQL 1.1 Update draft currently says "The template and
pattern forms are as defined in SPARQL for construct templates and graph
patterns."
=== Behavior of DELETE ===
The full form of DELETE is
DELETE { template } WHERE { pattern }
The semantics are that pattern is matched against the graph store,
yielding a solution set (a set of solutions; each solution is a set of
variable bindings; each variable binding is a pairing of a variable plus
an RDF term). Each solution in the solution set is then substituted into
the template in turn, resulting in ground triples. Each of these ground
triples is removed from the graph store (either from the graph store's
single "unnamed graph", or from the graph specified in the WITH clause,
or from the graph specified inline with a GRAPH clause).
In particular, for our purposes, note that:
DELETE { :a :ground :triple } WHERE { }
...deletes the given triple. The solution set for the empty WHERE clause
is one solution with no bindings - that solution gets applied to the
template which yields the ground triple which is removed from the
unnamed graph.
And note that:
DELETE { ?unbound :p :o } WHERE { }
...doesn't remove anything at all. ?unbound is (as its name implies) not
bound in the single solution; when this solution is applied to the
template, we get an invalid triple (because of the unbound variable),
and nothing is removed. (The current editor's draft does not spell this
out, but this is the analogous behavior to CONSTRUCT templates, and I
assume we have consensus around this.)
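The two cases above can be sketched in Python as follows. This is only
a model under stated assumptions: the graph is a set of string tuples,
the WHERE clause is not evaluated (the solution set is supplied by
hand), and apply_delete is a name invented for the example.

```python
def apply_delete(graph, template, solutions):
    """Return a copy of graph with the instantiated template triples removed."""
    to_remove = set()
    for solution in solutions:
        for triple in template:
            # Substitute bindings; an unbound variable becomes None.
            ground = tuple(solution.get(t) if t.startswith("?") else t for t in triple)
            if None not in ground:  # skip triples that are not fully ground
                to_remove.add(ground)
    return graph - to_remove

graph = {(":a", ":ground", ":triple"), (":x", ":p", ":o")}
empty_where = [{}]  # the empty WHERE clause: one solution with no bindings

after_ground = apply_delete(graph, [(":a", ":ground", ":triple")], empty_where)
after_unbound = apply_delete(graph, [("?unbound", ":p", ":o")], empty_where)
```

Here after_ground no longer contains :a :ground :triple, while
after_unbound is identical to the original graph.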
We also have the shortcut form:
DELETE WHERE { limited-pattern }
limited-pattern can have GRAPH clauses and triple patterns. The draft
doesn't yet spell this out, but I believe the current understanding is
that this is purely syntactic sugar, as in:
DELETE WHERE { X } === DELETE { X } WHERE { X }
So, DELETE WHERE { ?s :p :o } is equivalent to
DELETE { ?s :p :o } WHERE { ?s :p :o }
== The Options for Blank Nodes in Delete Templates ==
All of which brings us to the topic at hand.
What does a blank node mean in a delete template? At the teleconference,
we discussed three options. Here they are, with analysis gleaned from
the discussion:
1/ Blank nodes are not allowed in DELETE templates. The syntax for
DELETE would prohibit blank nodes from appearing in DELETE templates.
Most queries involving blank nodes can be written with regular variables
instead, and this option avoids the potential confusion of the other
options.
2/ Blank nodes are treated as in CONSTRUCT (and INSERT) templates. In
this case, a blank node in a DELETE template becomes (in the ground
triples) a newly minted blank node for each solution that is applied to
the template. Because the blank node is newly minted, it does not occur
in the graph store at all. The practical effect of this is that a triple
in the DELETE template that contains a blank node would _never_ lead to
_anything_ being deleted. It also means that the shortcut form:
DELETE WHERE { _:b1 :p :o }
...if treated as pure syntactic sugar for:
DELETE { _:b1 :p :o } WHERE { _:b1 :p :o }
...would stand for 2 very different meanings of _:b1, which at best is
really confusing. At the teleconference, there did not seem to be any
support for this option, and I can't see any useful benefits that it has
other than formal consistency with CONSTRUCT and INSERT.
3/ Blank nodes in delete templates are treated similarly to blank nodes
in query patterns--i.e. as (non-distinguished) variables. The *intent*
of this option is that:
DELETE { _:b1 :p :o } WHERE { }
should delete _all_ triples with predicate :p and object :o. Sandro gave
a motivating use case for this interpretation. It would provide the only
reasonable way to delete an RDF list:
DELETE { ?x :hasList (1 2 3) } WHERE { ... ?x ... }
(1 2 3) is syntactic sugar for an expansion involving blank nodes. If
those blank nodes are treated as variables, then this would delete all
the triples that make up the list.
We had trouble writing down the precise meaning of a blank node here.
It's *not* just that blank nodes are the same as variables, because:
DELETE { _:b1 :p :o } WHERE { }
would delete all the <something> :p :o triples whereas
DELETE { ?b1 :p :o } WHERE { }
would delete nothing (because ?b1 is unbound). Effectively, blank nodes
in the template act as a way to do pattern matching, variable binding,
and triple deletion all in one operation, instead of the normal
multi-phase approach.
We also immediately noted confusion if the same blank node label was
used in the query pattern and the template:
DELETE { _:b1 :p :o } WHERE { :foo :bar _:b1 }
...the _:b1 in the WHERE clause acts as a non-distinguished variable
whose bindings don't contribute to the solution set, so the _:b1 in the
template has to be something different altogether. This is rather
bizarre, so we discussed 2 remedies:
A) Prohibit the same blank node label from being used in 2 different
BGPs, or in 1 BGP and in the template. This is basically the same
restriction that SPARQL Query puts on blank nodes.
B) Prohibit named blank nodes completely. In the end I don't think we
saw much reason to prefer this to approach A.
Note that the effect of A (or B) is that the shortcut:
DELETE WHERE { _:b1 :p :o }
is an illegal query.
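To make the difference between options 2 and 3 concrete, here is a
hedged Python sketch of both readings for a template used with an empty
WHERE clause. Everything in it is an assumption for illustration: the
tuple representation, the "_:fresh" labels (assumed not to occur in the
store), and the function names. The option 3 model also treats each
template triple independently, so it ignores coreference between
triples that share a blank node label.

```python
import itertools

_fresh = itertools.count()

def is_bnode(term):
    return term.startswith("_:")

def delete_option2(graph, template):
    """Option 2: each template blank node becomes a newly minted blank node."""
    minted = {}
    to_remove = set()
    for triple in template:
        ground = []
        for term in triple:
            if is_bnode(term):
                if term not in minted:
                    minted[term] = f"_:fresh{next(_fresh)}"
                term = minted[term]
            ground.append(term)
        to_remove.add(tuple(ground))
    return graph - to_remove

def delete_option3(graph, template):
    """Option 3: a template blank node matches any term in that position."""
    def matches(pattern, triple):
        return all(is_bnode(p) or p == t for p, t in zip(pattern, triple))
    return {t for t in graph if not any(matches(p, t) for p in template)}

graph = {(":s1", ":p", ":o"), ("_:g1", ":p", ":o"), (":s2", ":q", ":o")}
template = [("_:b1", ":p", ":o")]

after_option2 = delete_option2(graph, template)
after_option3 = delete_option3(graph, template)
```

Under the option 2 model nothing is removed, because the minted node
never occurs in the store. Under the option 3 model both :p :o triples
are removed and only :s2 :q :o remains.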
== The Proposals ==
I see only two realistic proposals emerging from this.
1/ We prohibit blank nodes in the DELETE template completely.
2/ Blank nodes in DELETE templates act as "wild cards"--effectively
variables pre-bound to all RDF terms--to let us write some shortcuts and
handle Sandro's case of deleting RDF lists. We prohibit the same blank
node label from being used in multiple scopes.
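The scope restriction in the second proposal could be checked along
these lines. This is a sketch only: the scope names ("template",
"where-bgp-1") and the function shared_bnode_labels are hypothetical,
and a real implementation would work on the parsed request rather than
on hand-built tuples.

```python
from collections import defaultdict

def labels_in(triples):
    """Collect the blank node labels used in a list of triples."""
    return {term for triple in triples for term in triple if term.startswith("_:")}

def shared_bnode_labels(scopes):
    """Return labels that occur in more than one scope (scope name -> triples)."""
    seen = defaultdict(set)
    for name, triples in scopes.items():
        for label in labels_in(triples):
            seen[label].add(name)
    return {label: sorted(names) for label, names in seen.items() if len(names) > 1}

# DELETE { _:b1 :p :o } WHERE { :foo :bar _:b1 }
request = {
    "template": [("_:b1", ":p", ":o")],
    "where-bgp-1": [(":foo", ":bar", "_:b1")],
}
conflicts = shared_bnode_labels(request)
```

A non-empty result means the request would be rejected. The expanded
form of DELETE WHERE { _:b1 :p :o } fails the same check, since _:b1
appears in both the template and the pattern.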
== My Opinion ==
While I'm sympathetic to Sandro's use case, I'm frightened of the fact that:
DELETE { _:b1 :p :o } WHERE { }
and
DELETE { ?b1 :p :o } WHERE { }
do dramatically different things. Because of this, I'd rather we go with
the first proposal and prohibit blank nodes in the DELETE template entirely.
hope this is helpful,
Lee
Received on Wednesday, 3 March 2010 06:17:18 UTC