DELETE and blank nodes from Lee Feigenbaum on 2010-03-03 (public-rdf-dawg@w3.org from January to March 2010)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Wed, 03 Mar 2010 01:16:28 -0500
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4B8DFEBC.2090605@thefigtrees.net>
This email is in reference to 
http://www.w3.org/2009/sparql/track/actions/201 . It summarizes the 
discussion of blank nodes in DELETE and concludes with what I see as two 
reasonable proposals. Skip to the end if all you are interested in are 
the proposals.

== The Problem ==

The question under consideration is: What are the semantics of blank 
nodes that appear within the template part of a DELETE statement?

== Background ==

=== Blank nodes in query patterns ===

In SPARQL 1.0 query patterns, a blank node acts as a non-distinguished 
variable. It binds to graph terms just like variables, but it can't be 
projected, filtered, sorted, etc. Blank nodes with a given label (e.g. 
_:b123) can only be used within a single basic graph pattern (BGP) (to 
do otherwise results in an invalid query string).

=== Blank nodes in CONSTRUCT templates ===

Blank nodes in the template of CONSTRUCT queries emit blank nodes into 
the result RDF. The CONSTRUCT template preserves blank node label 
coreferences within the RDF generated from a single solution, but emits 
new blank nodes for each solution to which the template is applied.
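
To make the per-solution scoping concrete, here is a minimal Python sketch 
of template instantiation. It is an illustration only, not text from any 
spec: terms are modelled as plain strings, and the function name 
instantiate and the generated labels (_:gen0, _:gen1, ...) are 
assumptions made up for this example.

```python
import itertools

_fresh = itertools.count()  # source of new blank node labels (hypothetical scheme)

def instantiate(template, solutions):
    """Apply each solution to the template, minting new blank nodes per solution."""
    out = []
    for sol in solutions:
        renamed = {}  # template label -> fresh label, scoped to this one solution
        for triple in template:
            ground = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in renamed:
                        renamed[term] = f"_:gen{next(_fresh)}"
                    ground.append(renamed[term])
                elif term.startswith("?"):
                    ground.append(sol.get(term))  # None if the variable is unbound
                else:
                    ground.append(term)
            if None not in ground:  # triples with unbound variables are skipped
                out.append(tuple(ground))
    return out

template = [("?s", ":knows", "_:x"), ("_:x", ":name", "?n")]
solutions = [{"?s": ":alice", "?n": '"Bob"'}, {"?s": ":carol", "?n": '"Dan"'}]
triples = instantiate(template, solutions)
```

Within each solution the two uses of _:x become the same new node, but the 
node generated for the first solution differs from the one generated for 
the second.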

=== Blank nodes in INSERT templates ===

Blank nodes in INSERT statement templates are as in CONSTRUCT templates. 
The SPARQL 1.1 Update draft currently says "The template and pattern 
forms are as defined in SPARQL for construct templates and graph patterns."

=== Behavior of DELETE ===

The full form of DELETE is

   DELETE { template } WHERE { pattern }

The semantics are that pattern is matched against the graph store, 
yielding a solution set (a set of solutions; each solution is a set of 
variable bindings; each variable binding is a pairing of a variable plus 
an RDF term). Each solution in the solution set is then substituted into 
the template in turn, resulting in ground triples. Each of these ground 
triples is removed from the graph store (either from the graph store's 
single "unnamed graph", or from the graph specified in the WITH clause, 
or from the graph specified inline with a GRAPH clause).

In particular, for our purposes, note that:

   DELETE { :a :ground :triple } WHERE { }

...deletes the given triple. The solution set for the empty WHERE clause 
is one solution with no bindings - that solution gets applied to the 
template which yields the ground triple which is removed from the 
unnamed graph.

And note that:

   DELETE { ?unbound :p :o } WHERE { }

...doesn't remove anything at all. ?unbound is (as its name implies) not 
bound in the single solution; when this solution is applied to the 
template, we get an invalid triple (because of the unbound variable), 
and nothing is removed. (The current editor's draft does not spell this 
out, but this is the analogous behavior to CONSTRUCT templates, and I 
assume we have consensus around this.)
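
The two cases above can be sketched in Python as follows. This is a toy 
model under stated assumptions, not the draft's definition: the graph 
store is a single set of string triples, the WHERE clause is a list of 
triple patterns, and the helper names match_bgp and delete are 
hypothetical. Blank nodes are deliberately left out.

```python
def match_bgp(patterns, store):
    """Return solutions (dicts) for a list of triple patterns.

    An empty pattern list yields exactly one solution with no bindings.
    """
    solutions = [{}]
    for pat in patterns:
        next_solutions = []
        for sol in solutions:
            for triple in store:
                new = dict(sol)
                ok = True
                for p, t in zip(pat, triple):
                    if p.startswith("?"):
                        # bind the variable, or check an existing binding
                        if new.setdefault(p, t) != t:
                            ok = False
                            break
                    elif p != t:
                        ok = False
                        break
                if ok:
                    next_solutions.append(new)
        solutions = next_solutions
    return solutions

def delete(template, patterns, store):
    """Return the store after DELETE { template } WHERE { patterns }."""
    to_remove = set()
    for sol in match_bgp(patterns, store):
        for triple in template:
            ground = tuple(sol.get(t) if t.startswith("?") else t for t in triple)
            if None not in ground:  # unbound variable: no triple is produced
                to_remove.add(ground)
    return store - to_remove

store = {(":a", ":ground", ":triple"), (":x", ":p", ":o")}
after_ground = delete([(":a", ":ground", ":triple")], [], store)
after_unbound = delete([("?unbound", ":p", ":o")], [], store)
after_match = delete([("?s", ":p", ":o")], [("?s", ":p", ":o")], store)
```

Here after_ground has lost the ground triple, after_unbound is identical 
to the original store, and after_match has lost the :x :p :o triple 
because ?s was bound by the WHERE clause.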

We also have the shortcut form:

   DELETE WHERE { limited-pattern }

limited-pattern can have GRAPH clauses and triple patterns. The draft 
doesn't yet spell this out, but I believe the current understanding is 
that this is purely syntactic sugar, as in:

   DELETE WHERE { X } === DELETE { X } WHERE { X }

So, DELETE WHERE { ?s :p :o } is equivalent to

   DELETE { ?s :p :o } WHERE { ?s :p :o }

== The Options for Blank Nodes in Delete Templates ==

All of which brings us to the topic at hand.

What does a blank node mean in a delete template? At the teleconference, 
we discussed three options. Here they are, with analysis gleaned from 
the discussion:

1/ Blank nodes are not allowed in DELETE templates. The syntax for 
DELETE would prohibit blank nodes from appearing in DELETE templates. 
Most queries involving blank nodes can be written with regular variables 
instead, and this option avoids the potential confusion of the other 
options.

2/ Blank nodes are treated as in CONSTRUCT (and INSERT) templates. In 
this case, a blank node in a DELETE template becomes (in the ground 
triples) a newly minted blank node for each solution that is applied to 
the template. Because the blank node is newly minted, it does not occur 
in the graph store at all. The practical effect of this is that a triple 
in the DELETE template that contains a blank node would _never_ lead to 
_anything_ being deleted. It also means that the shortcut form:

   DELETE WHERE { _:b1 :p :o }

...if treated as pure syntactic sugar for:

   DELETE { _:b1 :p :o } WHERE { _:b1 :p :o }

...would stand for 2 very different meanings of _:b1, which at best is 
really confusing. At the teleconference, there did not seem to be any 
support for this option, and I can't see any useful benefits that it has 
other than formal consistency with CONSTRUCT and INSERT.

3/ Blank nodes in delete templates are treated similarly to blank nodes 
in query patterns--i.e. as (non-distinguished) variables. The *intent* 
of this option is that:

   DELETE { _:b1 :p :o } WHERE { }

should delete _all_ triples with predicate :p and object :o. Sandro gave 
a motivating use case for this interpretation. It would provide the only 
reasonable way to delete an RDF list:

   DELETE { ?x :hasList (1 2 3) } WHERE { ... ?x ... }

(1 2 3) is syntactic sugar for an expansion involving blank nodes. If 
those blank nodes are treated as variables, then this would delete all 
the triples that make up the list.

We had trouble writing down the precise meaning of a blank node here. 
It's *not* just that blank nodes are the same as variables, because:

   DELETE { _:b1 :p :o } WHERE { }

would delete all the <something> :p :o triples whereas

   DELETE { ?b1 :p :o } WHERE { }

would delete nothing (because ?b1 is unbound). Effectively, blank nodes 
in the template are acting as a way to do pattern matching, variable 
binding, and triple deletion all in one operation, instead of the normal 
multi-phase approach.

We also immediately noted confusion if the same blank node label was 
used in the query pattern and the template:

   DELETE { _:b1 :p :o } WHERE { :foo :bar _:b1 }

...the _:b1 in the WHERE clause acts as a non-distinguished variable 
whose bindings don't contribute to the solution set, so the _:b1 in the 
template has to be something different altogether. This is rather 
bizarre, so we discussed 2 remedies:

   A) Prohibit the same blank node label from being used in 2 different 
BGPs, or in 1 BGP and in the template. This is basically the same 
restriction that SPARQL Query puts on blank nodes.

   B) Prohibit named blank nodes completely. In the end I don't think we 
saw much reason to prefer this to approach A.

Note that the effect of A (or B) is that the shortcut:

   DELETE WHERE { _:b1 :p :o }

is an illegal query.
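
To compare options 2 and 3 side by side, here is a rough Python sketch. 
It is only an illustration under stated assumptions: the store is a set 
of string triples, solutions are passed in directly, and the names 
delete, blank_mode, ANY and the _:fresh labels are invented for this 
example. As a simplification, the wildcard mode does not enforce that a 
repeated blank node label matches the same term each time.

```python
import itertools

_fresh = itertools.count()
ANY = object()  # sentinel that matches every RDF term (option 3 only)

def delete(template, solutions, store, blank_mode):
    """Return the store after the deletion; blank_mode is 'fresh' or 'wildcard'."""
    doomed = set()
    for sol in solutions:
        minted = {}  # option 2: label -> newly minted blank node, per solution
        for triple in template:
            pattern = []
            for term in triple:
                if term.startswith("?"):
                    pattern.append(sol.get(term))  # None when unbound
                elif term.startswith("_:") and blank_mode == "fresh":
                    if term not in minted:
                        minted[term] = f"_:fresh{next(_fresh)}"
                    pattern.append(minted[term])
                elif term.startswith("_:"):
                    pattern.append(ANY)
                else:
                    pattern.append(term)
            if None in pattern:  # unbound variable: nothing to delete
                continue
            for t in store:
                if all(p is ANY or p == x for p, x in zip(pattern, t)):
                    doomed.add(t)
    return store - doomed

store = {(":a", ":p", ":o"), ("_:b1", ":p", ":o"), (":c", ":q", ":o")}
no_bindings = [{}]  # the solution set of an empty WHERE clause

after_fresh = delete([("_:b1", ":p", ":o")], no_bindings, store, "fresh")
after_wild = delete([("_:b1", ":p", ":o")], no_bindings, store, "wildcard")
after_var = delete([("?b1", ":p", ":o")], no_bindings, store, "wildcard")
```

Under option 2 (fresh) the store is unchanged, even though it happens to 
contain a node labelled _:b1, because the minted node is new. Under 
option 3 (wildcard) both :p :o triples are removed, while the ?b1 variant 
removes nothing.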

== The Proposals ==

I see only two realistic proposals emerging from this.

1/ We prohibit blank nodes in the DELETE template completely.

2/ Blank nodes in DELETE templates act as "wild cards"--effectively 
variables pre-bound to all RDF terms--to let us write some shortcuts and 
handle Sandro's case of deleting RDF lists. We prohibit the same blank 
node label from being used in multiple scopes.
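
For reference, Sandro's list case hinges on how (1 2 3) expands into 
rdf:first / rdf:rest triples over blank nodes. A small Python sketch of 
that expansion follows; the function name expand_list and the _:l0, _:l1, 
... labels are assumptions for illustration, and list items are kept as 
plain strings.

```python
def expand_list(items):
    """Return (head, triples) for the rdf:first/rdf:rest expansion of a list."""
    if not items:
        return "rdf:nil", []
    nodes = [f"_:l{i}" for i in range(len(items))]  # one blank node per item
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return nodes[0], triples

head, list_triples = expand_list(["1", "2", "3"])
# The full template for { ?x :hasList (1 2 3) } is then:
template = [("?x", ":hasList", head)] + list_triples
```

The resulting template has seven triples, six of which mention blank 
nodes, which is why the treatment of blank nodes decides whether such a 
template can delete anything.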



== My Opinion ==

While I'm sympathetic to Sandro's use case, I'm frightened of the fact that:

   DELETE { _:b1 :p :o } WHERE { }
and
   DELETE { ?b1 :p :o } WHERE { }

do dramatically different things. Because of this, I'd rather we go with 
the first proposal and prohibit blank nodes in the DELETE template entirely.

hope this is helpful,
Lee
Received on Wednesday, 3 March 2010 06:17:18 UTC