- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Wed, 03 Mar 2010 01:16:28 -0500
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
This email is in reference to http://www.w3.org/2009/sparql/track/actions/201 . It summarizes the discussion of blank nodes in DELETE and concludes with what I see as two reasonable proposals. Skip to the end if all you are interested in are the proposals.

== The Problem ==

The question under consideration is: What are the semantics of blank nodes that appear within the template part of a DELETE statement?

== Background ==

=== Blank nodes in query patterns ===

In SPARQL 1.0 query patterns, a blank node acts as a non-distinguished variable. It binds to graph terms just like variables, but it can't be projected, filtered, sorted, etc. Blank nodes with a given label (e.g. _:b123) can only be used within a single basic graph pattern (BGP) (to do otherwise results in an invalid query string).

=== Blank nodes in CONSTRUCT templates ===

Blank nodes in the template of CONSTRUCT queries emit blank nodes into the result RDF. The CONSTRUCT template preserves blank node label coreferences within the RDF generated from a single solution, but emits new blank nodes for each solution to which the template is applied.

=== Blank nodes in INSERT templates ===

Blank nodes in INSERT statement templates behave as in CONSTRUCT templates. The SPARQL 1.1 Update draft currently says "The template and pattern forms are as defined in SPARQL for construct templates and graph patterns."

=== Behavior of DELETE ===

The full form of DELETE is

  DELETE { template } WHERE { pattern }

The semantics are that pattern is matched against the graph store, yielding a solution set (a set of solutions; each solution is a set of variable bindings; each variable binding is a pairing of a variable plus an RDF term). Each solution in the solution set is then substituted into the template in turn, resulting in ground triples.
Each of these ground triples is removed from the graph store (either from the graph store's single "unnamed graph", or from the graph specified in the WITH clause, or from the graph specified inline with a GRAPH clause).

In particular, for our purposes, note that:

  DELETE { :a :ground :triple } WHERE { }

...deletes the given triple. The solution set for the empty WHERE clause is one solution with no bindings. That solution is applied to the template, which yields the ground triple, which is then removed from the unnamed graph.

And note that:

  DELETE { ?unbound :p :o } WHERE { }

...doesn't remove anything at all. ?unbound is (as its name implies) not bound in the single solution; when this solution is applied to the template, we get an invalid triple (because of the unbound variable), and nothing is removed. (The current editor's draft does not spell this out, but this is the analogous behavior to CONSTRUCT templates, and I assume we have consensus around this.)

We also have the shortcut form:

  DELETE WHERE { limited-pattern }

limited-pattern can have GRAPH clauses and triple patterns. The draft doesn't yet spell this out, but I believe the current understanding is that this is purely syntactic sugar, as in:

  DELETE WHERE { X } === DELETE { X } WHERE { X }

So,

  DELETE WHERE { ?s :p :o }

is equivalent to

  DELETE { ?s :p :o } WHERE { ?s :p :o }

== The Options for Blank Nodes in Delete Templates ==

All of which brings us to the topic at hand. What does a blank node mean in a delete template? At the teleconference, we discussed three options. Here they are, with analysis gleaned from the discussion:

1/ Blank nodes are not allowed in DELETE templates.

The syntax for DELETE would prohibit blank nodes from appearing in DELETE templates. Most queries involving blank nodes can be written with regular variables instead, and this option avoids the potential confusion of the other options.

2/ Blank nodes are treated as in CONSTRUCT (and INSERT) templates.
In this case, a blank node in a DELETE template becomes (in the ground triples) a newly minted blank node for each solution that is applied to the template. Because the blank node is newly minted, it does not occur in the graph store at all. The practical effect of this is that a triple in the DELETE template that contains a blank node would _never_ lead to _anything_ being deleted.

It also means that the shortcut form:

  DELETE WHERE { _:b1 :p :o }

...if treated as pure syntactic sugar for:

  DELETE { _:b1 :p :o } WHERE { _:b1 :p :o }

...would stand for 2 very different meanings of _:b1, which at best is really confusing.

At the teleconference, there did not seem to be any support for this option, and I can't see any useful benefits that it has other than formal consistency with CONSTRUCT and INSERT.

3/ Blank nodes in delete templates are treated similarly to blank nodes in query patterns--i.e. as (non-distinguished) variables.

The *intent* of this option is that:

  DELETE { _:b1 :p :o } WHERE { }

should delete _all_ triples with predicate :p and object :o.

Sandro gave a motivating use case for this interpretation. It would provide the only reasonable way to delete an RDF list:

  DELETE { ?x :hasList (1 2 3) } WHERE { ... ?x ... }

(1 2 3) is syntactic sugar for an expansion involving blank nodes. If those blank nodes are treated as variables, then this would delete all the triples that make up the list.

We had trouble writing down the precise meaning of a blank node here. It's *not* just that blank nodes are the same as variables, because:

  DELETE { _:b1 :p :o } WHERE { }

would delete all the <something> :p :o triples whereas

  DELETE { ?b1 :p :o } WHERE { }

would delete nothing (because ?b1 is unbound). Effectively, blank nodes in the template are acting as a way to do pattern matching, variable binding, and triple deletion all in one operation, instead of the normal multi-phase approach.
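
To make the contrast concrete, here is a minimal Python sketch of how options 2 and 3 differ on a toy graph store. It is an illustration under stated assumptions, not anything from the Update draft: the store is a plain set of (s, p, o) string tuples containing no blank nodes of its own, the names `match`, `delete`, and `bnode_mode` are hypothetical, and the "wildcard" mode matches all blank-node triples of the template jointly, which is only one possible reading of option 3 since we never pinned down its precise meaning.

```python
from itertools import count

# Toy model, not a real SPARQL engine. A graph is a set of (s, p, o) string
# tuples; "?x" is a variable, "_:x" a blank node label, anything else a ground
# term. Assumption: the store itself contains no blank nodes.
_fresh_ids = count()

def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def match(patterns, graph, binding):
    """Yield each extension of `binding` that maps the blank node labels in
    `patterns` to graph terms so that every pattern occurs in `graph`."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_bnode(p_term):
                if extended.setdefault(p_term, g_term) != g_term:
                    break  # label already bound to a different term
            elif p_term != g_term:
                break  # ground term mismatch
        else:
            yield from match(rest, graph, extended)

def delete(graph, template, solutions, bnode_mode):
    """Return a copy of `graph` without the triples the template selects.
    bnode_mode "fresh" models option 2; any other value models option 3."""
    doomed = set()
    for solution in solutions:
        ground, open_triples = [], []
        for triple in template:
            triple = tuple(solution.get(t, t) if is_var(t) else t for t in triple)
            if any(is_var(t) for t in triple):
                continue  # unbound variable: invalid triple, nothing to remove
            target = open_triples if any(is_bnode(t) for t in triple) else ground
            target.append(triple)
        doomed.update(ground)
        if bnode_mode == "fresh":
            # Option 2: mint new blank nodes per solution, as CONSTRUCT does.
            relabel = {}
            for triple in open_triples:
                for t in triple:
                    if is_bnode(t) and t not in relabel:
                        relabel[t] = f"_:fresh{next(_fresh_ids)}"
                doomed.add(tuple(relabel.get(t, t) for t in triple))
        else:
            # Option 3 (assumed reading): blank nodes match any term in the store.
            for binding in match(open_triples, graph, {}):
                doomed.update(
                    tuple(binding.get(t, t) for t in triple) for triple in open_triples
                )
    return graph - doomed

graph = {(":a", ":p", ":o"), (":b", ":p", ":o"), (":a", ":q", ":o")}
EMPTY_WHERE = [{}]  # WHERE { } yields one solution with no bindings
```

Calling `delete(graph, [("_:b1", ":p", ":o")], EMPTY_WHERE, "fresh")` leaves the store untouched, while the same template with `"wildcard"` removes both :p :o triples, and `("?b1", ":p", ":o")` removes nothing in either mode.
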

We also immediately noted confusion if the same blank node label was used in the query pattern and the template:

  DELETE { _:b1 :p :o } WHERE { :foo :bar _:b1 }

...the _:b1 in the WHERE clause acts as a non-distinguished variable whose bindings don't contribute to the solution set, so the _:b1 in the template has to be something different altogether. This is rather bizarre, so we discussed 2 remedies:

A) Prohibit the same blank node label from being used in 2 different BGPs, or in 1 BGP and in the template. This is basically the same restriction that SPARQL Query puts on blank nodes.

B) Prohibit named blank nodes completely. In the end I don't think we saw much reason to prefer this to approach A.

Note that the effect of A (or B) is that the shortcut:

  DELETE WHERE { _:b1 :p :o }

is an illegal query.

== The Proposals ==

I see only two realistic proposals emerging from this.

1/ We prohibit blank nodes in the DELETE template completely.

2/ Blank nodes in DELETE templates act as "wild cards"--effectively variables pre-bound to all RDF terms--to let us write some shortcuts and handle Sandro's case of deleting RDF lists. We prohibit the same blank node label from being used in multiple scopes.

== My Opinion ==

While I'm sympathetic to Sandro's use case, I'm frightened of the fact that:

  DELETE { _:b1 :p :o } WHERE { }

and

  DELETE { ?b1 :p :o } WHERE { }

do dramatically different things. Because of this, I'd rather we go with the first proposal and prohibit blank nodes in the DELETE template entirely.

Hope this is helpful,
Lee
Received on Wednesday, 3 March 2010 06:17:18 UTC