Re: Bnodes in DELETE templates (was: SPARQL Update 1.1 review part1) from Paul Gearon on 2011-02-23 (public-rdf-dawg@w3.org from January to March 2011)

From: Paul Gearon <gearon@ieee.org>
Date: Wed, 23 Feb 2011 12:34:28 -0500
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Cc: Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTimciuWjGhoPiGLNaJRa0=-X3D7vKnVx6og6=Usr@mail.gmail.com>
On Wed, Feb 23, 2011 at 11:45 AM, Birte Glimm
<birte.glimm@comlab.ox.ac.uk> wrote:
> Hm, so far I had imagined a relatively simple definition, which is, we
> evaluate the WHERE clause to get a solution sequence. We apply each
> solution to the template, throwing away any instantiations that are
> not RDF (e.g., literal in subj position or unbound var). Now let G be
> the graph to which the delete applies. Each subgraph G' of G such that
> an instantiated template is *an instance* of G' is then removed from
> G.
> That's it. That's how I understood bnodes act as wild-cards. This
> might be a bit more complicated with named graphs and quads, but still
> the idea seems quite straightforward.

It may appear that way, particularly on small example data sets, but
from an implementation point of view it's not straightforward at
all... at least for me.

I'm expecting a DELETE operation to find a concrete set of triples,
which then allows me to go into the triple storage and remove those
instances. A wildcard template means that I won't get a specific set.
Consequently, I would need to start iterating through the index
finding those triples that match the template taking the wildcard into
consideration. If I want to avoid iterating through every triple in my
system, I can do a query that takes the wildcard into consideration.
So now I have a new query which is a transformation on the original
query.
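
A minimal sketch of the single-triple case, assuming triples are plain
3-tuples of strings and that blank nodes in the template are written
"_:label" (both are illustrative conventions, not anything from the
spec). It uses the naive full scan described above; a real store would
replace the loop with an index lookup or a rewritten query.

```python
def is_wildcard(term):
    # Assumption: blank nodes in the template are written "_:label".
    return term.startswith("_:")


def match_single(template, graph):
    """Return the triples in `graph` matched by one template triple,
    treating each blank node in the template as a wildcard."""
    matched = set()
    for triple in graph:
        binding = {}
        ok = True
        for t, g in zip(template, triple):
            if is_wildcard(t):
                # The same blank node label must map to the same term.
                if binding.setdefault(t, g) != g:
                    ok = False
                    break
            elif t != g:
                ok = False
                break
        if ok:
            matched.add(triple)
    return matched


graph = {
    (":a", ":p", ":x"),
    (":a", ":p", ":y"),
    (":b", ":p", ":x"),
}

# The blank node in object position matches any object of :a :p.
to_delete = match_single((":a", ":p", "_:b0"), graph)
remaining = graph - to_delete
```

Here to_delete is only known after the scan, which is the point: the
template alone does not name a concrete set of triples.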

However, that's just the case for when there is a single triple in the
template. What about when there are multiple triples, with one or more
of those triples containing wildcards in the form of blank nodes? The
transformations become more complex. Multiple blank nodes make it
significantly more complex. It can all be automated, but it gets
alarmingly messy under the covers.
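
To illustrate why the multi-triple case is harder, here is a rough
sketch, again assuming 3-tuples of strings and "_:label" blank nodes
(hypothetical conventions chosen for the example). A blank node shared
between template triples forces a join: each candidate binding has to
be carried from one template triple to the next, which this sketch
does with naive backtracking.

```python
def unify(pattern, triple, binding):
    """Try to match one template triple against one graph triple,
    extending `binding` (blank node label -> graph term) in place."""
    for t, g in zip(pattern, triple):
        if t.startswith("_:"):
            # A shared blank node must map to the same term everywhere.
            if binding.setdefault(t, g) != g:
                return False
        elif t != g:
            return False
    return True


def match_template(template, graph):
    """Return every triple that is part of some subgraph matching the
    whole template, with blank nodes acting as wildcards."""
    matched = set()

    def extend(i, binding, used):
        if i == len(template):
            matched.update(used)
            return
        for triple in graph:
            candidate = dict(binding)
            if unify(template[i], triple, candidate):
                extend(i + 1, candidate, used + [triple])

    extend(0, {}, [])
    return matched


graph = {
    (":a", ":p", ":x"),
    (":x", ":q", ":v1"),
    (":a", ":p", ":y"),
    (":z", ":q", ":v2"),
}

# _:n is shared, so only the :x branch matches the whole template.
template = [(":a", ":p", "_:n"), ("_:n", ":q", "_:v")]
to_delete = match_template(template, graph)
```

The triple (:a :p :y) survives because nothing completes the second
template triple for it, so the result depends on the template as a
whole rather than on each triple separately.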

> If you want to implement it via rewriting that's fine. Instead of
> finding instances (which requires computing mappings for the bnodes in
> the template), you can then just delete syntactically equal triples,
> but I don't see why this has to be defined in the algebra or as some
> form of rewriting algorithm.

The model is trying to define that set of triples to be removed. It
seems to me to be awkward to define a solution set (via the WHERE
clause), and to then describe how that is used to create a template
that is then applied to triples in a graph. Perhaps the application of
said template could be described as a filter, though I would not
expect any implementation to actually implement it that way.

> I think that is even dangerous since it
> assumes that the bnodes that I return as bindings are really those
> from the graph, but I am in no other place of the spec obliged to do
> that.

I'm not sure I follow you here. Are you suggesting that variables (or
even wildcard matches) that are bound to bnodes in the graph are to be
the same ones that the template is referring to? If so, then that is a
desirable outcome, or else it would not be possible to modify data
relating to them. However, I may have misinterpreted what you have
said here.

> If you don't want to implement via rewriting, you can (have to)
> find mappings for the bnodes in the instantiated template to find the
> triples that are to be deleted.
>
> The only problem I can see is that we have to decide whether to delete
> "iteratively" (solution by solution), which seems not to be good:
> E.g.,
> data:
> :a :b :c1 .
> :a :b :c2 .
> :a :d :e .
> :a :f :c1 .
> :a :f :c2 .
> :a :g :e .
> query
> DELETE { :a :f ?c . :a :g ?e } WHERE { :a :b ?c . :a :d ?e }
> evaluating the query pattern gives two solutions
> ?c/:c1, ?e/:e and ?c/:c2, ?e/:e
> If we now instantiate the template with the first solution and
> immediately remove the graph-equivalent triples, we are left with
> data':
> :a :b :c1 .
> :a :b :c2 .
> :a :d :e .
> :a :f :c2 .
> If we now instantiate the template with the second solution to
> :a :f :c2 . :a :g :e
> There is no longer a matching subgraph left. Depending on the order, I
> could then end up with different graphs, which is not good at all. One
> would have to first find all matching triples and then remove them,
> but this is not really related to the bnode issue.

As you say, this is a separate issue. In this case I have no problems
with it. There are lots of operations which try to delete the same
triple more than once, or try to construct the same triple multiple
times. The duplicates are ignored.
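
The difference between the two readings can be sketched on the example
data above, assuming triples are 3-tuples of strings and solutions are
dicts from variable name to term (the function names are made up for
illustration). The first function removes a whole instantiated
template per solution, and only if all of it is still present; the
second collects every instantiated triple and removes them as a set.

```python
DATA = {
    (":a", ":b", ":c1"), (":a", ":b", ":c2"), (":a", ":d", ":e"),
    (":a", ":f", ":c1"), (":a", ":f", ":c2"), (":a", ":g", ":e"),
}
TEMPLATE = [(":a", ":f", "?c"), (":a", ":g", "?e")]
SOLUTIONS = [{"?c": ":c1", "?e": ":e"}, {"?c": ":c2", "?e": ":e"}]


def instantiate(template, solution):
    """Substitute the bound variables of one solution into the template."""
    return {tuple(solution.get(term, term) for term in triple)
            for triple in template}


def iterative_subgraph_delete(graph, template, solutions):
    """Per solution, remove the instantiated template only if the
    whole of it is still present in the graph."""
    graph = set(graph)
    for solution in solutions:
        instance = instantiate(template, solution)
        if instance <= graph:
            graph -= instance
    return graph


def batch_delete(graph, template, solutions):
    """Collect all instantiated triples first, then remove them as a
    set; a triple named by several solutions is removed once."""
    to_remove = set()
    for solution in solutions:
        to_remove |= instantiate(template, solution)
    return set(graph) - to_remove


forward = iterative_subgraph_delete(DATA, TEMPLATE, SOLUTIONS)
backward = iterative_subgraph_delete(DATA, TEMPLATE, SOLUTIONS[::-1])
batch = batch_delete(DATA, TEMPLATE, SOLUTIONS)
```

With the subgraph-at-a-time reading, forward keeps (:a :f :c2) while
backward keeps (:a :f :c1). With the triple-level reading, the second
attempt to delete (:a :g :e) is simply a no-op and the result does not
depend on the order of the solutions.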

Regards,
Paul Gearon
Received on Wednesday, 23 February 2011 17:36:01 UTC