Re: Bnodes in DELETE templates (was: SPARQL Update 1.1 review part1) from Birte Glimm on 2011-02-24 (public-rdf-dawg@w3.org from January to March 2011)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Thu, 24 Feb 2011 21:26:26 +0000
To: Paul Gearon <gearon@ieee.org>
Cc: Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTi=2xfnGcbW=g0A51b-fccMbrWz-oAA3_cofZJ4V@mail.gmail.com>
On 23 February 2011 17:34, Paul Gearon <gearon@ieee.org> wrote:
> On Wed, Feb 23, 2011 at 11:45 AM, Birte Glimm
> <birte.glimm@comlab.ox.ac.uk> wrote:
>> Hm, so far I had imagined a relatively simple definition, which is, we
>> evaluate the WHERE clause to get a solution sequence. We apply each
>> solution to the template, throwing away any instantiations that are
>> not RDF (e.g., literal in subj position or unbound var). Now let G be
>> the graph to which the delete applies. Each subgraph G' of G such that
>> an instantiated template is *an instance* of G' is then removed from
>> G.
>> That's it. That's how I understood bnodes act as wild-cards. This
>> might be a bit more complicated with named graphs and quads, but still
>> the idea seems quite straightforward.
>
> It may appear that way, particularly on small example data sets, but
> from an implementation point of view it's not straightforward at
> all... at least for me.

I totally agree with that. I just think that a spec should not be like
an instruction handbook for implementors. It should be specified what
the behaviour is, but not how you actually realise that. Informal
implementation advice is useful, but I at least prefer to have
definitions on a more abstract level.
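
As a rough illustration of the abstract definition quoted above (not proposed spec text), a minimal Python sketch might look as follows. Everything in it is an assumption made for illustration: triples are plain tuples of strings, terms starting with "?" are variables, terms starting with "_:" are blank nodes read as wildcards, terms starting with a double quote are literals, and the solutions of the WHERE clause are simply passed in.

```python
def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def is_literal(term):
    return term.startswith('"')

def instantiate(template, solution):
    """Apply one solution to the template; return None if the result is not RDF."""
    triples = []
    for triple in template:
        s, p, o = (solution.get(t, t) for t in triple)
        # Unbound variables or a literal as subject/predicate: throw it away.
        if any(is_var(t) for t in (s, p, o)) or is_literal(s) or is_literal(p):
            return None
        triples.append((s, p, o))
    return triples

def matches(pattern, graph, mapping=None):
    """Yield each mapping of template bnodes under which pattern lies in graph."""
    mapping = {} if mapping is None else mapping
    if not pattern:
        yield mapping
        return
    for triple in graph:
        extended = dict(mapping)
        ok = True
        for p_term, g_term in zip(pattern[0], triple):
            if is_bnode(p_term):
                if extended.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from matches(pattern[1:], graph, extended)

def delete(graph, template, solutions):
    """Collect every matched subgraph first, then remove them all at once."""
    doomed = set()
    for solution in solutions:
        instance = instantiate(template, solution)
        if instance is None:
            continue
        for mapping in matches(instance, graph):
            doomed |= {tuple(mapping.get(t, t) for t in triple) for triple in instance}
    return graph - doomed

graph = {(":a", ":p", ":b"), (":a", ":p", ":c"), (":x", ":p", ":b")}
template = [("?s", ":p", "_:any")]
# The second solution leaves ?s unbound, so its instantiation is discarded.
result = delete(graph, template, [{"?s": ":a"}, {}])
```

Here the wildcard in object position removes both triples with subject :a and leaves the :x triple alone.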

> I'm expecting a DELETE operation to find a concrete set of triples,
> which then allows me to go into the triple storage and remove those
> instances. A wildcard template means that I won't get a specific set.

Yes, but if we want to stick to the resolution then they should be
wildcards. If it turns out to be too hard to implement, we might have
to rethink the resolution. In a way I am happier with bnodes in the
template not matching anything than with a complicated rewriting
definition that you could not use in practice anyway.

> Consequently, I would need to start iterating through the index
> finding those triples that match the template taking the wildcard into
> consideration. If I want to avoid iterating through every triple in my
> system, I can do a query that takes the wildcard into consideration.
> So now I have a new query which is a transformation on the original
> query.

Yes, one would have to execute the query template to get the solution
sequence, but then you have to internally again execute a query for
the instantiated templates to do the bnode matching.

> However, that's just the case for when there is a single triple in the
> template. What about when there are multiple triples, with one or more
> of those triples containing wildcards in the form of blank nodes? The
> transformations become more complex. Multiple blank nodes make it
> significantly more complex. It can all be automated, but it gets
> alarmingly messy under the covers.

Well, if there are bnode co-references, you can't work triple by triple.
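
To make the co-reference point concrete, here is a small Python sketch that contrasts matching each template triple on its own with matching the whole template jointly. The data, the helper names, and the tuple encoding of triples (blank nodes as strings starting with "_:") are all hypothetical and only serve the illustration.

```python
def is_bnode(term):
    return term.startswith("_:")

def match_triple(pattern, triple, mapping):
    """Extend mapping so that pattern equals triple, or return None."""
    extended = dict(mapping)
    for p_term, g_term in zip(pattern, triple):
        if is_bnode(p_term):
            if extended.setdefault(p_term, g_term) != g_term:
                return None
        elif p_term != g_term:
            return None
    return extended

def per_triple(template, graph):
    """Naive approach: every template triple is matched independently."""
    return {t for pat in template for t in graph
            if match_triple(pat, t, {}) is not None}

def joint_mappings(template, graph, mapping):
    """Yield mappings that satisfy all template triples at the same time."""
    if not template:
        yield mapping
        return
    for triple in graph:
        extended = match_triple(template[0], triple, mapping)
        if extended is not None:
            yield from joint_mappings(template[1:], graph, extended)

def joint(template, graph):
    """Triples that belong to some complete match of the whole template."""
    return {tuple(m.get(t, t) for t in pat)
            for m in joint_mappings(template, graph, {})
            for pat in template}

graph = {
    (":a", ":knows", "_:b1"), ("_:b1", ":name", '"Bob"'),
    (":a", ":knows", "_:b2"),          # no name attached
    ("_:b3", ":name", '"Eve"'),        # not known by :a
}
# _:x occurs in both template triples, so the two must agree on it.
template = [(":a", ":knows", "_:x"), ("_:x", ":name", "_:n")]

naive = per_triple(template, graph)
coherent = joint(template, graph)
```

Under these assumptions the per-triple approach selects all four triples, while the joint match selects only the two triples that share _:b1.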

>> If you want to implement it via rewriting that's fine. Instead of
>> finding instances (which requires computing mappings for the bnodes in
>> the template), you can then just delete syntactically equal triples,
>> but I don't see why this has to be defined in the algebra or as some
>> form of rewriting algorithm.
>
> The model is trying to define that set of triples to be removed. It
> seems to me to be awkward to define a solution set (via the WHERE
> clause), and to then describe how that is used to create a template
> that is then applied to triples in a graph. Perhaps the application of
> said template could be described as a filter, though I would not
> expect any implementation to actually implement it that way.

Yes, it could well be possible that there are better ways or that I am
not familiar enough with the details (I haven't even read the update
spec properly), but I just don't like this complicated rewriting at
all. I would rather scrap the idea of having bnodes act as wildcards.

>> I think that is even dangerous since it
>> assumes that the bnodes that I return as bindings are really those
>> from the graph, but I am in no other place of the spec obliged to do
>> that.
>
> I'm not sure I follow you here. Are you suggesting that variables (or
> even wildcard matches) that are bound to bnodes in the graph are to be
> the same ones that the template is referring to? If so, then that is a
> desirable outcome, or else it would not be possible to modify data
> relating to them. However, I may have misinterpreted what you have
> said here.

That is about the scoping graph and that queries are (as defined in
the query spec) assumed to be evaluated over a scoping graph that is
graph equivalent to the active graph. Although most often the scoping
graph is the active graph, that's not required. Thus, if your scoping
graph used different bnode names than the actual graph, the rewriting
doesn't give you triples that are syntactically in the actual graph.
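
The scoping-graph concern can be shown with a tiny Python sketch. It assumes a hypothetical store whose scoping graph is a copy of the active graph with renamed blank node labels, a single-triple query pattern, and a rewriting-style delete that removes only syntactically equal triples. None of the names below come from the spec.

```python
def rename_bnodes(graph, renaming):
    """Build a graph-equivalent copy that uses different bnode labels."""
    return {tuple(renaming.get(t, t) for t in triple) for triple in graph}

def evaluate(graph, pattern):
    """Solutions for one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in graph:
        binding = {}
        for p_term, g_term in zip(pattern, triple):
            if p_term.startswith("?"):
                if binding.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            solutions.append(binding)
    return solutions

def delete_syntactic(graph, template, solutions):
    """Rewriting-style delete: remove exactly the instantiated triples."""
    doomed = {tuple(s.get(t, t) for t in template) for s in solutions}
    return graph - doomed

active = {(":a", ":knows", "_:b1")}
scoping = rename_bnodes(active, {"_:b1": "_:s1"})
pattern = (":a", ":knows", "?x")

# Bindings come from the scoping graph, so ?x is bound to _:s1, not _:b1.
after_scoped = delete_syntactic(active, pattern, evaluate(scoping, pattern))
# Only if the scoping graph happens to be the active graph does it work.
after_same = delete_syntactic(active, pattern, evaluate(active, pattern))
```

In this sketch the first delete leaves the active graph untouched, because the label _:s1 never occurs in it. The second delete empties it.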

Best regards,
Birte

>> If you don't want to implement via rewriting, you can (have to)
>> find mappings for the bnodes in the instantiated template to find the
>> triples that are to be deleted.
>>
>> The only problem I can see is that we have to decide whether to delete
>> "iteratively" (solution by solution), which seems not to be good:
>> E.g.,
>> data:
>> :a :b :c1 .
>> :a :b :c2 .
>> :a :d :e .
>> :a :f :c1 .
>> :a :f :c2 .
>> :a :g :e .
>> query
>> DELETE { :a :f ?c . :a :g ?e } WHERE { :a :b ?c . :a :d ?e }
>> evaluating the query pattern gives two solutions
>> ?c/:c1, ?e/:e and ?c/:c2, ?e/:e
>> If we now instantiate the template with the first solution and
>> immediately remove the graph-equivalent triples, we are left with
>> data':
>> :a :b :c1 .
>> :a :b :c2 .
>> :a :d :e .
>> :a :f :c2 .
>> If we now instantiate the template with the second solution to
>> :a :f :c2 . :a :g :e
>> There is no longer a matching subgraph left. Depending on the order, I
>> could then end up with different graphs, which is not good at all. One
>> would have to first find all matching triples and then remove them,
>> but this is not really related to the bnode issue.
>
> As you say, this is a separate issue. In this case I have no problems
> with it. There are lots of operations which try to delete the same
> triple more than once, or try to construct the same triple multiple
> times. The duplicates are ignored.
>
> Regards,
> Paul Gearon
>
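
The order dependence in the example above can be reproduced with a short Python sketch. It assumes that an instantiated template is removed only when all of its triples are present in the graph (whole-subgraph matching), and it encodes the data and the two solutions from the example as plain tuples and dicts. The function names are made up for the illustration.

```python
def instantiate(template, solution):
    return {tuple(solution.get(t, t) for t in triple) for triple in template}

def delete_iteratively(graph, template, solutions):
    """Remove each matching instance immediately, solution by solution."""
    graph = set(graph)
    for solution in solutions:
        instance = instantiate(template, solution)
        if instance <= graph:      # whole instance must still be present
            graph -= instance
    return graph

def delete_collected(graph, template, solutions):
    """Find all matching instances against the original graph, then remove."""
    doomed = set()
    for solution in solutions:
        instance = instantiate(template, solution)
        if instance <= graph:
            doomed |= instance
    return set(graph) - doomed

data = {
    (":a", ":b", ":c1"), (":a", ":b", ":c2"), (":a", ":d", ":e"),
    (":a", ":f", ":c1"), (":a", ":f", ":c2"), (":a", ":g", ":e"),
}
template = [(":a", ":f", "?c"), (":a", ":g", "?e")]
s1 = {"?c": ":c1", "?e": ":e"}
s2 = {"?c": ":c2", "?e": ":e"}

forward = delete_iteratively(data, template, [s1, s2])
backward = delete_iteratively(data, template, [s2, s1])
collected = delete_collected(data, template, [s1, s2])
```

With these assumptions, the iterative variant leaves :a :f :c2 behind in one order and :a :f :c1 in the other, whereas the collected variant removes all three template triples regardless of order. If deletion were instead defined triple by triple, with duplicate removals ignored, the iterative variant would not depend on the order either.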



-- 
Dr. Birte Glimm, Room 309
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283520
Received on Thursday, 24 February 2011 21:27:54 UTC