Re: Bnodes in DELETE templates (was: SPARQL Update 1.1 review part1) from Birte Glimm on 2011-02-23 (public-rdf-dawg@w3.org from January to March 2011)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Wed, 23 Feb 2011 16:45:51 +0000
To: Axel Polleres <axel.polleres@deri.org>
Cc: Paul Gearon <gearon@ieee.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTim+TRdna7A4yNkjsB8VhiGsh4R9i1mnp=WSbYes@mail.gmail.com>
Hm, so far I had imagined a relatively simple definition, which is, we
evaluate the WHERE clause to get a solution sequence. We apply each
solution to the template, throwing away any instantiations that are
not RDF (e.g., literal in subj position or unbound var). Now let G be
the graph to which the delete applies. Each subgraph G' of G such that
an instantiated template is *an instance* of G' is then removed from
G.
That's it. That's how I understood bnodes act as wild-cards. This
might be a bit more complicated with named graphs and quads, but still
the idea seems quite straightforward.

If you want to imlement it via rewriting that's fine. Instead of
finding instances (which requires computing mappings for the bnodes in
the template), you can then just delete syntactically equal triples,
but I don't see why this has to be defined in the algebra or as some
for of rewriting algorithm. I think that is even dangerous since it
assumes that the bnodes that I return as bindings are really those
from the graph, but I am in no other place of the spec oblidged to do
that. If you don't want to implement via rewriting, you can (have to)
find mappings for the bnodes in the instantiated template to find the
triples that are to be deleted.

The only problem I can see is that we have to decide whether to delete
"iteratively" (solution by solution), which seems not to be good:
E.g.,
data:
:a :b :c1 .
:a :b :c2 .
a :d :e .
:a :f :c1 .
:a :f :c2 .
:a :g :e .
query
DELETE { :a :f ?c . :a :g ?e } WHERE { :a :b ?c . :a :d ?e }
evaluating the query pattern gives two solutions
?c/:c1, ?e/:e and ?c/:c2, ?e/:e
If now instantiate the template with the first solution and
immediately remove that graph equivalent triples, we are left with
data':
:a :b :c1 .
:a :b :c2 .
a :d :e .
:a :f :c2 .
If we now instantiate the templete with the second solution to
:a :f :c2 . :a :g :e
There is no longer a matching subgraph left. Depending on the order, I
could then end up with different graphs, which is not good at all. One
would have to first find all matching triples and then remove them,
but this is not really related to the bnode issue.

I can also live with omitting triples with unbound variables, as Steve
suggested, but I am not fond of prescribing this quite ugly rewriting.
If I have to choose a rewriting, then it would be option 1, which to
me seems just a way of implementing instance finding.

I also comment inline on your answer below...

On 22 February 2011 22:26, Axel Polleres <axel.polleres@deri.org> wrote:
> Hi Birte,
> On 22 Feb 2011, at 18:12, Birte Glimm wrote:
>
>> On 22 February 2011 03:53, Paul Gearon <gearon@ieee.org> wrote:
>> [snip]
>>
>> > DELETE
>> >  { _:a :p 12 .
>> >   _:a :q ?o .
>> >  }
>> > WHERE {?s :r ?q OPTIONAL { ?q :s ?o } }
>> >
>> > So when that is applied to the following data:
>> >
>> > :x :p 12 .
>> > :x :r :z .
>> > :x :q 10 .
>> >
>> > The model you've proposed will certainly match the first triple in the
>> > template for removal, but the ?o will be unbound, so what happens to
>> > that part of the template? Does it get ignored, or does it prevent the
>> > entire template from matching? (The latter is more in line with
>> > CONSTRUCT and INSERT, but doesn't really work well with the proposed
>> > formal model).
>>
>> In Lee's original discussion of the bnode issue
>> (http://lists.w3.org/Archives/Public/public-rdf-dawg/2010JanMar/0428.html),
>> he writes:
>> "And note that:
>>    DELETE { ?unbound :p :o } WHERE { }
>> ...doesn't remove anything at all. ?unbound is (as its name implies) not
>> bound in the single solution; when this solution is applied to the
>> template, we get an invalid triple (because of the unbound variable),
>> and nothing is removed. (The current editor's draft does not spell this
>> out, but this is the analogous behavior to CONSTRUCT templates, and I
>> assume we have consensus around this.)"
>>
>> Unless this has changed, also the above example shouldn't remove
>> anything because ?o is unbound.
>
> For the records,
>
> version 1 (called option1 in today's TC) of the semantics wouldn't delete anything, on the triple
>  _:a :q ?o.
> in the query above, but it would delete something on the triple
>  _:a :p 12 .
>
> version 2 (called option1 in today's TC, the one where the DELETE template is copied in the body) though, would match something
>  on both triples in the template, which sounds like something you'd object against, so?

I guess here you mean "called option****2**** in today's TC". And,
yes, I don't like that.

>
> version 3 (called option 3 in today's TC, i.e. treating bnodes the same in DELETE as in CONSTRUCT, which would obviously not DELETE anything,
> since to-be-deleted triples with new bnodes don't appear in GS)

This seems a bit pointless. You could then also just disallow bnodes
in the template.

> BTW, are there other alternatives which we didn't mention so far?

I am ok with the effect of option 1, but ouldn't prescribe this
particular implementation technique as argued above.

Birte

>>
>> However, Lee also wrote
>> "=== Behavior of DELETE ===
>> The full form of DELETE is
>>    DELETE { template } WHERE { pattern }
>> The semantics are that pattern is matched against the graph store,
>> yielding a solution set (a set of solutions; each solution is a set of
>> variable bindings; each variable binding is a pairing of a variable plus
>> an RDF term). Each solution in the solution set is then inserted into
>> the template in turn, resulting in ground triples. Each of these ground
>> triples is removed from the graph store (either from the graph store's
>> single "unnamed graph", or from the graph specified in the WITH clause,
>> or from the graph specified inline with a GRAPH clause."
>>
>> In particular "resulting in ground triples" is just not true if the
>> template has bnodes because the bnodes are not changed by applying the
>> solution mapping and any triple with bnodes in it is not ground, see
>> RDF Semantics http://www.w3.org/TR/rdf-mt/#graphdefs Section 0.3: "A
>> ground RDF graph is one with no blank nodes."
>>
>> If we really would only delete ground triples, then we won't have
>> problems and I have to say that I am very inclined to suggest sticking
>> to that, i.e., if applying the solution mapping does not yield ground
>> triples, nothing is being deleted.
>>
>
> Lee to answer, but I assume that he didn't consider bnodes coming from the bindings in "ground" here.
> I assume that, because I think we want to be able to delete triples with bnodes (otherwise, any triple with a bnode can only be removed by
> dropping/clearing the graph. Anyways, this seems indeed an issue also with regards to Andy's previous mail... i.e. the issue around
> what is the scoping graph for matching the WHERE part in update requests... It seems we want to be able to
> DELETE triples with bnodes, with the current definition of scoping graph from Query, see
>  http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0328.html
> this is probably not possible, so we'd need to define the scoping graph differently (preserving bnode labels?) for Update.
>
> Axel
>
>> This has the disadvantage that deleting lists is not that easy, but
>> apart from that no complicated rewritings and query pattern extensions
>> are required.
>
>> Birte
>>
>> > Is each part of the template treated as if it is a separate thing to
>> > be deleted, or does the re-use of the same blank node label (_:a in
>> > this case) have some kind of effect?
>> >
>> > Regards,
>> > Paul
>> >
>> > On Mon, Feb 21, 2011 at 3:10 AM, Axel Polleres <axel.polleres@deri.org> wrote:
>> >> Thanks Greg for spotting this! Actually, there were two mistakes in the P_B
>> >> pattern below...  see inline.
>> >>
>> >>
>> >> On 20 Feb 2011, at 23:40, Gregory Williams wrote:
>> >>
>> >> On Feb 18, 2011, at 1:31 PM, Axel Polleres wrote:
>> >>
>> >>> For any unnamed bnode _:B in a modify_template_DEL
>> >>> (i) new variables ?Var_B ?Var_B1 ?Var_B2 ?Var_Bg are introduced,
>> >>> (ii) _:B is replaced by ?Var_B in a modify_template_DEL,
>> >>> (iii) Pattern P is replaced by P_B such that
>> >>>
>> >>>  P_B =  { P } UNION {
>> >>>          SELECT DISTINCT ?Var_B
>> >>>            {  { ?Var_B ?Var_B1 ?Var_B2 } UNION
>> >>>               { ?Var_B1 ?Var_B ?Var_B2 } UNION
>> >>>               { ?Var_B1 ?Var_B2 ?Var_B } UNION
>> >>>               { GRAPH ?Var_Bg {?Var_B1 ?Var_B2 ?Var_B } } UNION
>> >>>               { GRAPH ?Var_Bg {?D1 ?Var_B2 ?Var_B } } UNION
>> >>>               { GRAPH ?Var_Bg {?Var_B1 ?Var_B2 ?Var_B } } }
>> >>>
>> >>> That is, ?Var_B binds to all possible values in the vocabulary of GS.
>> >>
>> >> I'm not totally swapped in on this issue, but I don't understand how this
>> >> pattern aligns with the description. In all three named graph union
>> >> branches, ?Var_B appears in the object position. Since ?Var_B is the only
>> >> projected variable, don't these three branches return the same results?
>> >>
>> >> yes, that was a copy-paste error from an earlier version I had. The correct
>> >> one should be
>> >>     P_B =  { { P } { Q } }
>> >> i.e. join, not UNION (thanks Andy, for pointing me to that offlist), where Q
>> >> is defined as below...
>> >>
>> >>> I know that the definition of P_B doesn't look very nice, but this
>> >>> definition should cover the intended semantics of [2].
>> >>> In principle, the idea is that bnodes, that should behave as wildcard
>> >>> should bind to *any* element in the signature of GS,
>> >>> which is what is returned by the subquery
>> >>>        Q = SELECT DISTINCT ?D
>> >>>            {  { ?D ?D1 ?D2 } UNION
>> >>>               { ?D1 ?D ?D2 } UNION
>> >>>               { ?D1 ?D2 ?D } UNION
>> >>>               { GRAPH ?Dg {?D1 ?D2 ?D } } UNION
>> >>>               { GRAPH ?Dg {?D1 ?D2 ?D } } UNION
>> >>>               { GRAPH ?Dg {?D1 ?D2 ?D } } }
>> >>
>> >> ...corrected version of Q:
>> >>
>> >>         Q = SELECT DISTINCT ?Var_B
>> >>
>> >>             {  { ?Var_B ?Var_B1 ?Var_B2 } UNION
>> >>
>> >>                { ?Var_B1 ?Var_B ?Var_B2 } UNION
>> >>
>> >>                { ?Var_B1 ?Var_B2 ?Var_B } UNION
>> >>
>> >>                { GRAPH ?Var_Bg {?Var_B ?Var_B1 ?Var_B2 } } UNION
>> >>
>> >>                { GRAPH ?Var_Bg {?Var_B1 ?Var_B ?Var_B2 } } UNION
>> >>                { GRAPH ?Var_Bg {?Var_B1 ?Var_B2 ?Var_B } } }
>> >>
>> >> Even more obvious in this one. Am I missing something?
>> >>
>> >>
>> >> Hope I got it right now ;-)
>> >>
>> >> Thanks,
>> >> Axel
>> >>
>> >> thanks,
>> >> .greg
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Dr. Birte Glimm, Room 309
>> Computing Laboratory
>> Parks Road
>> Oxford
>> OX1 3QD
>> United Kingdom
>> +44 (0)1865 283520
>>
>
>
>



-- 
Dr. Birte Glimm, Room 309
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283520
Received on Wednesday, 23 February 2011 16:58:18 UTC