Re: SPARQL update semantics Dataset-UNION vs. Dataset-MERGE from Andy Seaborne on 2011-04-18 (public-rdf-dawg@w3.org from April to June 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 18 Apr 2011 14:57:09 +0100
To: public-rdf-dawg@w3.org
Message-ID: <4DAC4335.6020907@epimorphics.com>
On 18/04/11 00:13, Axel Polleres wrote:
> Hi all,
>
> trying to catch up with my actions and especially with Update
> semantics... taking some closer look at Dataset-UNION vs.
> Dataset-MERGE, since we now have a definition of Dataset-MERGE in
> query...

Non-technical:

It would be better to have a definition of dataset-union that is 
specifically useful for SPARQL Update.

1/ It isolates us (SPARQL-WG) from decisions of the RDF-WG around datasets.

2/ There are specific issues to do with subgraphs where we want very 
precise handling of bnodes.

By the way: the current definition of Dataset-MERGE is wrong (as Peter PS has 
pointed out) and needs fixing.

> My overall impression is that we actually may want to switch to
> Dataset-MERGE  for all our definitions in SPARQL update as well...
> explained in the following:
>
> 1) For the Insert Data Operation
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_insertdataoperation
> , it seems to me that we don't really want to be reusing bnode labels
> (assuming that an agent inserting Data is not aware of the bnode
> labels in the graph store anyway).
>
> i.e.
>
> INSERT DATA { _:a :p :o }
>
> should IMO insert a new bnode label, rather than using  _:a as the
> label to be inserted.

Labels are a different matter.

If you have two parsings (the same bytes sent in two requests):

INSERT DATA { _:a :p :o }

you get two bNodes.

A bNode label is scoped to a file (document); the bNode itself is not.
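To make the label-versus-bNode distinction concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `BNode` and `parse_request` are hypothetical names, terms are plain strings, and one "request" stands in for one parsed document. Each parse gets its own label table, so the same bytes parsed twice yield two bNodes, while a label repeated inside one request yields one bNode.

```python
import itertools


class BNode:
    """A blank node. Its identity is internal; the label it was parsed from is not part of it."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(BNode._ids)  # internal identity, never written in syntax

    def __repr__(self):
        return f"BNode#{self.id}"


def parse_request(triples):
    """Hypothetical toy parser: each _:label maps to a bNode, scoped to this one request."""
    scope = {}  # label -> BNode; thrown away once the request has been parsed

    def term(t):
        if t.startswith("_:"):
            if t not in scope:
                scope[t] = BNode()  # first use of the label in this request
            return scope[t]
        return t  # IRIs (written as prefixed names here) pass through unchanged

    return [tuple(term(t) for t in triple) for triple in triples]


request = [("_:a", ":p", ":o")]            # INSERT DATA { _:a :p :o }
first = parse_request(request)
second = parse_request(request)            # same bytes, sent again
assert first[0][0] is not second[0][0]     # two parsings -> two bNodes

both = parse_request([("_:a", ":p", ":o"), ("_:a", ":q", ":o2")])
assert both[0][0] is both[1][0]            # same label within one request -> same bNode
```

The label table is discarded after each parse, which is the "document scope" convention; nothing about the bNode's identity depends on the string "_:a".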

The general text in "12.3.2 Treatment of Blank Nodes" (SPARQL 1.0)

http://www.w3.org/TR/rdf-sparql-query/#BGPsparqlBNodes

talks about

"""
The scoping graph is purely a theoretical construct; in practice, the 
effect is obtained simply by the document scope conventions for blank 
node identifiers.
"""
Or, to put it another way, bNodes have global identity, but that identity 
is not the same as a label or a document identifier.

And in particular, a bNode can be in two graphs, for example when one 
graph is known to be a subgraph of the other.
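A minimal sketch of a bNode shared by two graphs, assuming bNode identity is modelled as Python object identity (the `BNode` class and the graph variables are hypothetical). Copying a triple keeps the same bNode, so the smaller graph really is a subgraph; copying via a fresh bNode would not give that.

```python
class BNode:
    """Blank node whose identity is simply the Python object (hypothetical model)."""


b = BNode()
g1 = {(b, ":key", 57), (b, ":label", "Hello")}

# Copying a triple from g1 into g2 shares the *same* bNode rather than a relabelled copy.
g2 = {t for t in g1 if t[1] == ":label"}
assert g2 <= g1  # one graph is a subgraph of the other, via the shared bNode

# Had the copy introduced a fresh bNode instead, g2 would no longer be a subgraph of g1.
g2_fresh = {(BNode(), ":label", "Hello")}
assert not g2_fresh <= g1
```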

If we copy some triples from one graph to another and then find the 
bNode again:

INSERT { GRAPH <G> { ?s :label "Hello" . } }
WHERE
        { ?s :key 57 . # Finds a bNode.
        }

.. later .. same request or different request ...

DELETE { GRAPH <G> { ?s :label "Hello" . } }
INSERT { GRAPH <G> { ?s :label "Hello2" . } }
WHERE
        { ?s :key 57 . # Finds a bNode.
        }

should find the same bNode (or at least that to be a legal 
implementation of SPARQL Update).

It's the round-trip problem for SPARQL results, made to exist solely 
inside one store, without serializing/deserializing via the result set 
format.
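The two requests above can be sketched against a toy in-memory store, assuming bNodes are objects with identity and that the WHERE clause is evaluated once before any deletes or inserts. `Store`, `match_key` and `modify` are hypothetical names, not any real library's API; `"default"` stands for the default graph and `"G"` for `<G>`.

```python
class BNode:
    """Blank node; identity is object identity (hypothetical model)."""


class Store:
    """Hypothetical in-memory graph store: graph name -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = {"default": set()}

    def match_key(self, value):
        # WHERE { ?s :key value } evaluated over the default graph.
        return [s for (s, p, o) in self.graphs["default"] if p == ":key" and o == value]

    def modify(self, graph, value, delete=(), insert=()):
        # Sketch of DELETE { GRAPH graph {...} } INSERT { GRAPH graph {...} } WHERE { ?s :key value };
        # delete/insert are (predicate, object) templates with ?s as the subject.
        target = self.graphs.setdefault(graph, set())
        bindings = self.match_key(value)  # WHERE is evaluated once, before any change
        target -= {(s, p, o) for s in bindings for (p, o) in delete}
        target |= {(s, p, o) for s in bindings for (p, o) in insert}


store = Store()
b = BNode()
store.graphs["default"].add((b, ":key", 57))

# Request 1: INSERT { GRAPH <G> { ?s :label "Hello" } } WHERE { ?s :key 57 }
store.modify("G", 57, insert=[(":label", "Hello")])
# Request 2 (later): DELETE "Hello" / INSERT "Hello2" for the same ?s
store.modify("G", 57, delete=[(":label", "Hello")], insert=[(":label", "Hello2")])

assert store.graphs["G"] == {(b, ":label", "Hello2")}  # same bNode b, found again
```

Because the store keeps the bNode object itself, the second request finds and modifies the same bNode with no labels or serialization involved.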

>
> 2) The Delete Insert Operation
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_deleteinsertoperation
>
>
> anyways relies on the Dataset() function (cf.
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_datasetPattern)
> which "skolemises" bnodes away. before ...

and it does not need to do that, because bNodes have identity; it's 
just not their label.

Even if it's done by sk(), the definition of sk() needs tightening up to 
make it stable across requests.

That "fresh constant" is no more than the bNode identity. Because 
sk-1() exists, it is a name for the bNode (not for the thing denoted by 
the bNode, which is what skolemization does).

Section 4.2.4, Dataset(QuadPattern, P, GS), refers to
"the original bnode labels."

The labels are a syntax-only feature.

Another fix needed for sk() is that the "fresh constant" must not 
collide with any term in a request, nor in any future request. Making it 
something other than an IRI or literal (or bNode!) does this. But then 
it's exactly treating bNodes as having identity, so just use the bNode 
itself.
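A sketch of what a tightened sk()/sk-1() could look like, under two assumptions that are mine, not the spec's: the mapping table lives for the life of the store (so it is stable across requests), and the fresh constants are a separate kind of term (`SkolemConstant`), neither IRI (plain strings here) nor literal, so they cannot collide with anything written in a request. `BNode`, `SkolemConstant` and `Skolemizer` are hypothetical names.

```python
import itertools
from dataclasses import dataclass


class BNode:
    """Blank node; identity is object identity (hypothetical model)."""


@dataclass(frozen=True)
class SkolemConstant:
    """A 'fresh constant' of its own kind: not an IRI (str here) and not a literal,
    so it can never collide with a term written in this or any future request."""
    n: int


class Skolemizer:
    """Hypothetical store-wide sk() / sk-1(): one table kept for the life of the store,
    so a given bNode maps to the same constant in every request."""

    def __init__(self):
        self._counter = itertools.count()
        self._to_const = {}  # BNode -> SkolemConstant
        self._to_bnode = {}  # SkolemConstant -> BNode

    def sk(self, term):
        if isinstance(term, BNode):
            if term not in self._to_const:
                const = SkolemConstant(next(self._counter))
                self._to_const[term] = const
                self._to_bnode[const] = term
            return self._to_const[term]
        return term  # IRIs and literals are left alone

    def sk_inv(self, term):
        if isinstance(term, SkolemConstant):
            return self._to_bnode[term]
        return term


sks = Skolemizer()
b = BNode()
c1 = sks.sk(b)  # used while evaluating request 1
c2 = sks.sk(b)  # used while evaluating request 2
assert c1 == c2 and sks.sk_inv(c1) is b  # stable, and sk-1 gives back the bNode itself
```

Since sk_inv() always hands back that one bNode, the constant is only a stand-in for the bNode's own identity; keeping the bNode itself in the store would do the same job.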

The easiest fix seems to be to treat "labels" purely as a syntax feature 
and explain that labels in syntax are not global or graph-store-wide 
names for bNodes.

Editorial/major:

We also need to decide what happens when the same label is used multiple 
times in one request.  There is text for this but it's buried.

3.1.1 has:
"""
Blank node labels in QuadDatas are assumed to be disjoint from the blank 
nodes in the Graph Store and will be inserted as new blank nodes.
"""

but 3.1.2 has:
"""
  Since blank node labels are only unique within each specific context
"""

what exactly is a 'context'?

The discussion covers only INSERT DATA and DELETE DATA, not the 
pattern-based operations.

>
> 3) ... I think the Dataset() function of
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_datasetPattern
> can be changed actually from
>
> Dataset(QuadPattern, P, GS ) = Dataset-UNION( sk-1(
> Dataset(QuadPattern, μ) ) | μ in eval(sk(GS)(sk(DG)),P) )
>
> to
>
> Dataset(QuadPattern, P, GS ) = sk-1 (Dataset-MERGE(
> Dataset(QuadPattern, μ) ) | μ in eval(sk(GS)(sk(DG)),P) )
>
> without changing of meaning... (again, since bnodes have been
> skolemised away and are only re-introduced via the final sk-1)
>
> 4) In
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_loadoperation
> ... it is actually ok to use Dataset-MERGE, since you don't want to
> reuse bnode-labels coming from an external Graph.
>
> 5) The use of Dataset-UNION() in
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_clearoperation
> can be changed to Dataset-MERGE without altering semantics.

Just on this, "merge" does not give a stable bNode identification 
anyway. Merge (graph or dataset) can completely replace every bNode by 
another (the DS has the same meaning, but is not the same set of triples).
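A toy comparison of union and merge, assuming that the merge sketched here standardises bNodes apart by replacing every bNode of each input graph with a fresh one (one permissible choice, and the case described above). `union`, `merge`, `rename_apart` and `bnodes` are hypothetical helpers. The union keeps the shared bNode, so the two triples stay linked; the merge loses both the original bNode and the link.

```python
class BNode:
    """Blank node; identity is object identity (hypothetical model)."""


def union(*graphs):
    """Plain set union: a bNode shared by two graphs stays one bNode."""
    out = set()
    for g in graphs:
        out |= g
    return out


def rename_apart(graph):
    """Replace every bNode in one graph by a fresh one (one permissible way to standardise apart)."""
    fresh = {}

    def term(t):
        if isinstance(t, BNode):
            if t not in fresh:
                fresh[t] = BNode()
            return fresh[t]
        return t

    return {tuple(term(x) for x in triple) for triple in graph}


def merge(*graphs):
    """RDF merge sketch: union after standardising the bNodes of each input graph apart."""
    out = set()
    for g in graphs:
        out |= rename_apart(g)
    return out


def bnodes(graph):
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


b = BNode()
g1 = {(b, ":key", 57)}
g2 = {(b, ":label", "Hello")}

assert bnodes(union(g1, g2)) == {b}  # still the one bNode b, linking both triples
merged = merge(g1, g2)
assert len(bnodes(merged)) == 2 and b not in bnodes(merged)  # b replaced, link lost
```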

>
> 6) The use of Dataset-UNION() in
> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_createoperation
> can be changed to Dataset-MERGE without altering semantics.

BTW, strictly speaking, skolemization does alter semantics.

http://www.w3.org/TR/rdf-mt/#prf

"""
a graph should not be thought of as being equivalent to its Skolemization
"""

> these seem to be all uses of Dataset-UNION(), please let me know if I
> am missing something.

I prefer the use of bNode identity (at least across the graph store) so 
that issues of bNodes in two graphs are clear.

>
> best, Axel
>
 Andy
Received on Monday, 18 April 2011 13:57:48 UTC