
Re: SPARQL update semantics Dataset-UNION vs. Dataset-MERGE

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 20 Apr 2011 15:20:10 +0100
Cc: public-rdf-dawg@w3.org
Message-Id: <3029B8E0-D934-4383-A871-BE7A044E7C36@deri.org>
To: Andy Seaborne <andy.seaborne@epimorphics.com>

On 18 Apr 2011, at 14:57, Andy Seaborne wrote:

> 
> 
> On 18/04/11 00:13, Axel Polleres wrote:
>> Hi all,
>> 
>> trying to catch up with my actions and especially with Update
>> semantics... taking some closer look at Dataset-UNION vs.
>> Dataset-MERGE, since we now have a definition of Dataset-MERGE in
>> query...
> 
> Non-technical:
> 
> It would be better to have a definition of dataset-union that is specifically useful for SPARQL Update.
> 
> 1/ It isolates us (SPARQL-WG) from decisions of RDF-WG around datasets
> 
> 2/ There are specific issues to do with subgraphs where we want very precise handling of bnodes.
> 
> By the way: The current defn of Dataset-MERGE is wrong (as Peter PS has pointed out) and needs fixing.

Would a modified version of the definition we have for Dataset-UNION
 http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_datasetUnion
such that 
 s/where union between graphs is defined as set-union of triples in those graphs./
   where union between graphs is defined as RDF merge of those graphs./
do better? (in spirit... I mean, probably I would word it differently)

I don't think that this definition suffers from ambiguities as claimed by Peter.
I didn't mean to argue for putting us in a position of depending on RDF WG decisions; rather, I wanted
to argue that, for our purposes, Dataset-MERGE could be more useful than Dataset-UNION.
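
Just to make the distinction concrete, here is a toy Python sketch (the term encoding is mine, nothing from either draft): set-union of triples identifies bnodes that happen to share a label, while merge standardizes them apart first.

```python
# Toy RDF terms: bnodes are tagged tuples ("bnode", label); other terms are strings.
def is_bnode(t):
    return isinstance(t, tuple) and t[0] == "bnode"

g1 = {(("bnode", "a"), ":p", ":o1")}
g2 = {(("bnode", "a"), ":p", ":o2")}

def triple_union(a, b):
    """Set-union of triples: the shared label _:a collapses to one node."""
    return a | b

def rdf_merge(a, b):
    """RDF merge: rename each graph's bnodes apart, then take the union."""
    def apart(g, tag):
        rn = lambda t: ("bnode", tag, t[1]) if is_bnode(t) else t
        return {(rn(s), p, rn(o)) for s, p, o in g}
    return apart(a, "1") | apart(b, "2")

print(len({s for s, _, _ in triple_union(g1, g2)}))  # 1: one shared subject
print(len({s for s, _, _ in rdf_merge(g1, g2)}))     # 2: standardized apart
```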

I will try to find time to check the further implications in the coming few days; for the moment, I just wanted to ask.

best,
Axel

> 
>> My overall impression is that we actually may want to switch to
>> Dataset-MERGE  for all our definitions in SPARQL update as well...
>> explained in the following:
>> 
>> 1) For the Insert Data Operation
>> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_insertdataoperation
>> , it seems to me that we don't really want to be reusing bnode labels
>> (assuming that an agent inserting Data is not aware of the bnode
>> labels in the graph store anyway).
>> 
>> i.e.
>> 
>> INSERT DATA { _:a :p :o }
>> 
>> should IMO insert a new bnode label, rather than using  _:a as the
>> label to be inserted.
> 
> labels are a different matter.
> 
> If you have two parsings (same bytes sent in two requests):
> 
> INSERT DATA { _:a :p :o }
> 
> you get two bNodes.
> 
> bNode label is scoped to a file;
> bNode identity is scoped to the graph store.
> 
> The general text in "12.3.2 Treatment of Blank Nodes" (SPARQL 1.0)
> 
> http://www.w3.org/TR/rdf-sparql-query/#BGPsparqlBNodes
> 
> talks about
> 
> """
> The scoping graph is purely a theoretical construct; in practice, the effect is obtained simply by the document scope conventions for blank node identifiers.
> """
> or to put it another way, bNodes have global identity but that identity is not the same as label or document identifier.
> 
> And in particular, a bNode can be in two graphs.
> One graph is known to be a subgraph of the other.
> 
> If we copy over some triples from one graph to another, then find the bnode again:
> 
> INSERT { GRAPH <G> { ?s :label "Hello" . } }
> WHERE
>       { ?s :key 57 .	# Finds a bNode.
>       }
> 
> .. later .. same request or different request ...
> 
> DELETE { GRAPH <G> { ?s :label "Hello" . } }
> INSERT { GRAPH <G> { ?s :label "Hello2" . } }
> WHERE
>       { ?s :key 57 .	# Finds a bNode.
>       }
> 
> should find the same bNode (or at least that to be a legal implementation of SPARQL Update).
> 
> It's the round-trip problem for SPARQL results, made to exist solely inside one store, without serializing/deserializing via the result set format.
> 
>> 
>> 2) The Delete Insert Operation
>> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_deleteinsertoperation
>> 
>> 
>> anyway relies on the Dataset() function (cf. http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_datasetPattern)
>> which "skolemises" bnodes away before ...
> 
> and which does not need to do that because bNodes have identity, it's just not their label.
> 
> Even if it's done by sk(), the definition of sk() needs tightening up to make it stable across requests.
> 
> That "fresh constant" is no more than the bNode identity.  Because sk-1() exists, it is a name for the bNode (not for the thing denoted by the bNode, which is what skolemization does).
> 
> 4.2.4  Dataset(QuadPattern,  P, GS )
> "the original bnode labels."
> 
> the labels are a syntax-only feature.
> 
> Another fix needed for sk() is that the "fresh constant" must not collide with any term in a request, nor any future request.  Making it something other than an IRI or literal (or bNode!) does this.  But then it's exactly treating bNodes as having identity, so just use the bNode itself.
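
(This reading of sk() as a bijection into a term space disjoint from IRIs, literals and bNodes can be sketched in a few lines — the class and names below are mine, purely illustrative:)

```python
class Sk:
    """Toy sk()/sk-1(): a bijection from bNodes to fresh constants that are
    neither IRIs, literals, nor bNodes, so they can never collide with any
    term in this or a future request."""
    def __init__(self):
        self._fwd = {}   # bNode -> fresh constant
        self._inv = {}   # fresh constant -> bNode

    def sk(self, bnode):
        if bnode not in self._fwd:
            const = ("fresh", len(self._fwd))  # tagged tuple: not an RDF term
            self._fwd[bnode] = const
            self._inv[const] = bnode
        return self._fwd[bnode]

    def sk_inv(self, term):
        # sk-1() recovers the original bNode, so the "fresh constant" is
        # really just a name for the bNode's identity.
        return self._inv.get(term, term)

s = Sk()
b = "_:x"  # stand-in for a bNode
print(s.sk_inv(s.sk(b)) == b)  # round-trips back to the same bNode
print(s.sk(b) == s.sk(b))      # stable across repeated applications
```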
> 
> Easiest fix seems to be to just put "labels" down as a syntax feature and explain that labels in syntax are not global or graph-store-wide names for bNodes.
> 
> Editorial/major:
> 
> We also need to decide what happens when the same label is used multiple times in one request.  There is text for this but it's buried.
> 
> 3.1.1 has :
> """
> Blank node labels in QuadDatas are assumed to be disjoint from the blank nodes in the Graph Store and will be inserted as new blank nodes.
> """
> 
> but 3.1.2 has:
> """
> Since blank node labels are only unique within each specific context
> """
> 
> what exactly is a 'context'?
> 
> Discussion is only for INSERT DATA and DELETE DATA, not the pattern operations.
> 
>> 
>> 3) ... I think the Dataset() function of
>> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_datasetPattern
>> can be changed actually from
>> 
>> Dataset(QuadPattern, P, GS ) = Dataset-UNION( sk-1(
>> Dataset(QuadPattern, μ) ) | μ in eval(sk(GS)(sk(DG)),P) )
>> 
>> to
>> 
>> Dataset(QuadPattern, P, GS ) = sk-1 (Dataset-MERGE(
>> Dataset(QuadPattern, μ) ) | μ in eval(sk(GS)(sk(DG)),P) )
>> 
>> without changing of meaning... (again, since bnodes have been
>> skolemised away and are only re-introduced via the final sk-1)
>> 
>> 4) In
>> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_loadoperation
>> ... it is actually ok to use Dataset-MERGE, since you don't want to
>> reuse bnode-labels coming from an external Graph.
>> 
>> 5) The use of Dataset-UNION() in
>> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_clearoperation
>> can be changed to Dataset-MERGE without altering semantics.
> 
> Just on this, "merge" does not give a stable bNode identification anyway.  Merge (graph, dataset) can completely replace every bNode by another (the dataset has the same meaning, but is not the same set of triples).
> 
>> 
>> 6) The use of Dataset-UNION() in
>> http://www.w3.org/2009/sparql/docs/update-1.1/Overview.xml#def_createoperation
>> can be changed to Dataset-MERGE without altering semantics.
> 
> BTW strictly, skolemization does alter semantics.
> 
> http://www.w3.org/TR/rdf-mt/#prf
> 
> """
> a graph should not be thought of as being equivalent to its Skolemization
> """
> 
>> these seem to be all uses of Dataset-UNION(), please let me know if I
>> am missing something.
> 
> I prefer the use of bNode identity (at least across the graph store) so that issues of bNodes in two graphs are clear.
> 
>> 
>> best, Axel
>> 
> 	Andy
> 
Received on Wednesday, 20 April 2011 14:20:41 GMT
