Re: SPARQL 1.1 Update Review (part 2) from Andy Seaborne on 2011-03-16 (public-rdf-dawg@w3.org from January to March 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 16 Mar 2011 16:55:29 +0000
To: Axel Polleres <axel.polleres@deri.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4D80EB81.9080105@epimorphics.com>
On 16/03/11 15:39, Axel Polleres wrote:
> Hi Andy, all,
>
>> [*] An RDF dataset is a set { DG, (<u_i>, G_i)} -- write it same as
>> query has it, not "DG' union {(iri'j, G'j) | 1 <= j <= m})"
>
> Indeed, an RDF dataset is a set:
>
> { G, (<u1>, G1), (<u2>, G2), ... (<un>, Gn) }
>
> that is just the same as writing
> { G } union {(iri'j, G'j) | 1 <= j <= n }
>
> so, G would need to be in parentheses at least, I see.
>
> BTW, I think we should probably just unify the definitions of Graph
> Store and Dataset.

I don't.  An RDF dataset is a mathematical set - you can't change it. 
At one time, I thought the differences were small enough that they didn't 
matter much, but now I see that the naming issues matter, here and in 
the dataset protocol (should be "Graph Store Protocol")

In RDF-WG speak:
For example n-quads is the g-snap (RDF dataset) of a g-box (graph store).

>
>
> Next, I was thinking a bit about the following:
>
>>> Dataset(modify_template, P) = { instantiate(modify_template) | μ a
>>> solution of P }
>>>
>>> instantiate(modify_template) = ....
>>
>
> I couldn't really come around for a definition of instantiate(),
> but - at least inspired by your suggestion - I think something like the
> following would work:
>
> ----------------------------------------------------------------------
>

Aside : any chance of plain text?  You can use μ :-)

>
> =======================================
> Auxiliary Definition: Dataset(modify_template, &mu; )
>
> Let &mu; be a solution mapping.
>
>
> * For a modify_template of the form '{ TriplesBlock }'
>
> Dataset(modify_template, &mu; )
>
> is the Dataset consisting of only a default graph composed by
>
> all valid RDF triples obtained from substituting the variables in
>
> TriplesBlock according to &mu; and combining the triples
> into a single RDF graph by set union.

> * For a modify_template of the form 'GRAPH VarOrIRIref { TriplesBlock }'
>
> Dataset(modify_template, &mu; )
>
> is the Dataset consisting of the empty default graph and a named
>
> graph &mu;(VarOrIRIref) composed by all valid RDF triples obtained from
> substituting
>
> the variables in TriplesBlock according to &mu; and combining the triples
> into a single named RDF graph by set union.
>
>
> * For a complex modify_template of the form '{ modify_template1
> modify_template2 }'
>
> Dataset(modify_template, &mu; ) = Dataset-UNION (
> Dataset(modify_template1, &mu; ) , Dataset(modify_template2, &mu; ) )
>

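For reference, a minimal sketch of Dataset-UNION read as plain set union. The representation is my assumption, not anything from the draft: a dataset is a pair (default graph, dict from IRI to graph), and graphs are frozensets of (s, p, o) string tuples. Blank node labels are left untouched, so the same label in both inputs denotes the same node.

```python
def dataset_union(ds1, ds2):
    """Union two datasets given as (default_graph, {iri: graph}) pairs.

    Graphs are frozensets of (s, p, o) tuples. This is set union, not
    RDF merge: blank node labels are never renamed apart.
    """
    default1, named1 = ds1
    default2, named2 = ds2
    named = {}
    # A graph name present in either input appears in the result.
    for iri in set(named1) | set(named2):
        named[iri] = named1.get(iri, frozenset()) | named2.get(iri, frozenset())
    return (default1 | default2, named)


# Hypothetical data: both inputs mention the same blank node _:b1.
a = (frozenset({("_:b1", ":p", ":o")}),
     {":g": frozenset({(":s", ":p", ":o")})})
b = (frozenset({("_:b1", ":q", ":o")}),
     {":g": frozenset({(":s2", ":p", ":o")}), ":h": frozenset()})
union = dataset_union(a, b)
```

With a merge instead of a union, the two _:b1 triples would end up on different nodes, which is the problem for consecutive operations.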
Unrelated:
modify_template is currently only

> =======================================
>
>
> =======================================
> Definition: Dataset(modify_template, P, GS )
>
> Let sk() be a bijection that replaces every bnode identifier in the
> graph store GS with a unique fresh constant



Yes - just do it before the process of s/?var/term value/g for each 
template.

For each μ
   1/ Make a new template from the operation's template, with fresh bnodes.
   2/ Substitute values for variables in the new template.
   3/ Aggregate triples, quads into a dataset
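
The three steps can be sketched as follows. This is only an illustration under assumptions of mine: a template is a list of (graph, s, p, o) tuples with graph None for the default graph, terms are strings ("?x" a variable, "_:x" a bnode label), a solution is a dict, and the _:genN labels are just a convenient way to make fresh bnodes.

```python
from itertools import count


def fresh_template(template, gen):
    """Step 1: copy the template, giving each bnode label a fresh label."""
    fresh = {}

    def rename(term):
        if isinstance(term, str) and term.startswith("_:"):
            if term not in fresh:
                fresh[term] = "_:gen%d" % next(gen)
            return fresh[term]
        return term

    return [tuple(rename(t) for t in quad) for quad in template]


def substitute(template, mu):
    """Step 2: replace variables by values; drop quads with unbound variables."""
    out = []
    for quad in template:
        try:
            out.append(tuple(
                mu[t] if isinstance(t, str) and t.startswith("?") else t
                for t in quad))
        except KeyError:
            continue  # a variable has no binding in this solution
    return out


def build_dataset(template, solutions):
    """Step 3: aggregate the instantiated triples and quads into a dataset."""
    gen = count(1)
    default, named = set(), {}
    for mu in solutions:
        for graph, s, p, o in substitute(fresh_template(template, gen), mu):
            target = default if graph is None else named.setdefault(graph, set())
            target.add((s, p, o))
    return default, named


# INSERT { ?x :p _:x . } with two solutions binding ?x to store bnodes.
template = [(None, "?x", ":p", "_:x")]
solutions = [{"?x": "_:a"}, {"?x": "_:b"}]
default, named = build_dataset(template, solutions)
```

Because the renaming runs on the template before substitution, bnodes that arrive as variable values (here _:a and _:b) are never relabelled, only the ones written in the template.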


> and sk^-1() is the inverse mapping to sk() reintroducing the original
> bnode labels.

I don't understand why you need to reverse the mapping.  This is only on 
the template.

> Dataset(modify_template, P, GS ) =
>
> Dataset-UNION( sk^-1( Dataset(modify_template, &mu;) ) ) over all &mu;
> such that &mu; is a solution of P over Dataset sk(GS)
>
> =======================================
>
> Here, the application of sk() prior to query evaluation guarantees that
> co-referent bnode identifiers in GS are
>
> not "lost" during pattern evaluation, cf. 17.3.2

-> 18.3.2

> Treatment of Blank
> Nodes of SPARQL1.1 Query.

?? we don't need to touch the WHERE clause, just define the process on 
templates?

The one case, DELETE WHERE, I'm suggesting is easiest to handle as a 
short syntactic form, discussed in the formal section but before the 
operation definitions, where it then needs no specific mention (it's 
been rewritten out of the way).

DELETE WHERE { T } ==> DELETE { T } WHERE { T }
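
As a rough sketch of treating this as a pure syntactic rewrite done before any operation is defined: the class names below (DeleteWhere, Modify) and the tuple representation of T are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modify:
    """The fundamental form DELETE {..} INSERT {..} WHERE {..}."""
    delete: tuple
    insert: tuple
    where: tuple


@dataclass(frozen=True)
class DeleteWhere:
    """The short form DELETE WHERE { T }."""
    pattern: tuple


def expand(op):
    """Rewrite short forms; fundamental forms pass through unchanged."""
    if isinstance(op, DeleteWhere):
        # T is used both as the delete template and as the pattern.
        return Modify(delete=op.pattern, insert=(), where=op.pattern)
    return op


t = (("?s", ":p", "?o"),)
expanded = expand(DeleteWhere(t))
```

After this pass the operation definitions only ever see Modify, so DELETE WHERE needs no definition of its own.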

>
>
> ----------------------------------------------------------------------
>
>
> The functions sk and sk^-1 are needed to address the problem we
> discussed in [1]
>
>
> I can also attempt to put that in the xml form necessary for Update.
>
>
> best,
>
> Axel
>
>
>
> 1. http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0328.html

We can avoid the scoping graph issue by defining sk() to introduce bNodes 
based on the template before variable substitution.

Let me try to write out concretely what you propose, expressed in the 
steps I gave above:

(Changed example:)
Graph store:
    DG: { _:a :q :r . _:b :q :r . }

INSERT { ?x :p _:x . } WHERE { ?x :q :r }

μ1 = ?x/_:a -- scoping graph = actual graph
μ2 = ?x/_:b

μ1:
Step 1:
   { ?x :p _:x . } => { ?x :p _:gen1 . }
Step 2:
   { ?x :p _:gen1 . } =>  _:a :p _:gen1 .


μ2:
Step 1:
   { ?x :p _:x . } => { ?x :p _:gen2 . }
Step 2:
   { ?x :p _:gen2 . } =>  _:b :p _:gen2 .

==> triples _:a :p _:gen1 . _:b :p _:gen2 .

Does that look right to you?

 Andy

>
> On 10 Mar 2011, at 16:48, Andy Seaborne wrote:
>
>> ==== SPARQL Update (part 2)
>> This completes my review.
>>
>> Covers section 4 onwards but also ..
>>
>> === 3.1.1 INSERT DATA
>>
>> [**]
>> """
>> INSERT DATA { graph_triples }
>>
>> Graph triples are defined as:
>>
>> graph_triples ::= TriplesBlock | GRAPH <uri> { TriplesBlock }
>> """
>>
>> This disallows:
>>
>> INSERT DATA { :s :p :o . GRAPH :g { :s1 :p1 :o } }
>> INSERT DATA { GRAPH :g2 {:s :p :o } . GRAPH :g { :s1 :p1 :o } }
>>
>> Is there a reason for this?
>> The grammar allows it.
>>
>> It seems unnecessary to force the application to separate out the
>> triples.
>>
>> This is repeated:
>> = 3.1.2 DELETE DATA
>> = 3.1.3 DELETE/INSERT
>> modify_template ::= ConstructTriples | graph_template
>> = 3.1.4 DELETE
>> = 3.1.5 INSERT
>>
>>
>> == Section 4:
>>
>> [**]
>> I suggest a section on how certain forms map to other forms, then must
>> define the fundamental forms.
>>
>> Rewrites for ADD, COPY, MOVE (some text exists elsewhere but should be
>> in the formal section)
>> DELETE WHERE, DELETE {} WHERE, INSERT {} WHERE
>>
>> Maybe CLEAR as well.
>>
>> then define DELETE{}INSERT{}WHERE{}, LOAD, CREATE, DROP, INSERT DATA,
>> DELETE DATA.
>>
>> Something on WITH and USING to formalise them as syntactic features.
>> There is material elsewhere but I feel the formal section should be
>> self-contained and able to cover all SPARQL Update.
>>
>>
>> [**]
>> Need an account of how the syntax maps to the operations. It's fairly
>> obvious but probably should be said.
>>
>> == 4.1.1 Graph Store
>>
>> [] s/associated to/associated with/
>>
>> [*] Say the IRIi are distinct.
>>
>> [] It says: "1 <= i <= n" but nothing about n
>>
>> == 4.1.2 Update Operation
>>
>> The "t+1" notation isn't used anywhere.
>>
>> As the state of a store only depends on the previous state and the
>> operation and not t-2, it's not necessary.
>>
>> Is this definition used anywhere? I couldn't immediately see that it's
>> needed and wondered if it is historical now.
>>
>> == 4.2 Auxiliary Definitions
>> == 4.2.1 Dataset-UNION
>>
>> [*] An RDF dataset is a set { DG, (<u_i>, G_i)} -- write it same as
>> query has it, not "DG' union {(iri'j, G'j) | 1 <= j <= m})"
>>
>> [**] Not merge - this must be a union. It must not rename blank nodes
>> apart. Otherwise one operation followed by another will not update the
>> same bNode. And dataset-diff is not going to work.
>>
>> == 4.2.2 Dataset-DIFF
>>
>> [*] Same dataset comment as for dataset-union.
>> [**] It says "merge" (bullet 3). Should be set-difference or minus.
>> [] G_j should be G sub j.
>>
>> == 4.3.1 Insert Data Operation
>>
>> """
>> graph_triples, i.e. either a dataset consisting of a single named graph
>> and an empty default graph
>> """
>> [**] As we have defined dataset-union, I think this should be dataset
>> union, not limited to one graph. See also the graph_triples issue above.
>>
>> == 4.3.2 Delete Data Operation
>>
>> [**] graph_triples
>>
>> == 4.3.3 Delete Insert Operation
>>
>> """
>> Triples are identified as they match a particular Group Graph Pattern P.
>> """
>> [**] The triples here are the ones to be deleted or inserted - they are
>> not identified by matching - there is a template stage in between.
>>
>> [**] Define modify_template sub DEL and modify_template sub INS
>>
>> [**] Dataset(modify_template, P)
>>
>> Write this out formally:
>>
>> Dataset(modify_template, P) = { instantiate(modify_template) | μ a
>> solution of P }
>>
>> instantiate(modify_template) = ....
>>
>>
>> These are superseded if there is an abbreviated forms section:
>> == 4.3.4 Delete Operation
>> == 4.3.5 Insert Operation
>> == 4.3.6 Delete Where Operation
>>
>>
>> == 4.3.7 Insert Where Operation
>> [**] What's this used with?
>> "Insert Where ... are *deleted* from the Graph Store"
>>
>> == 4.4.1 Create Operation
>>
>> [*] Either something on what happens about empty graphs or, in the
>> section intro, say the definitions assume we can have empty graphs. The
>> latter is probably better.
>>
>>
>> == 5 Conformance
>> [*] remove / update name to "RDF Dataset HTTP Protocol"
>>
>>
>
Received on Wednesday, 16 March 2011 16:56:07 UTC