Re: SPARQL 1.1 Update Review (part 2)

On 16 Mar 2011, at 16:55, Andy Seaborne wrote:
> On 16/03/11 15:39, Axel Polleres wrote:
> > Hi Andy, all,
> >
> >> [*] An RDF dataset is a set { DG, (<u_i>, G_i)} -- write it same as
> >> query has it, not "DG' union {(iri'j, G'j) | 1 <= j <= m})"
> >
> > Indeed, an RDF dataset is a set:
> >
> > { G, (<u1>, G1), (<u2>, G2), ... (<un>, Gn) }
> >
> > that is just the same as writing
> > { G } union {(iri'j, G'j) | 1 <= j <= n }
> >
> > so, G would need to be in parentheses at least, I see.
> >
> > BTW, I think we should probably just unify the definitions of Graph
> > Store and Dataset.
> 
> I don't.  An RDF dataset is a mathematical set - you can't change it.

I didn't mean to unify the concepts; I just meant to use the same notation in the definitions. That is, instead of writing GS as a pair,
also write it as a set, so that any Update operation is a transformation from one set into another (at the moment, we already say that an UpdateOperation transforms a Graph Store GS at time point t, denoted GSt, into another Graph Store GSt+1).

I.e., what I'd suggest is to change:
----------------------------------
Definition: Graph Store

A Graph Store GS is a mutable container of RDF graphs. It has one unnamed (default) slot and zero or more named slots identified by an IRI irii. Each slot holds an RDF graph, i.e.: 
 
  GS = (DG, {(irii, Gi) | 1 <= i <= n})

where
 # DG is the RDF graph associated to the unnamed slot
 # for each 1 <= i <= n, Gi is an RDF graph associated to the named slot identified by IRI irii
----------------------------------

to 

----------------------------------
Definition: Graph Store

A Graph Store GS is a mutable container of RDF graphs. It has one unnamed (default) slot and zero or more named slots identified by an IRI irii. Each slot holds an RDF graph, i.e. the graph store can be viewed as a mutable <a href=...>RDF Dataset</a> 
 
  GS = { DG } union { (irii, Gi) | 1 <= i <= n } 

where
 # the default graph DG is the RDF graph associated to the unnamed slot
 # for each 1 <= i <= n, Gi is an RDF graph associated to the named slot identified by IRI irii
----------------------------------

With that change, in all the other definitions we don't have to bother 
anymore about different notations (pair vs set) between GS and DS.
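
To make the set-based view concrete, here is a minimal Python sketch (not proposed spec text). It assumes triples are plain tuples of strings and uses None as a hypothetical name for the unnamed slot; GraphStore, insert and snapshot are illustrative names only. The store is mutable, while each snapshot is an immutable set in the same shape as an RDF Dataset, so an update can be read as a step from one set to another.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# An RDF dataset modelled as an immutable set of (name, graph) pairs.
# Assumption of this sketch: the default graph DG is the pair named None.
Dataset = FrozenSet[Tuple[Optional[str], Graph]]


class GraphStore:
    """Mutable container: one unnamed slot plus zero or more named slots."""

    def __init__(self) -> None:
        self.slots: Dict[Optional[str], set] = {None: set()}

    def insert(self, triple: Triple, graph: Optional[str] = None) -> None:
        # Creates the named slot on first use, then adds the triple to it.
        self.slots.setdefault(graph, set()).add(triple)

    def snapshot(self) -> Dataset:
        """Return the current state of the store as an immutable dataset."""
        return frozenset((name, frozenset(g)) for name, g in self.slots.items())


gs = GraphStore()
before = gs.snapshot()                     # GS at time t
gs.insert((":s", ":p", ":o"), graph=":g")  # an update operation
after = gs.snapshot()                      # GS at time t+1
```

Here before and after are two different sets, while gs itself is the thing that changed.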

> At one time, I thought the differences were small enough that they didn't
> matter much but now I see that the naming issues matter, here and in
> the dataset protocol (should be "Graph Store Protocol")

Yeah, but if we say Graph Store operations change from one dataset to another,
we'd be ok, wouldn't we? And the wording "can be viewed as a mutable <a href=...>RDF Dataset</a>" 
should be vague enough, I hope, to just be able to use the same notation for our definitions at least. 


> In RDF-WG speak:
> For example n-quads is the g-snap (RDF dataset) of a g-box (graph store).
> 
> >
> >
> > Next, I was thinking a bit about the following:
> >
> >>> Dataset(modify_template, P) = { instantiate(modify_template) | μ a
> >>> solution of P }
> >>>
> >>> instantiate(modify_template) = ....
> >>
> >
> > I couldn't really come around for a definition of instantiate(),
> > but - at least inspired by your suggestion - I think something like the
> > following would work:
> >
> > ----------------------------------------------------------------------
> >
> 
> Aside : any chance of plain text?  You can use μ :-)

:-) I was worried about ancient non-unicode mail-clients


> 
> >
> > =======================================
> > Auxiliary Definition: Dataset(modify_template, &mu; )
> >
> > Let &mu; be a solution mapping.
> >
> >
> > * For a modify_template of the form '{ TriplesBlock }'
> >
> > Dataset(modify_template, &mu; )
> >
> > is the Dataset consisting of only a default graph composed by
> >
> > all valid RDF triples obtained from substituting the variables in
> >
> > TriplesBlock according to &mu; and combining the triples
> > into a single RDF graph by set union.
> 
> > * For a modify_template of the form 'GRAPH VarOrIRIref { TriplesBlock }'
> >
> > Dataset(modify_template, &mu; )
> >
> > is the Dataset consisting of the empty default graph and a named
> >
> > graph &mu;(VarOrIRIref) composed by all valid RDF triples obtained from
> > substituting
> >
> > the variables in TriplesBlock according to &mu; and combining the triples
> > into a single named RDF graph by set union.
> >
> >
> > * For a complex modify_template of the form '{ modify_template1
> > modify_template2 }'
> >
> > Dataset(modify_template, &mu; ) = Dataset-UNION (
> > Dataset(modify_template1, &mu; ) , Dataset(modify_template2, &mu; ) )
> >
> 
> Unrelated:
> modify_template is currently only
> 
> > =======================================
> >
> >
> > =======================================
> > Definition: Dataset(modify_template, P, GS )
> >
> > Let sk() be a bijection that replaces every bnode identifier in the
> > graph store GS with a unique fresh constant
> 
> 
> 
> Yes - just do it before the process of s/?var/term value/g for each
> template.
> 
> For each μ
>    1/ Make new template from operation one with fresh bnodes.
>    2/ Substitute variables for values in new template.
>    3/ Aggregate triples, quads into a dataset
> 
> 
> > and sk^-1() is the inverse mapping to sk() reintroducing the original
> > bnode labels.
> 
> I don't understand why you need to reverse the mapping.  This is only on
> the template.

... see below...

> 
> > Dataset(modify_template, P, GS ) =
> >
> > Dataset-UNION( sk^-1( Dataset(modify_template, &mu;) ) ) over all &mu;
> > such that &mu; is a solution of P over Dataset sk(GS)
> >

... the application of sk^-1() *after* the resulting dataset has been generated
guarantees that in the resulting dataset now the same blank node labels are used as in GS 
and if this resulting dataset is added to or removed from GS by an INSERT or DELETE, 
then bnode correlations wrt. GS will be followed.

Otherwise, something like 

 DELETE { ?s ?p ?o }
 WHERE  { ?s ?p ?o FILTER isBlank(?s) }

wouldn't have any effect.
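
A rough Python sketch of that argument, under loud assumptions: a single default graph held as a set of string tuples, bnodes written as "_:label", skolem constants as hypothetical "sk:label" strings, and the blank-subject filter approximated by checking whether the subject is one of the constants sk() introduced. All function names are illustrative.

```python
def is_bnode(term):
    return term.startswith("_:")


def make_sk(graph):
    """Build sk() and sk^-1() as dicts covering every bnode label in the graph."""
    labels = sorted({t for triple in graph for t in triple if is_bnode(t)})
    sk = {b: "sk:" + b[2:] for b in labels}  # constants assumed unused elsewhere
    sk_inv = {c: b for b, c in sk.items()}
    return sk, sk_inv


def rename_terms(mapping, graph):
    """Apply a term-to-term mapping to every triple of a graph."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def delete_bnode_subjects(graph, undo_sk=True):
    """Delete all triples whose subject is a bnode, matching over sk(graph)."""
    sk, sk_inv = make_sk(graph)
    skolemized = rename_terms(sk, graph)
    # Solutions of the pattern over sk(GS): subjects that stand for bnodes.
    to_delete = {t for t in skolemized if t[0] in sk_inv}
    if undo_sk:
        # sk^-1() restores the original labels so the triples line up with GS.
        to_delete = rename_terms(sk_inv, to_delete)
    return graph - to_delete


g = {("_:a", ":q", ":r"), (":x", ":q", ":r")}
with_inverse = delete_bnode_subjects(g)
without_inverse = delete_bnode_subjects(g, undo_sk=False)
```

With the inverse mapping the bnode triple is removed; without it the set difference finds nothing to remove, which is the "no effect" case.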

best,
Axel

> > =======================================
> >
> > Here, the application of sk() prior to query evaluation guarantees that
> > co-referent bnode identifiers in GS are
> >
> > not "lost" during pattern evaluation, cf. 17.3.2
> 
> -> 18.3.2
> 
>  > Treatment of Blank
> > Nodes of SPARQL1.1 Query.
> 
> ?? we don't need to touch the WHERE clause, just define the process on
> templates?
> 
> The one case for DELETE WHERE I'm suggesting is easiest to handle as
> being a short syntactic form discussed in the formal section but before
> the operation definitions where it does not need specific mention (it's
> been rewritten out of the way).
> 
> DELETE WHERE { T } ==> DELETE { T } WHERE { T }
> 
> >
> >
> > ----------------------------------------------------------------------
> >
> >
> > The functions sk and sk^-1 are needed to address the problem we
> > discussed in [1]
> >
> >
> > I can also attempt to put that in the xml form necessary for Update.
> >
> >
> > best,
> >
> > Axel
> >
> >
> >
> > 1. http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0328.html
> 
> can avoid the scoping graph issue by defining sk() to introduce bNodes
> based on the template before variable substitution.
> 
> Let me try to write out what you propose expressed as I have above
> concretely:
> 
> (Changed example:)
> Graph store:
>     DG: { _:a :q :r . _:b :q :r . }
> 
> INSERT { ?x :p _:x . } WHERE { ?x :q :r }
> 
> μ1 = ?x/_:a -- scoping graph = actual graph
> μ2 = ?x/_:b
> 
> μ1:
> Step 1:
>    { ?x :p _:x . } => { ?x :p _:gen1 . }
> Step 2:
>    { ?x :p _:gen1 . } =>  _:a :p _:gen1 .
> 
> 
> μ2:
> Step 1:
>    { ?x :p _:x . } => { ?x :p _:gen2 . }
> Step 2:
>    { ?x :p _:gen2 . } =>  _:b :p _:gen2 .
> 
> ==> triples _:a :p _:gen1 . _:b :p _:gen2 .
> 
> Does that look right to you?
> 
>         Andy
> 
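
The three steps in the example above (fresh bnodes per solution, substitution, aggregation) can be sketched in Python as follows. This is only an illustration: templates are lists of string triples, variables start with "?", bnodes with "_:", and the names fresh_bnodes, substitute and instantiate are made up here, as is the "_:genN" labelling scheme.

```python
import itertools


def fresh_bnodes(template, counter):
    """Step 1: copy the template, giving each bnode label a fresh label."""
    renaming = {}
    copy = []
    for triple in template:
        new_triple = []
        for term in triple:
            if term.startswith("_:"):
                if term not in renaming:
                    renaming[term] = "_:gen%d" % next(counter)
                term = renaming[term]
            new_triple.append(term)
        copy.append(tuple(new_triple))
    return copy


def substitute(template, mu):
    """Step 2: replace variables by bound values; skip triples left non-ground."""
    result = set()
    for triple in template:
        ground = tuple(mu.get(t, t) for t in triple)
        if not any(t.startswith("?") for t in ground):
            result.add(ground)
    return result


def instantiate(template, solutions):
    """Step 3: aggregate the triples from all solutions by set union."""
    counter = itertools.count(1)
    triples = set()
    for mu in solutions:
        triples |= substitute(fresh_bnodes(template, counter), mu)
    return triples


template = [("?x", ":p", "_:x")]
solutions = [{"?x": "_:a"}, {"?x": "_:b"}]
result = instantiate(template, solutions)
```

Because renaming happens on the template before substitution, the bnodes bound by the solutions (_:a, _:b) keep their labels, and only the template bnode _:x is made fresh per solution.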
> >
> > On 10 Mar 2011, at 16:48, Andy Seaborne wrote:
> >
> >> ==== SPARQL Update (part 2)
> >> This completes my review.
> >>
> >> Covers section 4 onwards but also ..
> >>
> >> === 3.1.1 INSERT DATA
> >>
> >> [**]
> >> """
> >> INSERT DATA { graph_triples }
> >>
> >> Graph triples are defined as:
> >>
> >> graph_triples ::= TriplesBlock | GRAPH <uri> { TriplesBlock }
> >> """
> >>
> >> This disallows:
> >>
> >> INSERT DATA { :s :p :o . GRAPH :g { :s1 :p1 :o } }
> >> INSERT DATA { GRAPH :g2 {:s :p :o } . GRAPH :g { :s1 :p1 :o } }
> >>
> >> Is there a reason for this?
> >> The grammar allows it.
> >>
> >> It seems unnecessary to force the application to separate out the
> >> triples.
> >>
> >> This is repeated:
> >> = 3.1.2 DELETE DATA
> >> = 3.1.3 DELETE/INSERT
> >> modify_template ::= ConstructTriples | graph_template
> >> = 3.1.4 DELETE
> >> = 3.1.5 INSERT
> >>
> >>
> >> == Section 4:
> >>
> >> [**]
> >> I suggest a section on how certain forms map to other forms, then must
> >> define the fundamental forms.
> >>
> >> Rewrites for ADD, COPY, MOVE (some text exists elsewhere but should be
> >> in the formal section)
> >> DELETE WHERE, DELETE {} WHERE, INSERT {} WHERE
> >>
> >> Maybe CLEAR as well.
> >>
> >> then define DELETE{}INSERT{}WHERE{}, LOAD, CREATE, DROP, INSERT DATA,
> >> DELETE DATA.
> >>
> >> Something on WITH and USING to formalise them as syntactic features.
> >> There is material elsewhere but I feel the formal section should be
> >> self-contained able to cover all SPARQL Update.
> >>
> >>
> >> [**]
> >> Need an account of how the syntax maps to the operations. It's fairly
> >> obvious but probably should be said.
> >>
> >> == 4.1.1 Graph Store
> >>
> >> [] s/associated to/associated with/
> >>
> >> [*] Say the IRIi are distinct.
> >>
> >> [] It says: "1 <= i <= n" but nothing about n
> >>
> >> == 4.1.2 Update Operation
> >>
> >> The "t+1" notation isn't used anywhere.
> >>
> >> As the state of a store only depends on the previous state and the
> >> operation and not t-2, it's not necessary.
> >>
> >> Is this definition used anywhere? I couldn't immediately see that it's
> >> needed and wondered if it is historical now.
> >>
> >> == 4.2 Auxiliary Definitions
> >> == 4.2.1 Dataset-UNION
> >>
> >> [*] An RDF dataset is a set { DG, (<u_i>, G_i)} -- write it same as
> >> query has it, not "DG' union {(iri'j, G'j) | 1 <= j <= m})"
> >>
> >> [**] Not merge - this must be a union that does not rename blank nodes
> >> apart. Otherwise one operation followed by another will not update the
> >> same bNode. And dataset-diff is not going to work.
> >>
> >> == 4.2.2 Dataset-DIFF
> >>
> >> [*] Same dataset comment as for dataset-union.
> >> [**] It says "merge" (bullet 3). Should be set-difference or minus.
> >> [] G_j should be G sub j.
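
A small Python sketch of the union/difference reading asked for in these two comments, as opposed to an RDF merge. Assumptions: a dataset is a dict from graph name to a frozenset of string triples, None is a hypothetical key for the default graph, and dataset_union and dataset_diff are illustrative names rather than the spec's definitions.

```python
DEFAULT = None  # hypothetical key for the default graph


def dataset_union(ds1, ds2):
    """Graph-wise set union; blank node labels are kept, never renamed apart."""
    names = set(ds1) | set(ds2)
    return {n: ds1.get(n, frozenset()) | ds2.get(n, frozenset()) for n in names}


def dataset_diff(ds1, ds2):
    """Graph-wise set difference: triples of ds2 removed from ds1."""
    return {n: g - ds2.get(n, frozenset()) for n, g in ds1.items()}


ds = {DEFAULT: frozenset({("_:a", ":q", ":r")})}
added = {DEFAULT: frozenset({("_:a", ":p", ":o")})}

both = dataset_union(ds, added)   # _:a is the same node in both triples
back = dataset_diff(both, added)  # removing what was added restores ds
```

If the union renamed _:a apart, the later difference could no longer find the triple to remove, which is the concern raised above.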
> >>
> >> == 4.3.1 Insert Data Operation
> >>
> >> """
> >> graph_triples, i.e. either a dataset consisting of a single named graph
> >> and an empty default graph
> >> """
> >> [**] As we have defined dataset-union, I think this should be dataset
> >> union, not limited to one graph. See also the graph_triples issue above.
> >>
> >> == 4.3.2 Delete Data Operation
> >>
> >> [**] graph_triples
> >>
> >> == 4.3.3 Delete Insert Operation
> >>
> >> """
> >> Triples are identified as they match a particular Group Graph Pattern P.
> >> """
> >> [**] The triples here are the ones to be deleted or inserted - they are
> >> not identified by matching - there is a template stage in between.
> >>
> >> [**] Define modify_template sub DEL and modify_template sub INS
> >>
> >> [**] Dataset(modify_template, P)
> >>
> >> Write this out formally:
> >>
> >> Dataset(modify_template, P) = { instantiate(modify_template) | μ a
> >> solution of P }
> >>
> >> instantiate(modify_template) = ....
> >>
> >>
> >> These are superseded if there is an abbreviated forms section:
> >> == 4.3.4 Delete Operation
> >> == 4.3.5 Insert Operation
> >> == 4.3.6 Delete Where Operation
> >>
> >>
> >> == 4.3.7 Insert Where Operation
> >> [**] What's this used with?
> >> "Insert Where ... are *deleted* from the Graph Store"
> >>
> >> == 4.4.1 Create Operation
> >>
> >> [*] Either something on what happens about empty graphs or, in the
> >> section intro, say the definitions assume we can have empty graphs. The
> >> latter is probably better.
> >>
> >>
> >> == 5 Conformance
> >> [*] remove / update name to "RDF Dataset HTTP Protocol"
> >>
> >>
> >
> 

Received on Wednesday, 16 March 2011 21:20:45 UTC