Re: Mutability and graphs [was: Re: page about the term "named graphs"] from Axel Polleres on 2010-07-23 (public-rdf-dawg@w3.org from July to September 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Fri, 23 Jul 2010 10:42:09 +0100
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: "Sandro Hawke" <sandro@w3.org>, "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-Id: <C6072A07-1FF9-4E2D-BC23-4D2F65751CDD@deri.org>
Thanks Andy, that clarifies most of the confusion... very useful!

On 22 Jul 2010, at 14:08, Andy Seaborne wrote:

> This gets to a point we do need to be clear about for SPARQL 1.1 Update.
> 
> I agree with Sandro that some of the language of change and graph is not
> exact.
> 
> On 22/07/2010 9:41 AM, Axel Polleres wrote:
> > As for Graph,
> > i always understood simply:
> >
> >     Graph3 = a set of triples
> >
> > a set*can*  change, you can add triples, you can remove triples... is that the same as your graph1 or something different?
> 
> If we mean mathematical set, then no, it can't change.  You don't add an
> element to a set, you form the union of the set and a singleton set.

ok, indeed that's more precise (in fact, I kind of deliberately removed the adjective "mathematical" 
from sandro's definition G1, but I didn't know how to call it... your name 
"set-as-a-data-structure" captures Graph3 better, I think)

> In
>    "Set S1 = the set S0 set-union {x}"
> you are saying what the symbol S1 refers to in this context.
> 
> It is the set whose members are the members of (the set referred to by)
> S0, and the object x.  S1 may refer to the same set as S0 (i.e. x is in
> S0) or a different set.  Mathematical sets exists [+], just like numbers
> exist.

> If you mean set, as in data structure, then many programming languages
> do allow such a data structure to change. That's to do with slots,
> references and values.  A slot is a container of one value: you can
> change the value in a slot.  Deferencing a slot returns it's current value.

Ok, but then isn't set-as-a-data-structure then closer to what we need or what people
implement? The problem is in the mathematical sense I can't say something like

 S1 := S1 union {x}

whereas in the data-structure sense, I can.

> The data structure has mutable components and adding an element mutates
> some of those components by replacing contents of some slots with
> different values. The set-as-data-structure's value changes a well (in
> most programming languages including Java.  See dire warnings about
> changing the contents of objects held in a Set or Map).
> 
> An RDF graph is a mathematical set.
> 
> An "RDF dataset" as defined by SPARQL 1.0 is a mathematical-set of one
> graph (the default graph) and zero or more pairs (IRI, graph).  Those
> graphs are RDF graphs - values, immutable - not containers of triples.
> 
> http://www.w3.org/TR/rdf-sparql-query/#sparqlDataset
> 
> In SPARQL 1.1 Update, we have some places where the graph language is
> strictly sloppy but the long form is very long, communicates badly and
> does not reflect common understanding.  Any text is a balancing act.
> 
> The sloppyness is to do with values and references (or values and slots,
> if you like).  Some of the confusion on named graphs comes from this but
> it isn't specific to named graphs.
> 
> In the SPARQL Update submission [*], it's a bit clearer for this
> point-of-view as we have:
> 
> INSERT DATA INTO <g> { :s :p :o }
> 
> so there is slot <g> that has a graph (value) in it and then we perform:
> 
> S0 := graph in slot <g>
> S1 := S0 set-union { (:s :p :o) }
> value of slot <g> := S1
> 
> Note what happens if the same graph (value) is also in slot <g1>.
> 
> This is why I prefer "graph store" in update to the alternative of
> reusing the term "RDF dataset". The choice of language "graph store"
> captures for me the idea of a place where there are a number of slots
> for graphs.

Ok, I understand. Question... How much pain would it be to adopt the
set-as-datastructure meaning in our definition of dataset, i.e. could we change 
the definitions of dataset from SPARQL1.0? Would it cause upwards compatibility 
problems for query? Would it cause problems for Entailment? It seems that if we 
could do that, we'd also be able to use the same definition for dataset and 
graphstore (the distinction being something I don't really like, but, given the 
current definition of dataset seeming indeed to be inevitable)

Probably the answers to these questions are negative, i.e. we shouldn't mess with the
meaning/definition of "set" as it might cause confusion...

So one alternative way to proceed seems to me the following (sticking with the 
mathematical set definition, but also involving redefining "Dataset" as we 
had it in SPARQL1.0, which avoids the indirection of defining named graphs 
as pairs instead of graphs:

------------------------------------------------------------------------------
Dataset(new): Given RDF graphs (referred to by symbols)
<G_0>, .... ,<G_l>, <G_n1>, .... <G_nm> a dataset DS is the following set of 
(mathemetical) sets,

DS = {<G>, <G_n1>, .... , <G_nm>}

where G = { <G_0> merge .... merge <G_l> }

and merge is RDF merge in the sense of rdf-mt.
We also denote the symbol referring to a graph as "name" and particularly 
call the graph <G> "default graph".
------------------------------------------------------------------------------

Note that this is no longer "named graphs" in the Carrol pair sense.

I think "Graphstore" could also use this definition of dataset, meaning that a graphstore
is defined by a sequence of datasets DS_0 to DS_n determined by a sequence of 
updates: Starting with DS_0 being the empty or initial dataset, 
the "current dataset" after the n-th update is simply DS_n.
 
The "virtual symbol" <g> in a graphstore, that is, the symbol we use in our UPDATE syntax, 
in fact is to be understood as the concrete symbol <g>_t (i.e. referring to the 
timed instances/snapshots of the graph "<g> at time t").

As for the semantics of an update, for 
example, say insert {x} into graph "<g> at time t", i.e.

 INSERT DATA {GRAPH <g> {x} }

just means

 DS_{t+1} = {<G>_{t+1}, <G_n1>_{t+1}, .... <g>_{t+1}... , <G_nm>_{t+1}}
 
 where

 <G>_{t+1} = <G>_t 
 <G_n1>_{t+1} = <G_n1>_t
 ...
 <g>_{t+1} = <g>_{t} union {x}
 ...
 <G_n1>_{t+1} = <G_n1>_t

Likewise or for an insertion of a new named graph "<gnew> at time t", i.e.

LOAD <documentURI> [ INTO GRAPH <gnew> ]
 
this means

 DS_{t+1} = DS_t union {<gnew>_t} 

and so on for other update operations...

Does that sound reasonable? I.e., do we really *need* the "named graphs as pairs" definition, or could we go for a more direct definition of dataset/named graphs?


Axel

> 
> Here's a raw definition more worded to reflect that view:
> 
> A graph store is a number of slots, one unnamed, others named, each slot
> holds a value that is an RDF graph.
> 
> The SPARQL 1.1 Update document says:
> [[
> A Graph Store is a repository of RDF graphs managed by a single service.
> Like an RDF Dataset
> operated on by SPARQL, a Graph Store contains one unnamed graph and zero
> or more named graphs.
> ]]
> 
> Repository isn't bad.  "contains" is a bit iffy but isn't actually wrong
> - the contents can change.  "Like" is loose.
> 
> If you "modify a graph in a graph store" you are changing the value in
> that slot.  Same slot, different value and it's not the same graph.  It
> would be more correct to say "change the graph in the repository for
> another graph" or some such language.
> 
> In SPARQL 1.1 Update, we have the more convenient:
> 
> INSERT DATA { GRAPH <g> { :s :p :o } GRAPH <g1> { :s :p :o } }
> 
> so <g> and <g1> are names for slots in the graph store.
> 
> # Remove the slot.
> DROP GRAPH <g> 
> 
> # Retrieve and parse the document, set slot <g> to
> # the value so obtained.               
> LOAD <doc> INTO GRAPH <g>
> 
> # Set the slot to the empty graph.
> CLEAR GRAPH <g>                
> 
> # Create the slot and fill it with the empty graph.
> CREATE GRAPH <g>
> 
> ADD, COPY, MOVE are all slot operations within the repository.
> 
> 
> I hope I've got the language sorted out - apologies for mistakes, of
> which I'm sure there are some.  I avoided using "denotes".  There are
> many different alternative choices of words and ones background colours
> the preferred choice.
> 
> We have touched on a theory for update and are scheduling a telecon for
> it. (It's Fri 30th July 4pm UK time, if you missed the decision this
> week).  I hope this helps a little.
> 
>         Andy
> 
> [+] Well, except for the set of all sets and set of sets that are not
> members of themselves etc etc.
> [*] http://www.w3.org/Submission/SPARQL-Update/
>
Received on Friday, 23 July 2010 09:42:40 UTC