Mutability and graphs [was: Re: page about the term "named graphs"] from Andy Seaborne on 2010-07-22 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 22 Jul 2010 14:08:29 +0100
To: Axel Polleres <axel.polleres@deri.org>
CC: Sandro Hawke <sandro@w3.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C4842CD.80101@epimorphics.com>
This gets to a point we do need to be clear about for SPARQL 1.1 Update.

I agree with Sandro that some of the language of change and graph is not 
exact.

On 22/07/2010 9:41 AM, Axel Polleres wrote:
> As for Graph,
> i always understood simply:
>
>     Graph3 = a set of triples
>
> a set *can* change, you can add triples, you can remove triples... is that the same as your graph1 or something different?

If we mean mathematical set, then no, it can't change.  You don't add an 
element to a set, you form the union of the set and a singleton set.

In
   "Set S1 = the set S0 set-union {x}"
you are saying what the symbol S1 refers to in this context.

It is the set whose members are the members of (the set referred to by) 
S0, and the object x.  S1 may refer to the same set as S0 (i.e. x is in 
S0) or a different set.  Mathematical sets exist [+], just like numbers 
exist.
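
As a rough illustration of the value view (my own sketch, not anything 
from a spec), Python's frozenset behaves much like a mathematical set: 
there is no "add", only forming a new value by union.  The names S0, S1, 
S2 and with_element are made up for the example.

```python
# frozenset stands in for a mathematical set: a value with no mutators.
S0 = frozenset({"a", "b"})

def with_element(s, x):
    """Return the set whose members are the members of s, and x.

    Nothing is added to s; the result is simply (possibly) another set.
    """
    return s | frozenset({x})

S1 = with_element(S0, "x")  # "x" is not in S0, so S1 is a different set
S2 = with_element(S0, "a")  # "a" is in S0, so S2 is the same set as S0

print(S1 == S0)  # False
print(S2 == S0)  # True
print(sorted(S0))  # ['a', 'b'], S0 still refers to the same set
```

The symbols S1 and S2 are just names bound to values; S0 is never 
altered by either union.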

If you mean set, as in data structure, then many programming languages 
do allow such a data structure to change.  That's to do with slots, 
references and values.  A slot is a container of one value: you can 
change the value in a slot.  Dereferencing a slot returns its current value.

The data structure has mutable components and adding an element mutates 
some of those components by replacing contents of some slots with 
different values. The set-as-data-structure's value changes as well (in 
most programming languages, including Java; see the dire warnings about 
changing the contents of objects held in a Set or Map).
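
A small sketch of the slot/value distinction, in Python rather than 
Java.  Slot and Key are hypothetical classes written for this example 
only; the last part shows the kind of problem those warnings are about, 
as it appears with Python's built-in set.

```python
class Slot:
    """A container of exactly one value; the value can be replaced."""

    def __init__(self, value):
        self.value = value

    def deref(self):
        return self.value

    def assign(self, value):
        self.value = value

# Changing what is in a slot: the old value itself is untouched.
slot = Slot(frozenset({1, 2}))
before = slot.deref()
slot.assign(before | frozenset({3}))
after = slot.deref()

# A set-as-data-structure: mutation is visible through every reference.
mutable = {1, 2}
alias = mutable
mutable.add(3)

# The hazard: an object whose hash depends on mutable state, held in a set.
class Key:
    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, Key) and self.n == other.n

    def __hash__(self):
        return hash(self.n)

k = Key(1)
holder = {k}
k.n = 2  # mutate the object while the set is holding it

found_by_lookup = k in holder                     # hash-based lookup
found_by_scan = any(m is k for m in holder)       # the object is still there
```

Here "before" is still the two-element set after the slot has been 
reassigned, "alias" sees the element added via "mutable", and in CPython 
the hash-based lookup for k no longer finds it even though a scan does.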

An RDF graph is a mathematical set.

An "RDF dataset" as defined by SPARQL 1.0 is a mathematical-set of one 
graph (the default graph) and zero or more pairs (IRI, graph).  Those 
graphs are RDF graphs - values, immutable - not containers of triples.

http://www.w3.org/TR/rdf-sparql-query/#sparqlDataset
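
To make "values, not containers" concrete, here is one possible way to 
model that definition in Python.  The type names (Triple, Graph, Dataset) 
and the example IRI are my own assumptions for illustration; triples are 
plain tuples of strings, which is a simplification.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]  # an RDF graph as an immutable set of triples

class Dataset(NamedTuple):
    """One default graph plus zero or more (IRI, graph) pairs."""
    default: Graph
    named: FrozenSet[Tuple[str, Graph]]

g = frozenset({(":s", ":p", ":o")})

ds_a = Dataset(default=frozenset(),
               named=frozenset({("http://example/g", g)}))

# Built separately from equal parts: it is the same dataset, by value.
ds_b = Dataset(default=frozenset(),
               named=frozenset({("http://example/g",
                                 frozenset({(":s", ":p", ":o")}))}))

print(ds_a == ds_b)  # True

# There is no way to change ds_a in place; assignment is rejected.
try:
    ds_a.default = g
except AttributeError:
    print("a dataset is a value, not a container")
```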


In SPARQL 1.1 Update, we have some places where the graph language is, 
strictly speaking, sloppy, but the long form is very long, communicates 
badly and does not reflect common understanding.  Any text is a balancing act.

The sloppiness is to do with values and references (or values and slots, 
if you like).  Some of the confusion on named graphs comes from this but 
it isn't specific to named graphs.

In the SPARQL Update submission [*], it's a bit clearer from this 
point of view, as we have:

INSERT DATA INTO <g> { :s :p :o }

so there is slot <g> that has a graph (value) in it and then we perform:

S0 := graph in slot <g>
S1 := S0 set-union { (:s :p :o) }
value of slot <g> := S1

Note what happens if the same graph (value) is also in slot <g1>.
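
The three steps above can be sketched in Python as follows.  This is 
only a model under my own assumptions: the store is a dict from slot name 
to frozenset, and insert_data is a hypothetical helper, not an API of any 
SPARQL implementation.

```python
# Slots are dict entries; graphs are frozensets of triples (values).
shared = frozenset({(":a", ":b", ":c")})
store = {"g": shared, "g1": shared}  # the same graph value in two slots

def insert_data(store, name, triples):
    """INSERT DATA into slot `name`, following the three steps literally."""
    s0 = store.get(name, frozenset())   # S0 := graph in slot <g>
    s1 = s0 | frozenset(triples)        # S1 := S0 set-union { ... }
    store[name] = s1                    # value of slot <g> := S1

insert_data(store, "g", [(":s", ":p", ":o")])

print(len(store["g"]))         # 2
print(len(store["g1"]))        # 1
print(store["g1"] == shared)   # True
```

Under this reading, slot <g1> still holds the original graph: the update 
replaced the value in slot <g> and did nothing to the graph itself.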

This is why I prefer "graph store" in update to the alternative of 
reusing the term "RDF dataset".  The choice of language "graph store" 
captures for me the idea of a place where there are a number of slots 
for graphs.

Here's a raw definition more worded to reflect that view:

A graph store is a number of slots, one unnamed, others named, each slot 
holds a value that is an RDF graph.

The SPARQL 1.1 Update document says:
[[
A Graph Store is a repository of RDF graphs managed by a single service. 
Like an RDF Dataset operated on by SPARQL, a Graph Store contains one 
unnamed graph and zero or more named graphs.
]]

Repository isn't bad.  "contains" is a bit iffy but isn't actually wrong 
- the contents can change.  "Like" is loose.

If you "modify a graph in a graph store" you are changing the value in 
that slot.  Same slot, different value and it's not the same graph.  It 
would be more correct to say "change the graph in the repository for 
another graph" or some such language.

In SPARQL 1.1 Update, we have the more convenient:

INSERT DATA { GRAPH <g> { :s :p :o } GRAPH <g1> { :s :p :o } }

so <g> and <g1> are names for slots in the graph store.

# Remove the slot.
DROP GRAPH <g>  

# Retrieve and parse the document, set slot <g> to
# the value so obtained.  
LOAD <doc> INTO GRAPH <g>

# Set the slot to the empty graph.
CLEAR GRAPH <g>   

# Create the slot and fill it with the empty graph.
CREATE GRAPH <g>

ADD, COPY, MOVE are all slot operations within the repository.
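
For what it's worth, here is how the slot reading of these operations 
might look over a plain dict.  All function names are hypothetical.  
LOAD is left out because it needs retrieval and parsing.  The behaviour 
of ADD, COPY and MOVE (union into the destination, replace the 
destination, replace then remove the source) and the error on creating 
an existing slot are assumptions of this sketch.

```python
EMPTY = frozenset()

def create(store, name):
    # Assumption: creating a slot that already exists is an error.
    if name in store:
        raise ValueError("slot already exists: " + name)
    store[name] = EMPTY            # create the slot, fill with the empty graph

def clear(store, name):
    store[name] = EMPTY            # set the slot to the empty graph

def drop(store, name):
    store.pop(name, None)          # remove the slot

def add(store, src, dst):
    # Destination slot gets the union of its old value and the source value.
    store[dst] = store.get(dst, EMPTY) | store[src]

def copy(store, src, dst):
    store[dst] = store[src]        # destination slot gets the source value

def move(store, src, dst):
    if src == dst:
        return                     # nothing to do
    store[dst] = store[src]
    del store[src]                 # source slot is removed

triple = (":s", ":p", ":o")
store = {"g": frozenset({triple})}

create(store, "h")
add(store, "g", "h")     # slot h now holds a graph equal to the one in g
copy(store, "g", "g1")   # g and g1 hold the same value
clear(store, "g")        # g is emptied; g1 is unaffected
move(store, "g1", "g2")  # value now in g2, slot g1 gone
drop(store, "h")

print(sorted(store))     # ['g', 'g2']
```

None of these functions touches a graph; each one only changes which 
value sits in which slot.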


I hope I've got the language sorted out - apologies for mistakes, of 
which I'm sure there are some.  I avoided using "denotes".  There are 
many alternative choices of words, and one's background colours 
the preferred choice.

We have touched on a theory for update and are scheduling a telecon for 
it. (It's Fri 30th July 4pm UK time, if you missed the decision this 
week).  I hope this helps a little.

 Andy

[+] Well, except for the set of all sets and set of sets that are not 
members of themselves etc etc.
[*] http://www.w3.org/Submission/SPARQL-Update/
Received on Thursday, 22 July 2010 13:29:57 UTC