[Graphs] Proposal for Named Graph Semantics from Alex Hall on 2011-04-07 (public-rdf-wg@w3.org from April 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Thu, 7 Apr 2011 04:47:03 -0400
To: RDF WG <public-rdf-wg@w3.org>
Message-ID: <BANLkTinFC6F1EoR=VVhynTazthGC8TnvKA@mail.gmail.com>
Here is a proposal of a semantics for named graphs in RDF.  My goals here
are to:
- Extend http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
go beyond (IRI, graph) tuples.
- Give something that is formally defined enough to serve as a starting
point for discussion.
- Specify common semantics for multi-graph serialization formats, or at
least a starting point.
- Specify something that is flexible enough to satisfy applications that
want to treat named graphs as either g-snaps or g-boxes.

Regarding g-boxes, I specifically want to avoid incorporating anything that
suggests time variance into the semantics, because specifying semantics for
temporal changes is explicitly out of scope for the existing RDF Semantics
document.

Here goes...

1. Graph Identification
Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
implies that the resource identified by I is an RDF graph.  If desired, this
can be described easily enough in RDF by defining a new class rdfs:Graph and
mapping Graph(I) to the triple I rdf:type rdfs:Graph.

Define G(I) as a function that returns the RDF graph identified by I.  In
our parlance, G(I) is a g-snap, invariant over time.  Due to the nature of
RDF, it is difficult to express the relationship between I and G(I) natively
in RDF.  Graph literals, which I understand to be the encoding of some set
of triples as a single node in a graph, are one possible approach but this
proposal does not attempt to define graph literals.  Furthermore, in the
open world it's not possible to have complete knowledge of all the triples
in G(I) for any given I.

2. Graph Assertion
Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.

The notion of graph assertion attempts to capture the semantics of what
happens when some set of triples is associated with a graph IRI in a
multi-graph serialization such as TriG.  So the TriG fragment:

:G1 { :a :b :c } .

would be understood to construct a graph G with a single triple :a :b :c and
then make the assertion GA(:G1 G).

The use of "entails" as opposed to "equals" here is what gives us our
flexibility.  Applications that want to treat named graphs as g-snaps,
completely described by the triples associated with the graph IRI, can do so
by extending (b) to say G(I) equals G instead of entails.  Because every
graph entails itself, this extension is supported by these semantics, but
this would not be required behavior.  Indeed, this could lead to trouble in
the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2.

Applications that want to treat named graphs as g-boxes would to so by
essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
 This aligns pretty closely with my understanding of the notion of graph
store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either a
g-text serialization or query results) amounts to asserting GA(I, G) for the
current value of G at some point in time.  Given a new graph assertion for
an IRI that is already mapped in the store, an implementation could replace
the currently mapped graph with the new one (effectively discarding all
prior graph assertions) or merge them at its discretion; either approach
would be supported by these semantics.

Any vocabulary for specifying graph literals and attaching them to a graph
IRI in RDF would be defined as making a graph assertion, not setting the
value of the identified graph.

3. RDF Datasets
I haven't thought this part through entirely, but I think these semantics
could be aligned with the existing notion of RDF datasets from SPARQL (and
as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the
dataset to the appropriate graph assertions.

4. Graph Equality
Because it is not the case that (G1 entails G and G2 entails G) implies G1 =
G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies I1 and I2
are the same graph.  Such a conclusion could be reached if you extend the
definition of GA to mean equals instead of entails as discussed before, but
again that is an extension and not part of the proposed semantics.

5. Empty Graphs
Because every graph trivially entails the empty graph E, the assertion GA(I,
E) is trivially true for every graph IRI I.  Making that assertion doesn't
do anything beyond identify the resource denoted by I as a graph.

6. Graph Merges
It follows from the definition of GA (and the definition of entails) that
(GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)).  I think this gives
us a pretty straightforward approach to merging of RDF datasets if this is
required of the spec.

Hope you find this useful...  or at least that this stirs up some
interesting debate.

Regards,
Alex
Received on Thursday, 7 April 2011 08:47:32 UTC