Re: [Graphs] Proposal for Named Graph Semantics

* Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400]
> Here is a proposal of a semantics for named graphs in RDF.  My goals here
> are to:
> - Extend http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
> go beyond (IRI, graph) tuples.
> - Give something that is formally defined enough to serve as a starting
> point for discussion.
> - Specify common semantics for multi-graph serialization formats, or at
> least a starting point.
> - Specify something that is flexible enough to satisfy applications that
> want to treat named graphs as either g-snaps or g-boxes.
> 
> Regarding g-boxes, I specifically want to avoid incorporating anything that
> suggests time variance into the semantics, because specifying semantics for
> temporal changes is explicitly out of scope for the existing RDF Semantics
> document.
> 
> Here goes...
> 
> 1. Graph Identification
> Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
> implies that the resource identified by I is an RDF graph.  If desired, this
> can be described easily enough in RDF by defining a new class rdfs:Graph and
> mapping Graph(I) to the triple I rdf:type rdfs:Graph.

Perhaps you didn't intend to take a position on this, but does
"resource identified by I" imply HTTP dereferencing? I can think of
three interpretations of Graph:

  0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G
  0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map
  0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map

Of course these definitions presume a position on whether two graphs
with the same triples == each other. Perhaps "I is an RDF graph" says
to use "is" and not "==".


> Define G(I) as a function that returns the RDF graph identified by I.  In
> our parlance, G(I) is a g-snap, invariant over time.  Due to the nature of
> RDF, it is difficult to express the relationship between I and G(I) natively
> in RDF.  Graph literals, which I understand to be the encoding of some set
> of triples as a single node in a graph, are one possible approach but this
> proposal does not attempt to define graph literals.  Furthermore, in the
> open world it's not possible to have complete knowledge of all the triples
> in G(I) for any given I.

In order to make explicit the difference between the mapping and the
graph name convention below, I'm using "GraphAt" for "G":

  1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G
  1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
  1c. GraphAt(I:IRI):RDFGraph = G ∣ M(I) == G, M an RDF-wide map

and define 0 in terms of 1:

  0.  Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G
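
To make 1b and 0 concrete, here is a minimal Python sketch. It assumes
the local map m is a plain dict and that a graph is a frozenset of
(subject, predicate, object) string triples. The names Triple, RDFGraph,
graph_at and is_graph are illustrative only and not part of the proposal;
nothing is dereferenced over HTTP.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Assumption: a triple is three strings, a graph is an immutable set of triples.
Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]


def graph_at(m: Dict[str, RDFGraph], iri: str) -> Optional[RDFGraph]:
    """1b: GraphAt(I) under a local map m; None when m has no entry for I."""
    return m.get(iri)


def is_graph(m: Dict[str, RDFGraph], iri: str) -> bool:
    """0: Graph(I) holds iff GraphAt(I) yields some graph (possibly empty)."""
    return graph_at(m, iri) is not None


# Hypothetical local map with a single named graph.
m = {"http://example.org/G1": frozenset({(":a", ":b", ":c")})}
```

Using frozenset takes the "==" position: two graphs with the same triples
compare equal. An "is" reading would need object identity instead.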


> 2. Graph Assertion
> Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.

  3. GA(I:IRI, G:RDFGraph):Boolean = Graph(I) && GraphAt(I) ⊢ G

or

  3. GA(I:IRI, G:RDFGraph):Boolean = GraphAt(I) ⊢ G
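
A rough Python sketch of GA follows, under two loud assumptions: graphs
are ground (no blank nodes), and "entails" means simple entailment only,
which for ground graphs reduces to a subset test. The names entails and
ga, and the dict standing in for a local map, are illustrative.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]


def entails(g1: RDFGraph, g2: RDFGraph) -> bool:
    """Simple entailment for ground graphs only: g1 entails g2 iff g2 is a subset of g1."""
    return g2 <= g1


def ga(m: Dict[str, RDFGraph], iri: str, g: RDFGraph) -> bool:
    """GA(I, G): I names a graph in the local map m, and that graph entails G."""
    named = m.get(iri)
    return named is not None and entails(named, g)


# Hypothetical local map: :G1 holds two triples.
m = {":G1": frozenset({(":a", ":b", ":c"), (":a", ":b", ":d")})}
```

With this reading, asserting a strict subset of :G1's triples is still a
true graph assertion, which is the flexibility the proposal is after.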


> The notion of graph assertion attempts to capture the semantics of what
> happens when some set of triples is associated with a graph IRI in a
> multi-graph serialization such as TriG.  So the TriG fragment:
> 
> :G1 { :a :b :c } .
> 
> would be understood to construct a graph G with a single triple :a :b :c and
> then make the assertion GA(:G1, G).
> 
> The use of "entails" as opposed to "equals" here is what gives us our
> flexibility.  Applications that want to treat named graphs as g-snaps,
> completely described by the triples associated with the graph IRI, can do so
> by extending (b) to say G(I) equals G instead of entails.  Because every
> graph entails itself, this extension is supported by these semantics, but
> this would not be required behavior.  Indeed, this could lead to trouble in
> the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2.

This might be awkward for test cases because
  <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough simplification
is a valid parse of
  { <pi> <numericValue> 3.14 . }
I wonder if there's some way to move this beyond parsing (no suggestions yet).


> Applications that want to treat named graphs as g-boxes would do so by
> essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
> This aligns pretty closely with my understanding of the notion of graph
> store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either a
> g-text serialization or query results) amounts to asserting GA(I, G) for the
> current value of G at some point in time.  Given a new graph assertion for
> an IRI that is already mapped in the store, an implementation could replace
> the currently mapped graph with the new one (effectively discarding all
> prior graph assertions) or merge them at its discretion; either approach
> would be supported by these semantics.

I guess this argues for
  1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
where we don't try to tell one person's GA to match another's.
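
The g-box reading with a local map could be sketched roughly as below.
GraphStore, assert_graph and poke are hypothetical names, not SPARQL 1.1
Update operations, and the union used for merging is only a valid RDF
merge when the graphs are ground (no blank nodes to standardize apart).

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]


class GraphStore:
    """A local, mutable map from IRI to the current g-snap (a g-box per IRI)."""

    def __init__(self) -> None:
        self._m: Dict[str, RDFGraph] = {}

    def assert_graph(self, iri: str, g: RDFGraph, merge: bool = False) -> None:
        """Record a new graph assertion, either replacing or merging."""
        if merge and iri in self._m:
            # Assumption: ground graphs, so set union is the RDF merge.
            self._m[iri] = self._m[iri] | g
        else:
            # Replace: earlier assertions for this IRI are discarded.
            self._m[iri] = g

    def poke(self, iri: str) -> RDFGraph:
        """Return the graph currently mapped to iri; KeyError if unmapped."""
        return self._m[iri]


store = GraphStore()
g1 = frozenset({(":a", ":b", ":c")})
g2 = frozenset({(":a", ":b", ":d")})
store.assert_graph(":G1", g1)
store.assert_graph(":G1", g2)              # replace policy
replaced = store.poke(":G1")
store.assert_graph(":G1", g1, merge=True)  # merge policy
merged = store.poke(":G1")
```

Each store keeps its own map, so one person's assertions about :G1 never
have to agree with another's.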


> Any vocabulary for specifying graph literals and attaching them to a graph
> IRI in RDF would be defined as making a graph assertion, not setting the
> value of the identified graph.
> 
> 3. RDF Datasets
> I haven't thought this part through entirely, but I think these semantics
> could be aligned with the existing notion of RDF datasets from SPARQL (and
> as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the
> dataset to the appropriate graph assertions.
> 
> 4. Graph Equality
> Because it is not the case that (G1 entails G and G2 entails G) implies G1 =
> G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies I1 and I2
> are the same graph.  Such a conclusion could be reached if you extend the
> definition of GA to mean equals instead of entails as discussed before, but
> again that is an extension and not part of the proposed semantics.
> 
> 5. Empty Graphs
> Because every graph trivially entails the empty graph E, the assertion GA(I,
> E) is trivially true for every graph IRI I.  Making that assertion doesn't
> do anything beyond identify the resource denoted by I as a graph.
> 
> 6. Graph Merges
> It follows from the definition of GA (and the definition of entails) that
> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)).  I think this gives
> us a pretty straightforward approach to merging of RDF datasets if this is
> required of the spec.
> 
> Hope you find this useful...  or at least that this stirs up some
> interesting debate.
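
Sections 4 to 6 can be checked on a toy example. This is a sketch only,
assuming ground graphs as frozensets of string triples, simple entailment
as a subset test, and merge as set union; entails and merge are
illustrative helper names.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]


def entails(g1: RDFGraph, g2: RDFGraph) -> bool:
    """Simple entailment for ground graphs: subset test."""
    return g2 <= g1


def merge(g1: RDFGraph, g2: RDFGraph) -> RDFGraph:
    """RDF merge for ground graphs: plain set union."""
    return g1 | g2


named = frozenset({(":a", ":b", ":c"), (":a", ":b", ":d"), (":x", ":y", ":z")})
g1 = frozenset({(":a", ":b", ":c")})
g2 = frozenset({(":a", ":b", ":d")})
empty: RDFGraph = frozenset()

# Section 5: every graph entails the empty graph.
assert entails(named, empty)

# Section 6: entailing G1 and G2 implies entailing Merge(G1, G2).
assert entails(named, g1) and entails(named, g2)
assert entails(named, merge(g1, g2))

# Section 4: two different graphs can both entail G1, so a shared
# assertion does not make the named graphs equal.
other = merge(g1, g2)
assert entails(other, g1) and other != named
```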

Thanks for moving this forward.


> Regards,
> Alex

-- 
-ericP

Received on Thursday, 7 April 2011 14:28:50 UTC