Re: [Graphs] Proposal for Named Graph Semantics from Nathan on 2011-04-07 (public-rdf-wg@w3.org from April 2011)

From: Nathan <nathan@webr3.org>
Date: Thu, 07 Apr 2011 22:06:04 +0100
To: Eric Prud'hommeaux <eric@w3.org>
CC: Alex Hall <alexhall@revelytix.com>, RDF WG <public-rdf-wg@w3.org>, Richard Cyganiak <richard@cyganiak.de>
Message-ID: <4D9E273C.2040300@webr3.org>

Eric Prud'hommeaux wrote:
> * Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400]
>> Here is a proposal of a semantics for named graphs in RDF.  My goals here
>> are to:
>> - Extend http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
>> go beyond (IRI, graph) tuples.
>> - Give something that is formally defined enough to serve as a starting
>> point for discussion.
>> - Specify common semantics for multi-graph serialization formats, or at
>> least a starting point.
>> - Specify something that is flexible enough to satisfy applications that
>> want to treat named graphs as either g-snaps or g-boxes.
>>
>> Regarding g-boxes, I specifically want to avoid incorporating anything that
>> suggests time variance into the semantics, because specifying semantics for
>> temporal changes is explicitly out of scope for the existing RDF Semantics
>> document.
>>
>> Here goes...
>>
>> 1. Graph Identification
>> Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
>> implies that the resource identified by I is an RDF graph.  If desired, this
>> can be described easily enough in RDF by defining a new class rdfs:Graph and
>> mapping Graph(I) to the triple I rdf:type rdfs:Graph.
> 
> Perhaps you didn't intend to take a position on this, but does
> "resource identified by I" imply HTTP dereferencing? I can think of
> three interpretations of Graph:
> 
>   0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G
>   0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map
>   0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map
> 
> Of course these definitions presume a position on whether two graphs
> with the same triples == each other. Perhaps "I is an RDF graph" says
> to use "is" and not "==".

This is pretty much just like REST, which is unsurprising: we stuck RDF 
on the web, started to dereference it, and put the results in a quad 
store where the IRI of the resource we dereferenced was said to be the 
graph identifier and the triples extracted from the representation were 
put in the remaining 3 slots. Quads born, SPARQL, QuadStores, RDF 
Datasets, full circle and back to here and now.

   RDFParse(HTTPGet(I)) == G

doesn't entail that I is a Graph or that I == G, since:

*- I is mapped to (or associated with) a set of values over time
- a successful HTTPGet(I) returns a single value from the set (not the 
current value)
- the values in the set are representations
- each representation may have multiple properties and contain multiple 
things (RDF triples being just one of those things, like in RDFa documents)
- RDFParse(HTTPGet(I)) == a g-snap.

pretty much the most we can conclude, regardless of the interface (HTTP 
or otherwise), is that the g-snap (representing a set of triples) is 
associated with I.

if we say any more than that, we're just creating use case specific 
stories to aid comprehension for people, but at the same time we're 
creating artificial limitations which will inevitably come back to bite 
us, or somebody, in the ass later.

side: I put the * beside the first item so that you can see the REST and 
HTTP story falls apart when they try to speak of anything behind the 
interface, because instinct and context here make you think that the 
values which I is associated with over time are rdf graphs / sets of 
triples; and the context in HTTP/REST makes you think it's associated 
with a set of representations over time, or some resource that has a 
mythical state. Any attempt to say what is behind the interface or what 
the relationship is just makes problems later when you zoom out and 
consider other concepts. So IMHO, just best to keep it as simple as 
possible.

   if RDFParse(HTTPGet(<I>)) results in a g-snap Z, then:
    g-snap Z is associated with <I>
    <I> is dereferenceable by HTTP
    an HTTP GET on <I> at time X resulted in representation Y
    representation Y is associated with <I>
    Y contains Z
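
The conclusions above can be sketched as a small data model. This is only an illustration, assuming Python: the names Representation, Observations, record and g_snaps, and the example IRI, are all hypothetical, and parsing is skipped by passing the triples in by hand. It records what was observed through the interface and says nothing about what <I> "is".

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
GSnap = FrozenSet[Triple]  # an immutable set of triples


@dataclass(frozen=True)
class Representation:
    """One value returned by a GET: the payload and the g-snap parsed from it."""
    body: str
    g_snap: GSnap


@dataclass
class Observations:
    """Records only what was seen through the interface, nothing behind it."""
    seen: Dict[str, List[Tuple[int, Representation]]] = field(default_factory=dict)

    def record(self, iri: str, time: int, rep: Representation) -> None:
        # "an HTTP GET on <I> at time X resulted in representation Y"
        self.seen.setdefault(iri, []).append((time, rep))

    def g_snaps(self, iri: str) -> Set[GSnap]:
        # Every g-snap associated with <I>; none is claimed to *be* <I>.
        return {rep.g_snap for _, rep in self.seen.get(iri, [])}


obs = Observations()
iri = "http://example.org/I"  # hypothetical IRI
obs.record(iri, 1, Representation("<a> <b> <c> .", frozenset({("a", "b", "c")})))
obs.record(iri, 2, Representation("<a> <b> <d> .", frozenset({("a", "b", "d")})))
```

Two GETs at different times give two different g-snaps, and both end up associated with the same IRI, which is all the model commits to.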

Best,

Nathan

>> Define G(I) as a function that returns the RDF graph identified by I.  In
>> our parlance, G(I) is a g-snap, invariant over time.  Due to the nature of
>> RDF, it is difficult to express the relationship between I and G(I) natively
>> in RDF.  Graph literals, which I understand to be the encoding of some set
>> of triples as a single node in a graph, are one possible approach but this
>> proposal does not attempt to define graph literals.  Furthermore, in the
>> open world it's not possible to have complete knowledge of all the triples
>> in G(I) for any given I.
> 
> In order to make explicit the difference between the mapping and the
> graph name convention below, I'm using "GraphAt" for "G":
> 
>   1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G
>   1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
>   1c. GraphAt(I:IRI):RDFGraph = G ∣ M(I) == G, M an RDF-wide map
> 
> and define 0 in terms of 1:
> 
>   0.  Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G
> 
> 
>> 2. Graph Assertion
>> Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
>> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.
> 
>   3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ Graph(I) && GraphAt(I) ⊢ G
> 
> or
> 
>   3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) ⊢ G
> 
> 
>> The notion of graph assertion attempts to capture the semantics of what
>> happens when some set of triples is associated with a graph IRI in a
>> multi-graph serialization such as TriG.  So the TriG fragment:
>>
>> :G1 { :a :b :c } .
>>
>> would be understood to construct a graph G with a single triple :a :b :c and
>> then make the assertion GA(:G1, G).
>>
>> The use of "entails" as opposed to "equals" here is what gives us our
>> flexibility.  Applications that want to treat named graphs as g-snaps,
>> completely described by the triples associated with the graph IRI, can do so
>> by extending (b) to say G(I) equals G instead of entails.  Because every
>> graph entails itself, this extension is supported by these semantics, but
>> this would not be required behavior.  Indeed, this could lead to trouble in
>> the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2.
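
A rough sketch of the "entails" versus "equals" reading of GA, in Python. It makes a strong simplifying assumption: graphs are ground (no blank nodes), so simple entailment reduces to a subset check. The names entails, ga, graph_at and the strict flag are hypothetical, and graph_at stands in for a local map from IRI to graph.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def entails(g: Graph, h: Graph) -> bool:
    # Assumption: ground graphs only, so simple entailment is just subset.
    return h <= g


def ga(graph_at: Dict[str, Graph], iri: str, g: Graph, strict: bool = False) -> bool:
    """GA(I, G): I names a graph and G(I) entails G (or equals G when strict)."""
    actual = graph_at.get(iri)
    if actual is None:  # Graph(I) does not hold in this map
        return False
    return actual == g if strict else entails(actual, g)


graph_at = {":G1": frozenset({(":a", ":b", ":c"), (":a", ":b", ":d")})}
g1 = frozenset({(":a", ":b", ":c")})
g2 = frozenset({(":a", ":b", ":d")})
```

Under the entailment reading, ga(graph_at, ":G1", g1) and ga(graph_at, ":G1", g2) both hold. With strict=True neither does, which is the trouble with G1 != G2 described above.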
> 
> This might be awkward for test cases 'cause
>   <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough simplification
> is a valid parse of
>   { <pi> <numericValue> 3.14 . }
> I wonder if there's some way to move this beyond parsing (no suggestions yet).
> 
> 
>> Applications that want to treat named graphs as g-boxes would do so by
>> essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
>>  This aligns pretty closely with my understanding of the notion of graph
>> store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either a
>> g-text serialization or query results) amounts to asserting GA(I, G) for the
>> current value of G at some point in time.  Given a new graph assertion for
>> an IRI that is already mapped in the store, an implementation could replace
>> the currently mapped graph with the new one (effectively discarding all
>> prior graph assertions) or merge them at its discretion; either approach
>> would be supported by these semantics.
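
The g-box reading could look something like the following Python sketch. GraphStore, assert_graph, poke and the policy argument are hypothetical names, not anything from SPARQL 1.1 Update, and the merge is a plain set union, which is only right for ground graphs (blank nodes would need renaming apart first).

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class GraphStore:
    """A mutable IRI -> graph mapping; each stored value is itself a g-snap."""

    def __init__(self, policy: str = "replace") -> None:
        if policy not in ("replace", "merge"):
            raise ValueError("policy must be 'replace' or 'merge'")
        self.policy = policy
        self.current: Dict[str, Graph] = {}

    def assert_graph(self, iri: str, triples: Iterable[Triple]) -> None:
        g = frozenset(triples)
        if self.policy == "merge" and iri in self.current:
            # Keep prior assertions (union is valid for ground graphs only).
            g = self.current[iri] | g
        # Under "replace", prior graph assertions are simply discarded.
        self.current[iri] = g

    def poke(self, iri: str) -> Graph:
        # The content of the box right now; unknown IRIs yield the empty graph.
        return self.current.get(iri, frozenset())


replacing = GraphStore("replace")
merging = GraphStore("merge")
for store in (replacing, merging):
    store.assert_graph(":G1", [(":a", ":b", ":c")])
    store.assert_graph(":G1", [(":a", ":b", ":d")])
```

After the same two assertions, the replacing store holds only the second triple while the merging store holds both; either behavior is compatible with the entailment-based GA.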
> 
> I guess this argues for 
>   1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
> where we don't try to tell one person's GA to match another's.
> 
> 
>> Any vocabulary for specifying graph literals and attaching them to a graph
>> IRI in RDF would be defined as making a graph assertion, not setting the
>> value of the identified graph.
>>
>> 3. RDF Datasets
>> I haven't thought this part through entirely, but I think these semantics
>> could be aligned with the existing notion of RDF datasets from SPARQL (and
>> as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the
>> dataset to the appropriate graph assertions.
>>
>> 4. Graph Equality
>> Because it is not the case that (G1 entails G and G2 entails G) implies G1 =
>> G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies I1 and I2
>> are the same graph.  Such a conclusion could be reached if you extend the
>> definition of GA to mean equals instead of entails as discussed before, but
>> again that is an extension and not part of the proposed semantics.
>>
>> 5. Empty Graphs
>> Because every graph trivially entails the empty graph E, the assertion GA(I,
>> E) is trivially true for every graph IRI I.  Making that assertion doesn't
>> do anything beyond identify the resource denoted by I as a graph.
>>
>> 6. Graph Merges
>> It follows from the definition of GA (and the definition of entails) that
>> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)).  I think this gives
>> us a pretty straightforward approach to merging of RDF datasets if this is
>> required of the spec.
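
The merge property can be checked on a small case. This Python sketch again assumes ground graphs, so that entailment is a subset test and Merge is a set union; entails, merge and the sample triples are hypothetical names and data chosen for illustration.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def entails(g: Graph, h: Graph) -> bool:
    return h <= g  # ground graphs only (assumption)


def merge(g1: Graph, g2: Graph) -> Graph:
    return g1 | g2  # ground graphs only; blank nodes would need renaming apart


# The (possibly not fully known) graph behind some IRI I.
actual = frozenset({(":a", ":b", ":c"), (":a", ":b", ":d"), (":x", ":y", ":z")})
g1 = frozenset({(":a", ":b", ":c")})
g2 = frozenset({(":a", ":b", ":d")})
```

Here actual entails g1 and g2 separately, and it also entails merge(g1, g2), since anything containing both subsets contains their union.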
>>
>> Hope you find this useful...  or at least that this stirs up some
>> interesting debate.
> 
> thanks for moving this forward.
> 
> 
>> Regards,
>> Alex
>
Received on Thursday, 7 April 2011 21:07:08 UTC