Re: [Graphs] Proposal for Named Graph Semantics from Antoine Zimmermann on 2011-04-08 (public-rdf-wg@w3.org from April 2011)

From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
Date: Fri, 08 Apr 2011 17:32:52 +0200
To: Alex Hall <alexhall@revelytix.com>
CC: public-rdf-wg@w3.org
Message-ID: <4D9F2AA4.8050500@insa-lyon.fr>
On 08/04/2011 16:48, Alex Hall wrote:
> On Fri, Apr 8, 2011 at 10:22 AM, Antoine Zimmermann
> <antoine.zimmermann@insa-lyon.fr> wrote:
>
>> I propose something that goes very much in the same direction as Alex's
>> proposal:
>>
>> Definition(Graph map) A graph map GM is a partial function from the set of
>> IRIs to the set of g-snaps.
>>
>> Definition(Temporal graph map) A temporal graph map TGM is a partial
>> function from [-inf,+inf] which maps a time point to a graph map.
>>
>> There exists a special temporal graph map, called the HTTPmap, such that at
>> any given point in time t, HTTPmap(t) maps a URI to the parsed RDF graph of
>> the document retrieved via an HTTP GET of the URI at the time t. (if HTTP
>> GET does not provide an RDF serialisation, then the mapping is not defined
>> on that URI).
>>
>> A graph map is application-specific and can be static or temporal. If an
>> application implements a versioning system, it is possible to use a temporal
>> graph map with a time parameter in the past. However, the default behaviour
>> should be to use a temporal graph map with the current time.
>>
>> Now, to implement a graph map different from the HTTPmap, one way is to use
>> a format like TriG or Quads where the mapping is syntactically made explicit
>> in a file:
>>
>> :G1 {:x :y :z}
>> :G2 {:a :b :c}
>>
>> may be used to say that GM(:G1) = {:x :y :z} and GM(:G2) = {:a :b :c}.
>>
>> Now, the connection with Dataset and its semantics is as follows: a set of
>> URIs (u1, ..., un) and an optional default g-snap G induce a dataset (G,
>> <u1,GM(u1)>, ... ,<un,GM(un)>).
>>
>> The rest of the semantics is as in the current Wikipage.
>>
>> Basically, this says that a dataset is a snapshot of the data you can get
>> by looking up certain URIs ("looking up" not necessarily meaning "using
>> HTTP"). The semantics in the wikipage just says that each graph in the
>> dataset is interpreted independently (this can be further constrained by
>> semantic extensions, just as RDFS further constrains RDF
>> interpretations).
>>
>
> I'd say this is a different direction from my proposal, the fundamental
> difference being that my graphs and graph map are invariant over time and
> that the presence of an <IRI,G> pair in a dataset is making an assertion as
> to the (partial) content of the graph mapped by that IRI.  The reason I say
> partial is that in the open world, we can never assume to have a full
> description of any resource, and I extend that to include graphs named with
> IRIs.

The fact that an IRI maps to a specific closed set of triples does not 
contradict the OWA. The set of triples is what you have to load in your 
application (and you don't have to worry whether you are loading the 
whole graph or just a part of it), but it does not mean that the facts 
which are not in that set are false. You can always integrate new facts 
by merging other graphs with a different name.
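
A minimal Python sketch of this point, under the assumption that all 
triples are ground (no blank nodes), so that merging reduces to set 
union. The names g1, g2 and the helper merge are hypothetical and only 
serve the illustration:

```python
# A g-snap is modelled as an immutable set of ground triples.
# Assumption: no blank nodes, so a merge is a plain set union.
Triple = tuple[str, str, str]


def merge(*graphs: frozenset[Triple]) -> frozenset[Triple]:
    """Merge ground graphs by taking the union of their triples."""
    return frozenset().union(*graphs)


# The closed set of triples that the IRI :G1 maps to.
g1 = frozenset({(":x", ":y", ":z")})
# Another graph, under a different name, contributing further facts.
g2 = frozenset({(":a", ":b", ":c")})

# Loading :G1 yields exactly its triples. Nothing in g1 says that
# (:a :b :c) is false, so the merge below is always permitted.
known = merge(g1, g2)
```

Here g1 stays exactly what :G1 maps to, while known is what the 
application believes after integrating the second graph.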

> You seem to be treating your map as a mutable data structure in some
> programming language, from which we can retrieve graphs.  I'm not saying
> that's wrong, and from a practical standpoint e.g. for answering queries,
> there's probably no difference.
>

What I am saying is that the map is application-dependent, so that the 
choice is left to each implementer. Whatever the implementation, it all 
comes down to having a mathematical function that maps a set of IRIs to 
g-snaps (possibly changing over time). This is what Pat called "poking a 
g-box". Sometimes, poking a g-box is HTTP dereferencing the IRI that 
identifies the g-box. Sometimes, it's just returning what's in the curly 
brackets in a TriG document. Sometimes, it's whatever triples are 
attached to a certain IRI in a Quad file. Sometimes, it's getting the 
graph represented by a Jena Model in memory.

Being independent of these implementation details lets me formally 
define a semantics without sanctioning one way of poking g-boxes over 
the others.
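
To make the abstraction concrete, here is a rough Python sketch of a 
graph map as a partial function from IRIs to g-snaps, together with the 
dataset it induces. Everything named here (static_map, temporal_map, 
dataset) is hypothetical. Two choices are my own assumptions and not 
part of the proposal: the temporal map answers with the latest recorded 
version at or before t, and IRIs on which the map is undefined are left 
out of the induced dataset.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]
GSnap = frozenset[Triple]
# A graph map is a partial function from IRIs to g-snaps.
# None stands for "not defined on this IRI".
GraphMap = Callable[[str], Optional[GSnap]]


def static_map(named: dict[str, GSnap]) -> GraphMap:
    """Graph map backed by explicit name/graph pairs, as in a TriG file."""
    return named.get


def temporal_map(versions: dict[float, dict[str, GSnap]]) -> Callable[[float], GraphMap]:
    """Temporal graph map: a time point is mapped to a graph map.

    Assumption: the answer at time t is the latest version recorded
    at or before t, and the empty map if there is none.
    """
    def at(t: float) -> GraphMap:
        times = [v for v in versions if v <= t]
        return static_map(versions[max(times)]) if times else static_map({})
    return at


def dataset(gm: GraphMap, iris: list[str], default: GSnap = frozenset()):
    """Induce (G, <u1,GM(u1)>, ..., <un,GM(un)>) from a graph map.

    Assumption: IRIs on which gm is undefined are skipped.
    """
    return default, [(u, gm(u)) for u in iris if gm(u) is not None]


# The two-graph TriG example, poked through a static map.
gm = static_map({
    ":G1": frozenset({(":x", ":y", ":z")}),
    ":G2": frozenset({(":a", ":b", ":c")}),
})
ds = dataset(gm, [":G1", ":G2"])
```

An HTTP-backed or Jena-backed map would only need to offer the same 
call signature; the dataset function does not care how the g-box was 
poked.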

> Not trying to make a judgment here, just pointing out how they are different
> approaches.
>
> -Alex
>
>
>
>>
>>
>> AZ.
>>
>> On 07/04/2011 10:47, Alex Hall wrote:
>>
>>> Here is a proposal of a semantics for named graphs in RDF.  My goals here
>>> are to:
>>> - Extend
>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
>>> go beyond (IRI, graph) tuples.
>>> - Give something that is formally defined enough to serve as a starting
>>> point for discussion.
>>> - Specify common semantics for multi-graph serialization formats, or at
>>> least a starting point.
>>> - Specify something that is flexible enough to satisfy applications that
>>> want to treat named graphs as either g-snaps or g-boxes.
>>>
>>> Regarding g-boxes, I specifically want to avoid incorporating anything
>>> that suggests time variance into the semantics, because specifying
>>> semantics for temporal changes is explicitly out of scope for the
>>> existing RDF Semantics document.
>>>
>>> Here goes...
>>>
>>> 1. Graph Identification
>>> Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
>>> implies that the resource identified by I is an RDF graph.  If desired,
>>> this can be described easily enough in RDF by defining a new class
>>> rdfs:Graph and mapping Graph(I) to the triple I rdf:type rdfs:Graph.
>>>
>>> Define G(I) as a function that returns the RDF graph identified by I.  In
>>> our parlance, G(I) is a g-snap, invariant over time.  Due to the nature of
>>> RDF, it is difficult to express the relationship between I and G(I)
>>> natively in RDF.  Graph literals, which I understand to be the encoding
>>> of some set of triples as a single node in a graph, are one possible
>>> approach but this proposal does not attempt to define graph literals.
>>> Furthermore, in the open world it's not possible to have complete
>>> knowledge of all the triples in G(I) for any given I.
>>>
>>> 2. Graph Assertion
>>> Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
>>> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.
>>>
>>> The notion of graph assertion attempts to capture the semantics of what
>>> happens when some set of triples is associated with a graph IRI in a
>>> multi-graph serialization such as TriG.  So the TriG fragment:
>>>
>>> :G1 { :a :b :c } .
>>>
>>> would be understood to construct a graph G with a single triple :a :b :c
>>> and then make the assertion GA(:G1, G).
>>>
>>> The use of "entails" as opposed to "equals" here is what gives us our
>>> flexibility.  Applications that want to treat named graphs as g-snaps,
>>> completely described by the triples associated with the graph IRI, can do
>>> so by extending (b) to say G(I) equals G instead of entails.  Because
>>> every graph entails itself, this extension is supported by these
>>> semantics, but this would not be required behavior.  Indeed, this could
>>> lead to trouble in the open world where you can have GA(I, G1) and
>>> GA(I, G2) with G1 != G2.
>>>
>>> Applications that want to treat named graphs as g-boxes would do so by
>>> essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
>>> This aligns pretty closely with my understanding of the notion of graph
>>> store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either
>>> a g-text serialization or query results) amounts to asserting GA(I, G)
>>> for the current value of G at some point in time.  Given a new graph
>>> assertion for an IRI that is already mapped in the store, an
>>> implementation could replace the currently mapped graph with the new one
>>> (effectively discarding all prior graph assertions) or merge them at its
>>> discretion; either approach would be supported by these semantics.
>>>
>>> Any vocabulary for specifying graph literals and attaching them to a graph
>>> IRI in RDF would be defined as making a graph assertion, not setting the
>>> value of the identified graph.
>>>
>>> 3. RDF Datasets
>>> I haven't thought this part through entirely, but I think these semantics
>>> could be aligned with the existing notion of RDF datasets from SPARQL (and
>>> as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the
>>> dataset to the appropriate graph assertions.
>>>
>>> 4. Graph Equality
>>> Because it is not the case that (G1 entails G and G2 entails G) implies
>>> G1 = G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies
>>> I1 and I2 are the same graph.  Such a conclusion could be reached if you
>>> extend the definition of GA to mean equals instead of entails as
>>> discussed before, but again that is an extension and not part of the
>>> proposed semantics.
>>>
>>> 5. Empty Graphs
>>> Because every graph trivially entails the empty graph E, the assertion
>>> GA(I, E) is trivially true for every graph IRI I.  Making that assertion
>>> doesn't do anything beyond identifying the resource denoted by I as a
>>> graph.
>>>
>>> 6. Graph Merges
>>> It follows from the definition of GA (and the definition of entails) that
>>> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)).  I think this
>>> gives us a pretty straightforward approach to merging of RDF datasets if
>>> this is required of the spec.
>>>
>>> Hope you find this useful...  or at least that this stirs up some
>>> interesting debate.
>>>
>>> Regards,
>>> Alex
>>>
>>>
>>
>> --
>> Antoine Zimmermann
>> Researcher at:
>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>> Database Group
>> 7 Avenue Jean Capelle
>> 69621 Villeurbanne Cedex
>> France
>> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
>> Lecturer at:
>> Institut National des Sciences Appliquées de Lyon
>> 20 Avenue Albert Einstein
>> 69621 Villeurbanne Cedex
>> France
>> antoine.zimmermann@insa-lyon.fr
>> http://zimmer.aprilfoolsreview.com/
>>
>>
>

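For comparison, the graph assertion GA(I, G) from Alex's quoted proposal 
(his sections 2, 5 and 6) can be sketched in Python as well. This is 
only an illustration under a strong assumption: all graphs are ground 
(no blank nodes), in which case simple entailment reduces to the subset 
relation and Merge reduces to set union. The class name Assertions and 
its methods are hypothetical.

```python
Triple = tuple[str, str, str]


class Assertions:
    """Records graph assertions GA(I, G) as lower bounds on the unknown G(I)."""

    def __init__(self) -> None:
        self.asserted: dict[str, frozenset[Triple]] = {}

    def assert_graph(self, iri: str, g: frozenset[Triple]) -> None:
        # GA(I, G1) and GA(I, G2) imply GA(I, Merge(G1, G2)).
        # Assumption: ground graphs, so the merge is a set union.
        self.asserted[iri] = self.asserted.get(iri, frozenset()) | g

    def holds(self, iri: str, g: frozenset[Triple]) -> bool:
        # GA(I, G) follows from the recorded assertions when I is known
        # to be a graph and G is a subset of what was asserted for I
        # (ground simple entailment).
        return iri in self.asserted and g <= self.asserted[iri]


store = Assertions()
store.assert_graph(":G1", frozenset({(":a", ":b", ":c")}))
store.assert_graph(":G1", frozenset({(":x", ":y", ":z")}))

# Both assertions survive, and so does their merge.
merged_ok = store.holds(":G1", frozenset({(":a", ":b", ":c"), (":x", ":y", ":z")}))
# The empty graph is asserted of any IRI already identified as a graph.
empty_ok = store.holds(":G1", frozenset())
```

Note that the store never claims to know G(:G1) itself, only triples 
that G(:G1) must entail, which is the "entails, not equals" reading.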

Received on Friday, 8 April 2011 15:33:22 UTC