Re: [Graphs] Proposal: RDF Datasets from Antoine Zimmermann on 2011-03-08 (public-rdf-wg@w3.org from March 2011)

From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
Date: Tue, 08 Mar 2011 15:17:31 +0100
To: Richard Cyganiak <richard@cyganiak.de>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4D763A7B.8080203@insa-lyon.fr>

Richard,


Good starting point.

I am in favour of using the notion of dataset from SPARQL but I have a 
problem with the semantics. You say:

"The interpretation of an RDF Dataset is that of the union of its 
constituent graphs."

One of the strong reasons to keep information about provenance is to 
avoid spreading inconsistencies everywhere. Separating statements into 
distinct boxes should prevent knowledge from disjoint contexts from 
intertwining.

Besides, in a Semantic Web search engine that indexes all RDF data on 
the Web (like Sindice or SWSE), the union semantics is not acceptable. 
Neither Sindice nor SWSE implements the semantics you propose, which is 
unfortunate since you advocate following deployed application practices 
and those are among the high-profile applications from your own institute.

What you define is a semantics that maximises the "permeability" of 
contexts, that is, every triple defined in any graph equally influences 
the knowledge in every other graph within a dataset.

On the contrary, we could argue in favour of a semantics that minimises 
the permeability of contexts, that is, a triple in a graph can only have 
an impact on the knowledge of that graph.

This can be formalised as follows:

"The interpretation of an RDF Dataset (G, (id1,G1), ..., (idn,Gn)) is a 
tuple (I, I1, ..., In) where I is an RDF-interpretation of G and for all 
1 <= i <= n, Ii is an RDF-interpretation of Gi."

This way, you prevent the knowledge of one graph from perturbing the 
knowledge of other graphs, which copes very well with heterogeneous 
and unreliable information from all over the Web.
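
To make the contrast between the two extremes concrete, here is a minimal 
Python sketch. It is only an illustration under strong assumptions: a 
triple is a plain 3-tuple of strings, a graph is a set of triples, and 
"what an interpretation must satisfy" is reduced to the set of ground 
triples in scope. The function names and the use of None as the key for 
the default graph are hypothetical, not part of any proposal.

```python
# Toy model (an assumption for illustration): a triple is a 3-tuple of
# strings, a graph is a set of triples, and the "knowledge" of a graph is
# just the set of ground triples its interpretation has to satisfy.

def union_view(default, named):
    """Maximal permeability: every graph is read against the union of all graphs."""
    merged = set(default)
    for graph in named.values():
        merged |= graph
    views = {None: set(merged)}  # None stands for the default graph
    for name in named:
        views[name] = set(merged)
    return views


def isolated_view(default, named):
    """Minimal permeability: each graph is read on its own, one view per graph."""
    views = {None: set(default)}
    for name, graph in named.items():
        views[name] = set(graph)
    return views


# Two sources that disagree about the same resource.
named_graphs = {
    ":G1": {("ex:a", "ex:age", "30")},
    ":G2": {("ex:a", "ex:age", "31")},
}

permeable = union_view(set(), named_graphs)
isolated = isolated_view(set(), named_graphs)
```

Under union_view the conflicting statement from :G2 becomes part of what 
:G1 has to satisfy, whereas under isolated_view each graph keeps only its 
own statement.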

Unfortunately, this is not ideal because it is often desirable that 
knowledge actually "flows" across contexts. There are several proposed 
formalisms that lie in between the two extremes defined above (viz., 
maximal and minimal permeability), but it is not the goal of this WG to 
choose or define one. However, it would be good if the semantics of 
datasets were as generic and permissive as possible, so that extensions 
of it can constrain it further (just as the semantics of RDF is itself 
very permissive but further constrained by RDFS, OWL, SWRL, etc.). In 
this sense, the "minimal permeability" semantics is the most permissive. 
To constrain it, it suffices to add vocabularies that specify the way 
knowledge from different graphs interacts. For instance:

:G1 ex:imports :G2 .

could be a way to ensure that the interpretation of :G1 has to satisfy 
both :G1 and :G2. If all graphs import each other, then an 
interpretation of a dataset becomes equivalent to an RDF-interpretation 
of the union of its constituent graphs, which is exactly the "maximal 
permeability" semantics that you defined.
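
The effect of such an import vocabulary can be sketched in Python as 
follows. This is a hedged illustration, not a definition: graphs are 
sets of 3-tuples, the imports argument is a hypothetical mapping derived 
from ex:imports triples, only direct imports are followed (whether 
imports should be transitive is left open here), and the default graph 
is omitted for brevity.

```python
# Toy model (an assumption for illustration): a graph is a set of 3-tuples,
# and `imports` maps a graph name to the names it directly imports,
# as would be read off triples such as  :G1 ex:imports :G2 .

def effective_graph(name, named, imports):
    """Triples the interpretation of `name` must satisfy:
    its own triples plus those of every graph it directly imports."""
    triples = set(named[name])
    for target in imports.get(name, ()):
        triples |= named[target]
    return triples


named_graphs = {
    ":G1": {("ex:a", "ex:p", "ex:b")},
    ":G2": {("ex:b", "ex:p", "ex:c")},
    ":G3": {("ex:c", "ex:p", "ex:d")},
}

# Only :G1 imports :G2; :G2 and :G3 stay isolated.
one_import = {":G1": [":G2"]}

# Every graph imports every other graph.
all_import = {
    name: [other for other in named_graphs if other != name]
    for name in named_graphs
}

union_of_all = set().union(*named_graphs.values())
g1_with_one_import = effective_graph(":G1", named_graphs, one_import)
g1_with_all_imports = effective_graph(":G1", named_graphs, all_import)
```

With no imports declared, effective_graph returns just the graph's own 
triples (minimal permeability); when every graph imports every other, 
each result coincides with union_of_all (maximal permeability).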

The reasoning formalisms used by Sindice or SWSE (and certainly other 
triple stores with reasoning capabilities) would fit well with this 
approach. Annotated RDF(S) would also work as a semantic extension of 
this generic approach (with appropriate vocabularies).

I'll put this proposal somewhere on the wiki with more technical details.


Regards,
AZ.



On 08/03/2011 12:29, Richard Cyganiak wrote:
> All,
>
> I wrote up a proposal for addressing the [Graphs] work item:
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>
> The gist is to simply lift the definition of RDF Datasets from SPARQL into RDF Concepts.
>
> I believe that this is the simplest thing we could possibly do in order to fulfill the work item from the charter, and addresses the use cases that were brought forward.
>
> This is intended as a starting point for discussion. In particular I'd like to see:
>
> - arguments that this doesn't address (or poorly addresses) the use cases
> - arguments that this doesn't meet the charter requirements
> - improvements to the proposal that would help to better address the use cases
> - counter-proposals in a similar style
>
> Best,
> Richard


-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Received on Tuesday, 8 March 2011 14:18:06 UTC