Re: [Graphs] Proposal: RDF Datasets from Richard Cyganiak on 2011-03-08 (public-rdf-wg@w3.org from March 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 8 Mar 2011 19:58:09 +0000
To: Ivan Herman <ivan@w3.org>
Cc: antoine.zimmermann@insa-lyon.fr, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <A2309F22-86B7-494E-BC6A-E40C515C924F@cyganiak.de>
Ivan,

[Summary: A forceful statement that defining only an (id,G) tuple is not sufficient; we need to define some model for *multiple* graphs]

On 8 Mar 2011, at 16:09, Ivan Herman wrote:
>>> Going one step forward (to Richard) I am not sure that we have to define the concept of a dataset at that level (which is of course useful for SPARQL). I would expect to define only a (id,G) tuple in some sense; what is the reason of going beyond that?
>> 
>> It is in the charter and has been clearly requested in the community survey.
> 
> Hm. The charter is (intentionally) vague on this; I do not read it as requiring us to define the concept of datasets.

I did not say that the charter requires the concept of a dataset.

You asked me: “What is the reason of going beyond only defining a (id,G) tuple.”

And I said, “Because it's in the charter.”

Here it is: “Standardize a model and semantics for multiple graphs and graphs stores”

An (id,G) pair is not a model for multiple graphs. Nor is it a model for graph stores. This concept, in isolation, is not useful for anything at all (except defining sets of them).

I concur that the charter doesn't require the RDF Datasets approach. There are other approaches that would address the requirement, such as graph literals (which I'd actually prefer over (id,G)), and I'm sure that others can be proposed.

> The survey results:
> 
> http://www.w3.org/2002/09/wbs/1/rdf-2010/results#xq15
> 
> are not that clear either.

I'm sorry?

When asked about the “most important addition to RDF,” “named graphs” was the single most frequently mentioned item. The feature that was rated as bringing most benefit, both for the community and for responders' organizations, was “Add Core Support for Working With Multiple Graphs”.

Please, how could the results have been any clearer?

>> Giving some standard name to that concept will help with consistency across specs and implementations.
> 
> But the concept of datasets is not absolutely necessary. Just like I do not know what a default graph is (I of course know what it means for a SPARQL endpoint, but that is another matter!). If we have the concept of a (<u>,G) for naming a, hm, named graph, g-box, or whatever, that is fine.

No, that is not fine, because then any of the multi-graph syntaxes will have to define their own notion of “a set of (id,G) pairs”. Some will include a default graph, others won't. Some will allow blank nodes and/or literals as ids, others won't. Some will scope blank nodes to the file, others to the graph, others won't specify it at all because it's such a hard question.

This mess is exactly the situation we already have. Right now that's sort of ok, because all that stuff except for SPARQL currently lives in specs that were written by random community people, so we can't expect consistency. But I don't think that defining several W3C specs that all deal with some sort of collection of (id,G) pairs, without specifying in one place how these collections work, is a good idea.
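
As a rough illustration of how easily such collections diverge, here is a minimal Python sketch. It is not taken from any spec or proposal: the `Dataset` class, its fields, and the `_:` blank-node convention are all hypothetical, and it models only two of the choices mentioned above (whether a default graph exists, and whether a blank node may name a graph). Blank node scoping is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

# Simplifying assumption: terms are plain strings, a graph is a set of triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass
class Dataset:
    """Hypothetical container for a collection of (id, G) pairs."""
    named: Dict[str, Graph] = field(default_factory=dict)
    default: Optional[Graph] = None    # choice 1: is there a default graph at all?
    allow_bnode_names: bool = False    # choice 2: may a blank node name a graph?

    def add(self, name: str, graph: Graph) -> None:
        # In this sketch, blank node labels are written "_:x".
        if name.startswith("_:") and not self.allow_bnode_names:
            raise ValueError(f"blank node {name!r} not allowed as a graph name")
        self.named[name] = graph


g = frozenset({("ex:a", "ex:knows", "ex:b")})

# One spec might mandate a (possibly empty) default graph and IRI names only...
strict = Dataset(default=frozenset())
strict.add("ex:g1", g)

# ...while another has no default graph and accepts blank nodes as names.
loose = Dataset(allow_bnode_names=True)
loose.add("_:g1", g)
```

Both objects are "a set of (id,G) pairs", yet a document that is valid under one set of choices is rejected under the other, which is the kind of inconsistency a single shared definition would avoid.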

> You and Antoine are arguing about the semantics of _datasets_; I am still not convinced that this discussion should happen in the first place!

According to our charter, we have to “Standardize a model and semantics for […] graphs stores.”

There's the model and then there's the semantics.

My personal view is that all we need is a data structure, and we don't need a model theory, so I actually find the “semantics” requirement from the charter rather inconvenient.

But a proper multi-graph model is needed, no doubt. Even if you argue that it's not *required* by the phrasing, it is certainly within scope, and I feel that the community has clearly demanded it, and they gave us a clear mandate to be working on this.

Richard




> 
> just my two cents...
> 
> Ivan  
> 
> 
> 
>> Best,
>> Richard
>> 
>> 
>> 
>>> 
>>> Ivan
>>> 
>>> On Mar 8, 2011, at 15:17 , Antoine Zimmermann wrote:
>>> 
>>>> Richard,
>>>> 
>>>> 
>>>> Good starting point.
>>>> 
>>>> I am in favour of using the notion of dataset from SPARQL but I have a problem with the semantics. You say:
>>>> 
>>>> "The interpretation of an RDF Dataset is that of the union of its constituent graphs."
>>>> 
>>>> One of the strong reasons to keep information about provenance is to avoid spreading inconsistencies everywhere. Separating statements into distinct boxes should prevent knowledge from disjoint contexts from intertwining.
>>>> 
>>>> Besides, in a semantic web search engine which indexes all RDF data on the web (like Sindice, SWSE) this is not acceptable. Neither Sindice nor SWSE implements the semantics you propose, which is unfortunate since you advocate following deployed application practices and those are among high-profile applications from your own institute.
>>>> 
>>>> What you define is a semantics which maximises the "permeability" of contexts, that is, every triple defined in any graph equally influences the knowledge of every other graph within the dataset.
>>>> 
>>>> On the contrary, we could argue in favour of a semantics that minimises the permeability of contexts, that is, a triple in a graph can only have an impact on the knowledge of that graph.
>>>> 
>>>> This can be formalised as follows:
>>>> 
>>>> "The interpretation of an RDF Dataset (G, (id1,G1), ..., (idn,Gn)) is a tuple (I, I1, ..., In) where I is an RDF-interpretation of G and for all 1 <= i <= n, Ii is an RDF-interpretation of Gi."
>>>> 
>>>> This way, you prevent the knowledge of a graph from perturbing the knowledge of other graphs, thereby complying very well with heterogeneous and unreliable information from all over the Web.
>>>> 
>>>> Unfortunately, this is not ideal because it is often desired that knowledge actually "flows" across contexts. There are several proposed formalisms that lie in between the two extremes defined above (viz., maximal and minimal permeability) but it is not the goal of this WG to choose or define one. However, it would be good if the semantics of datasets were as generic and permissive as possible, such that extensions of it can constrain it further (just like the semantics of RDF is itself very permissive but further constrained by RDFS, OWL, SWRL, etc). In this sense, the "minimal permeability" semantics is the most permissive. To constrain it, it suffices to add vocabularies that specify the way knowledge from different graphs interacts. For instance:
>>>> 
>>>> :G1 ex:imports :G2 .
>>>> 
>>>> could be a way to ensure that the interpretation of :G1 has to satisfy both :G1 and :G2. If all graphs import each other, then an interpretation of a dataset becomes equivalent to an RDF-interpretation of the union of its constituent graphs, which is exactly the "maximal permeability" semantics that you defined.
>>>> 
>>>> The reasoning formalisms used by Sindice or SWSE (and certainly other triple stores with reasoning capabilities) would fit well with this approach. Annotated RDF(S) would also work as a semantic extension of this generic approach (with appropriate vocabularies).
>>>> 
>>>> I'll put this proposal somewhere on the wiki with more technical details.
>>>> 
>>>> 
>>>> Regards,
>>>> AZ.
>>>> 
>>>> 
>>>> 
>>>> Le 08/03/2011 12:29, Richard Cyganiak a écrit :
>>>>> All,
>>>>> 
>>>>> I wrote up a proposal for addressing the [Graphs] work item:
>>>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>>>>> 
>>>>> The gist is to simply lift the definition of RDF Datasets from SPARQL into RDF Concepts.
>>>>> 
>>>>> I believe that this is the simplest thing we could possibly do in order to fulfill the work item from the charter, and addresses the use cases that were brought forward.
>>>>> 
>>>>> This is intended as a starting point for discussion. In particular I'd like to see:
>>>>> 
>>>>> - arguments that this doesn't address (or poorly addresses) the use cases
>>>>> - arguments that this doesn't meet the charter requirements
>>>>> - improvements to the proposal that would help to better address the use cases
>>>>> - counter-proposals in a similar style
>>>>> 
>>>>> Best,
>>>>> Richard
>>>> 
>>>> 
>>>> -- 
>>>> Antoine Zimmermann
>>>> Researcher at:
>>>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>>>> Database Group
>>>> 7 Avenue Jean Capelle
>>>> 69621 Villeurbanne Cedex
>>>> France
>>>> Lecturer at:
>>>> Institut National des Sciences Appliquées de Lyon
>>>> 20 Avenue Albert Einstein
>>>> 69621 Villeurbanne Cedex
>>>> France
>>>> antoine.zimmermann@insa-lyon.fr
>>>> http://zimmer.aprilfoolsreview.com/
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 
Received on Tuesday, 8 March 2011 19:58:40 UTC