Re: [Graphs] Proposal: RDF Datasets from Ivan Herman on 2011-03-08 (public-rdf-wg@w3.org from March 2011)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 8 Mar 2011 19:01:38 +0100
To: antoine.zimmermann@insa-lyon.fr
Cc: public-rdf-wg@w3.org
Message-Id: <7B966893-6995-404D-AD25-8A51EA8BA908@w3.org>
On Mar 8, 2011, at 18:38 , Antoine Zimmermann wrote:

> On 08/03/2011 18:23, Ivan Herman wrote:
>> 
>> On Mar 8, 2011, at 18:15 , Antoine Zimmermann wrote:
>> 
>>> On 08/03/2011 17:09, Ivan Herman wrote:
>>>> 
>>>> [...] You and Antoine are arguing about the semantics of _datasets_; I am still not convinced that this discussion should happen in the first place!
>>> 
>>> 
>>> Well, we are arguing about the semantics of multigraph documents.
>> 
>> I do not know what that means.
> 
> Something that contains the definition of several g-snaps or g-boxes, such as an N-Quads document, a TriG document, or whatever else, and which abstractly maps to a finite set of g-snaps or g-boxes, or named g-snaps or named g-boxes, depending on how we interpret it. That is, we are discussing how the examples given in TF-Graphs-UC should be interpreted.
> Richard, as a starting point, wrote that these documents are, abstractly, mapped to datasets. I chose to endorse that starting point and went on with the discussion. This can be argued against as well.
> 

I expected this answer, of course. But... as far as I am concerned a TriG file

<G> { blabla }
<H> { lalala }

is the same as two different TriG files, one saying

<G> { blabla }

and _another one_ that says

<H> { lalala }

Actually, there were some references in the original named graph paper suggesting that if a Turtle file containing blabla is at the URI <G>, this could also be considered a named graph, named by that URI, with the content blabla. I.e., I could have two Turtle files, one at URI <G> and the other at <H>, with the content "blabla" and "lalala", respectively.

What I am getting at is that a TriG document that has several named graphs (or named whatever) in it is nothing special. It is a collection of named graphs that do not really have any connection with one another; they just happen to be in the same file for reasons of convenience. In other words, I do not think there is a real 'semantic' notion of a 'multigraph document', nor of a 'dataset', as far as the core RDF documents are concerned... To be more precise: I am still not convinced...
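
A minimal sketch of that view, assuming a purely hypothetical in-memory model in which a "document" is just a mapping from a graph name to a set of triples. This is not a TriG parser, the helper named_graphs and the placeholder triples are invented for illustration, and pooling triples that share a name is an assumption this thread does not settle.

```python
# Hypothetical model: a document maps a graph name (a URI string)
# to a set of triples. Nothing here parses real TriG or Turtle.

def named_graphs(*documents):
    """Collect the (name, graph) pairs found in any number of documents."""
    collected = {}
    for doc in documents:
        for name, triples in doc.items():
            # Assumption: triples published under the same name are pooled.
            collected.setdefault(name, set()).update(triples)
    return {name: frozenset(triples) for name, triples in collected.items()}

G = {("ex:a", "ex:p", "ex:b")}  # stands in for "blabla"
H = {("ex:c", "ex:q", "ex:d")}  # stands in for "lalala"

one_file = {"<G>": G, "<H>": H}  # one TriG file holding both graphs
file_1 = {"<G>": G}              # a file holding only <G>
file_2 = {"<H>": H}              # another file holding only <H>

# Under this reading, both arrangements yield the same named graphs.
same = named_graphs(one_file) == named_graphs(file_1, file_2)
print(same)
```

Under this reading the file boundary carries no information of its own: only the (name, graph) pairs matter.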

Ivan

> 
> AZ.
> 
>> 
>> Ivan
>> 
>> 
>>> Defining the semantics of a single named graph or g-box (as it is done in the Named Graphs paper) does not tell how to interpret a multigraph document.
>>> 
>>> If there is a serialisation format to exchange multigraph documents, then there must be some way to interpret them.
>>> 
>>> There are different interpretations of multigraph documents that have been implemented, but we could define a "minimal" requirement for interpreting them, which implementations extend. Having a minimal formal semantics could also be useful to formalise the actual semantics of implemented systems. That could look like: "we follow the standard semantics, to which we add the following constraints: ...".
>>> 
>>> 
>>> 
>>> Regards,
>>> AZ.
>>> 
>>>> just my two cents...
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>> 
>>>>> Best,
>>>>> Richard
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>> On Mar 8, 2011, at 15:17 , Antoine Zimmermann wrote:
>>>>>> 
>>>>>>> Richard,
>>>>>>> 
>>>>>>> 
>>>>>>> Good starting point.
>>>>>>> 
>>>>>>> I am in favour of using the notion of dataset from SPARQL but I have a problem with the semantics. You say:
>>>>>>> 
>>>>>>> "The interpretation of an RDF Dataset is that of the union of its constituent graphs."
>>>>>>> 
>>>>>>> One of the strong reasons to keep information about provenance is to avoid spreading inconsistencies everywhere. Separating statements into distinct boxes should prevent knowledge from disjoint contexts from intertwining.
>>>>>>> 
>>>>>>> Besides, in a semantic web search engine that indexes all RDF data on the web (like Sindice, SWSE) this is not acceptable. Neither Sindice nor SWSE implements the semantics you propose, which is unfortunate since you advocate following deployed application practices and those are among high-profile applications from your own institute.
>>>>>>> 
>>>>>>> What you define is a semantics which maximises the "permeability" of contexts, that is, every triple defined in any graph equally influences the knowledge from any other graph within a dataset.
>>>>>>> 
>>>>>>> On the contrary, we could argue in favour of a semantics that minimises the permeability of contexts, that is, a triple in a graph can only have an impact on the knowledge of that graph.
>>>>>>> 
>>>>>>> This can be formalised as follows:
>>>>>>> 
>>>>>>> "The interpretation of an RDF Dataset (G, (id1,G1), ..., (idn,Gn)) is a tuple (I, I1, ..., In) where I is an RDF-interpretation of G and for all 1<= i<= n, Ii is an RDF-interpretation of Gi."
>>>>>>> 
>>>>>>> This way, you prevent the knowledge of a graph from perturbing the knowledge of other graphs, thereby complying very well with heterogeneous and unreliable information from all over the Web.
>>>>>>> 
>>>>>>> Unfortunately, this is not ideal because it is often desired that knowledge actually "flows" across contexts. There are several proposed formalisms that lie in between the two extremes defined above (viz., maximal and minimal permeability) but it is not the goal of this WG to choose or define one. However, it would be good if the semantics of datasets were as generic and permissive as possible, such that extensions of it can constrain it further (just like the semantics of RDF is itself very permissive but further constrained by RDFS, OWL, SWRL, etc). In this sense, the "minimal permeability" semantics is the most permissive. To constrain it, it suffices to add vocabularies that specify the way knowledge from graphs interacts. For instance:
>>>>>>> 
>>>>>>> :G1 ex:imports :G2 .
>>>>>>> 
>>>>>>> could be a way to ensure that the interpretation of :G1 has to satisfy both :G1 and :G2. If all graphs import each other, then an interpretation of a dataset becomes equivalent to an RDF-interpretation of the union of its constituent graphs, which is exactly the "maximal permeability" semantics that you defined.
>>>>>>> 
>>>>>>> The reasoning formalisms used by Sindice or SWSE (and certainly other triple stores with reasoning capabilities) would fit well with this approach. Annotated RDF(S) would also work as a semantic extension of this generic approach (with appropriate vocabularies).
>>>>>>> 
>>>>>>> I'll put this proposal somewhere on the wiki with more technical details.
>>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> AZ.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 08/03/2011 12:29, Richard Cyganiak wrote:
>>>>>>>> All,
>>>>>>>> 
>>>>>>>> I wrote up a proposal for addressing the [Graphs] work item:
>>>>>>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>>>>>>>> 
>>>>>>>> The gist is to simply lift the definition of RDF Datasets from SPARQL into RDF Concepts.
>>>>>>>> 
>>>>>>>> I believe that this is the simplest thing we could possibly do in order to fulfill the work item from the charter, and addresses the use cases that were brought forward.
>>>>>>>> 
>>>>>>>> This is intended as a starting point for discussion. In particular I'd like to see:
>>>>>>>> 
>>>>>>>> - arguments that this doesn't address (or poorly addresses) the use cases
>>>>>>>> - arguments that this doesn't meet the charter requirements
>>>>>>>> - improvements to the proposal that would help to better address the use cases
>>>>>>>> - counter-proposals in a similar style
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Richard
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Antoine Zimmermann
>>>>>>> Researcher at:
>>>>>>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>>>>>>> Database Group
>>>>>>> 7 Avenue Jean Capelle
>>>>>>> 69621 Villeurbanne Cedex
>>>>>>> France
>>>>>>> Lecturer at:
>>>>>>> Institut National des Sciences Appliquées de Lyon
>>>>>>> 20 Avenue Albert Einstein
>>>>>>> 69621 Villeurbanne Cedex
>>>>>>> France
>>>>>>> antoine.zimmermann@insa-lyon.fr
>>>>>>> http://zimmer.aprilfoolsreview.com/
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----
>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>> mobile: +31-641044153
>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Tuesday, 8 March 2011 17:59:54 UTC