
Re: dataset semantics

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Mon, 19 Dec 2011 12:37:11 +0100
Message-ID: <4EEF21E7.60307@emse.fr>
To: Pat Hayes <phayes@ihmc.us>
CC: public-rdf-wg@w3.org

On 19/12/2011 11:48, Pat Hayes wrote:
> Antoine, this semantics seems to me to be broken. It refers to "the
> fact that different RDF Graphs hold in different contexts.". But is
> this a fact? The idea of RDF being context-relativized was considered
> by the original WG and resoundingly rejected.

If the previous WG rejected the idea of making RDF a contextual logic, 
then I totally agree with this. I repeat that I propose to keep the RDF 
semantics *exactly* as it is in RDF 2004.

> If indeed RDF is
> contextual in this way, then we need to reconsider not just the
> semantics of datasets, but the entire semantics of RDF.

Why? I am proposing a semantics of datasets that does not change anything
in RDF.

> I do not
> believe that it is appropriate to have RDF be a contextual logic.

Me neither.

> You
> may disagree:

I don't.

> but at the very least, we cannot simply assume this as
> though it were obvious. Before going this route, we need to at least
> discuss what the issues are that arise from such an interpretation.

As I don't want to take the route of "contextual RDF", there is no need
for me to get into those discussions.

> I would like to see some evidence, from actual use cases, of how it
> can be that different RDF graphs hold in different contexts, and some
> clarification of what is meant by a "context" here. Is linked data
> context-relative?

This is somewhat philosophical. What's important is that SPARQL
introduced the notion of datasets, and people found it so useful that
they cannot live without it now. Considering the importance of that
concept in SPARQL, we (I include here a bunch of people, with Richard
being the most prominent) wanted to include datasets in the RDF spec.
That's the starting point. With this notion, you have a way to
compartmentalise data into different identified boxes, which is very
useful for querying a portion of the data without getting answers
polluted by things that only hold in a different box. But with this very
simple notion you realise you can put in different boxes things that
belong to different versions, or things that hold in different time
frames, or things that come from different sites, etc. You can prevent
inconsistencies due to disagreement among data sources, in a systematic
way. And so on...
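
To make the "boxes" idea concrete, here is a minimal sketch in Python. It is
only an illustration: the Dataset class, the match method and all the ex:
names are made up for this example and come neither from SPARQL nor from the
proposal. It shows two versions that disagree on a value, and a query
restricted to one box that is not polluted by the other.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class Dataset:
    """Hypothetical model: one default graph plus named graphs ("boxes")."""

    def __init__(self, default: Graph, named: Dict[str, Graph]) -> None:
        self.default = default
        self.named = dict(named)

    def match(self, graph_name: str,
              s: Optional[str] = None,
              p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return the triples of ONE named graph that match the pattern.

        None acts as a wildcard. Triples in other boxes are never consulted.
        """
        graph = self.named.get(graph_name, frozenset())
        return sorted(
            t for t in graph
            if all(q is None or q == v for q, v in zip((s, p, o), t))
        )


# Two versions that disagree on the same value (illustrative data only).
ds = Dataset(
    default=frozenset(),
    named={
        "ex:v1": frozenset({("ex:alice", "ex:age", "30")}),
        "ex:v2": frozenset({("ex:alice", "ex:age", "31")}),
    },
)

v1_ages = ds.match("ex:v1", p="ex:age")
v2_ages = ds.match("ex:v2", p="ex:age")
```

Each query only sees its own box, so the disagreement between ex:v1 and ex:v2
never surfaces as an inconsistency unless an application decides to merge
them.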

> If so, what determines the contexts in the extant
> RDF triples which comprise the linked data cloud?

It doesn't matter. People can compartmentalise their data the way they
want, just like people can write random triples. The way the context is
defined or the way triples are made and published is not the concern of
the RDF specs and especially not the formal semantics.

> How can information
> from different contexts be used together?

This does not have to be answered completely by the working group. As
long as we provide a liberal semantics, which imposes very few
constraints, application-specific implementations can extend it to make
data from different "contexts" interact in various ways.
There is no agreement on how multi-contextual reasoning should be done,
so I'd rather let people do what they want, as long as the core conforms
to a common, liberal semantics. For instance, Sindice is doing reasoning
in a way which can easily be formalised as an extension of the semantics 
of datasets. What it does, if I am not mistaken, is that it uses 
owl:imports, as well as a custom notion of "implicit import", to say 
that what is true in an imported RDF graph is also true in the importing 
RDF graph.
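
As a rough sketch of such an extension (my own simplification, not Sindice's
actual code), the Python below computes what "holds" in a named graph once
imports are followed. Assumptions: named graphs are plain sets of string
triples, any owl:imports triple found in a graph points at another graph
name, and the "implicit import" notion is not modelled at all. The function
name import_closure and the ex: names are invented for the example.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
OWL_IMPORTS = "owl:imports"  # written as a prefixed name for brevity


def import_closure(named: Dict[str, FrozenSet[Triple]], name: str) -> Set[Triple]:
    """Union of the triples of `name` and of every graph it transitively imports.

    The stored graphs are left untouched; cycles are handled via `seen`.
    """
    seen: Set[str] = set()
    stack = [name]
    result: Set[Triple] = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        graph = named.get(current, frozenset())
        result |= graph
        # Assumption: every owl:imports object is the name of another graph.
        for _, p, o in graph:
            if p == OWL_IMPORTS:
                stack.append(o)
    return result


named = {
    "ex:a": frozenset({("ex:a", OWL_IMPORTS, "ex:b"), ("ex:x", "ex:p", "ex:y")}),
    "ex:b": frozenset({("ex:y", "ex:q", "ex:z")}),
}

holds_in_a = import_closure(named, "ex:a")
holds_in_b = import_closure(named, "ex:b")
```

Here the triple from ex:b also holds in ex:a, but not the other way round,
and a dataset that uses no owl:imports at all behaves exactly as under the
liberal core semantics.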

> Pat
> On Dec 19, 2011, at 4:06 AM, Antoine Zimmermann wrote:
>> Just wanted to reiterate, there is a dataset semantics at [1] which
>> has been there since about March 2011. In spite of the math symbols
>> all over the place, it's really simple. The rationale was to make
>> it according to the least common denominator, such that it does not
>> put constraints that some people would like to relax later on.
>> Adding constraints can be done easily on a conformant
>> implementation, while removing constraints makes the implementation
>> non-compliant.
>> Note that this semantics does not change the semantics of RDF, as
>> it is separated from it, though relying on it.
>> [1] TF-Graphs/RDF-Datasets-Proposal, Section "Semantics".
>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics.
>> On 17/12/2011 06:43, Sandro Hawke wrote:
>>> On Fri, 2011-12-16 at 22:47 -0600, Pat Hayes wrote:
>>>> On Dec 16, 2011, at 10:21 PM, Sandro Hawke wrote:
>>>>> ... maybe I can figure out some TriG entailment tests....
>>>>> Like, does this TriG document / dataset:
>>>>> { <a> <b> <c> }
>>>>> entail this RDF graph:
>>>>> <a> <b> <c> .
>>>>> I think it should, so we can have metadata in TriG, but
>>>>> other people have disagreed.  How should we gather test
>>>>> cases like this?
>>>> FWIW, 'entailment' has a fairly precise meaning. A entails B
>>>> when B is true whenever A is, or more precisely if, for every
>>>> possible interpretation I, if A is true in I then B is true in
>>>> I. So it only makes sense to speak of entailment when there is
>>>> some notion of truth-in-an-interpretation to base it on.
>>> Yes, I know.
>>>> So, what are the truth conditions for datasets?
>>> We haven't quite figured that out yet.   I'm proposing one part
>>> of that is that a dataset being true implies its default graph is
>>> true.
>>> The other part of the truth conditions has to do with the
>>> relationship between the things named by the label URIs and the
>>> graphs they label.
>>> Unfortunately, I think we need to allow for several possible
>>> relationships there, MAYBE even in the same dataset, which makes
>>> things rather complicated.
>>> One example of the relationship is what I called graphState in a
>>> different thread.  In that case, the dataset being true would
>>> imply that for each <U,G> in the dataset, the state of the
>>> resource U is the graph G.  (Here, I mean "state" and "resource"
>>> in exactly the REST sense.)
>>> Another example is an out of date version of graphState, maybe
>>> call it graphStateWas.  In this case, the dataset being true
>>> would imply that for each <U,G> in the dataset, the state of the
>>> resource U is, or used to be, graph G.
>>> Another example of the relationship is something I gather
>>> Cambridge Semantics uses, which I'll call subjectOf.  (In one of
>>> their deployment modes, triples are divided into two types, which
>>> I'll call A and B, based on which predicate they use.  The
>>> dataset is constructed such that for each <U, G> in the dataset,
>>> every type-A triple in G is of the form { <U> ?P ?O }.  The
>>> type-B triples are a little more complicated.)  In this case, the
>>> dataset being true would imply the dataset being segmented in
>>> this complicated but useful way.
>>> It's *rather* tempting to just use triples for this, making
>>> graphState, graphStateWas, subjectOf, etc, be predicates.   That
>>> way the semantics of datasets would be much simpler, with the
>>> complications bundled into the semantics of those particular
>>> predicates.
>>> I guess I'm suggesting extending the definition of dataset to
>>> be a default graph and, rather than a set of pairs <U,G>, a set
>>> of triples <U, R, G>, where R is optional.  If R is omitted, you
>>> have the kind of dataset we're used to now, where we have no idea
>>> what that relation is supposed to be (unless the author tells us
>>> humans).
>>>> Can one assert a dataset (ie claim it to be true)?
>>> Yes.
>>>> How does one do that?
>>> The same way you do with RDF.  It kind of depends on your
>>> application. Maybe you publish it on the web; maybe you send it
>>> to some agent; maybe you publish it and send the URL somewhere,
>>> etc.
>>> -- Sandro
>> --
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 83 36
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
> ------------------------------------------------------------
> IHMC                   (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.   (850)202 4416   office
> Pensacola              (850)202 4440   fax
> FL 32502               (850)291 0667   mobile
> phayesAT-SIGNihmc.us   http://www.ihmc.us/users/phayes

Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
Received on Monday, 19 December 2011 11:37:48 UTC
