Re: dataset semantics

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 19 Dec 2011 19:50:20 +0000
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, public-rdf-wg@w3.org
Message-Id: <204CF2A9-238D-400F-B7BB-543C2F5E2B74@cyganiak.de>
To: Pat Hayes <phayes@ihmc.us>
On 19 Dec 2011, at 10:48, Pat Hayes wrote:
> I would like to see some evidence, from actual use cases, of how it can be that different RDF graphs hold in different contexts,

See here for some (toy) examples:

A list of use cases provided by WG members is on the wiki:

Many of the use cases ask for context information about RDF graphs in order to decide whether to accept a graph as true. For example:

• Graph Changes Over Time
• Versioning in SDMX and DDI
• Contextual constraints in queries
• Wikidata
• Web crawling
• Trust Web Opinions
• Reasoning over annotations

> and some clarification of what is meant by a "context" here.

This differs a lot by use case.

> Is linked data context-relative? If so, what determines the contexts in the extant RDF triples which comprise the linked data cloud?

As a simplified approximation, the context is a function of the URI under which an RDF graph is published.

> How can information from different contexts be used together?

It generally cannot, unless you have some extra information (e.g., provenance metadata) that establishes sufficient confidence in the information for the purpose of the application. Some applications may be OK with merging absolutely anything. Others may rely only on information from a fixed set of providers (e.g., from URIs with a certain hostname). Many other approaches are possible.
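
To make the "fixed set of providers" approach concrete, here is a minimal Python sketch. It assumes a dataset is represented as a plain mapping from the URI a graph was published under to a set of triples; the name merge_trusted, the example URIs, and the ex: terms are hypothetical and only serve the illustration.

```python
from typing import Dict, Set, Tuple
from urllib.parse import urlparse

# Assumption: a triple is a (subject, predicate, object) tuple of strings,
# and a dataset maps each publishing URI to the triples found there.
Triple = Tuple[str, str, str]


def merge_trusted(dataset: Dict[str, Set[Triple]],
                  trusted_hosts: Set[str]) -> Set[Triple]:
    """Merge only those graphs whose publishing URI has a trusted hostname."""
    merged: Set[Triple] = set()
    for graph_uri, triples in dataset.items():
        # The "context" is approximated by the hostname of the graph's URI.
        if urlparse(graph_uri).hostname in trusted_hosts:
            merged |= triples
    return merged


# Hypothetical example data: two graphs published under different hosts.
dataset = {
    "http://example.org/people": {("ex:alice", "ex:knows", "ex:bob")},
    "http://example.com/gossip": {("ex:alice", "ex:knows", "ex:mallory")},
}

accepted = merge_trusted(dataset, {"example.org"})
```

An application that is happy to merge absolutely anything would simply skip the hostname check; one that relies on provenance metadata would replace it with a test against that metadata.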

See here for Sandro's writeup of some of these issues:


> Pat
> On Dec 19, 2011, at 4:06 AM, Antoine Zimmermann wrote:
>> Just wanted to reiterate, there is a dataset semantics at [1] which has
>> been there since about March 2011. In spite of the math symbols all over
>> the place, it's really simple. The rationale was to define it as the least common denominator, so that it does not impose constraints that some people would like to relax later on. Adding constraints can be done easily on a conformant implementation, while removing constraints makes the implementation non-compliant.
>> Note that this semantics does not change the semantics of RDF, as it is separated from it, though relying on it.
>> [1] TF-Graphs/RDF-Datasets-Proposal, Section "Semantics". http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics.
>> Le 17/12/2011 06:43, Sandro Hawke a écrit :
>>> On Fri, 2011-12-16 at 22:47 -0600, Pat Hayes wrote:
>>>> On Dec 16, 2011, at 10:21 PM, Sandro Hawke wrote:
>>>>> ... maybe I can figure out some TriG entailment tests....
>>>>> Like, does this TriG document / dataset:
>>>>> { <a> <b> <c> }
>>>>> entail this RDF graph:
>>>>> <a> <b> <c> .
>>>>> I think it should, so we can have metadata in TriG, but other
>>>>> people have disagreed. How should we be gathering test cases like
>>>>> this?
>>>> FWIW, 'entailment' has a fairly precise meaning. A entails B when B
>>>> is true whenever A is, or more precisely if, for every possible
>>>> interpretation I, if A is true in I then B is true in I. So it only
>>>> makes sense to speak of entailment when there is some notion of
>>>> truth-in-an-interpretation to base it on.
>>> Yes, I know.
>>>> So, what are the truth conditions for datasets?
>>> We haven't quite figured that out yet.   I'm proposing one part of
>>> that is that a dataset being true implies its default graph is true.
>>> The other part of the truth conditions has to do with the
>>> relationship between the things named by the label URIs and the
>>> graphs they label.
>>> Unfortunately, I think we need to allow for several possible
>>> relationships there, MAYBE even in the same dataset, which makes
>>> things rather complicated.
>>> One example of the relationship is what I called graphState in a
>>> different thread.  In that case, the dataset being true would imply
>>> that for each <U,G> in the dataset, the state of the resource U is
>>> the graph G.   (Here, I mean "state" and "resource" in exactly the
>>> REST sense.)
>>> Another example is an out of date version of graphState, maybe call
>>> it graphStateWas.  In this case, the dataset being true would imply
>>> that for each <U,G> in the dataset, the state of the resource U is,
>>> or used to be, graph G.
>>> Another example of the relationship is something I gather Cambridge
>>> Semantics uses, which I'll call subjectOf.   (In one of their
>>> deployment modes, triples are divided into two types, which I'll call
>>> A and B, based on which predicate they use.  The dataset is
>>> constructed such that for each <U,G> in the dataset, every type-A
>>> triple in G is of the form { <U> ?P ?O }.  The type-B triples are a
>>> little more complicated.)  In this case, the dataset being true would
>>> imply the dataset being segmented in this complicated but useful
>>> way.
>>> It's *rather* tempting to just use triples for this, making
>>> graphState, graphStateWas, subjectOf, etc, be predicates.   That way
>>> the semantics of datasets would be much simpler, with the
>>> complications bundled into the semantics of those particular
>>> predicates.
>>> I guess I'm suggesting extending the definition of dataset to be a
>>> default graph and, rather than a set of pairs <U,G>, a set of
>>> triples <U, R, G>, where R is optional.  If R is omitted, you have
>>> the kind of dataset we're used to now, where we have no idea what
>>> that relation is supposed to be (unless the author tells us humans).
>>>> Can one assert a dataset (ie claim it to be true)?
>>> Yes.
>>>> How does one do that?
>>> The same way you do with RDF.  It kind of depends on your
>>> application. Maybe you publish it on the web; maybe you send it to
>>> some agent; maybe you publish it and send the URL somewhere, etc.
>>> -- Sandro
>> -- 
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 83 36
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 19 December 2011 19:50:56 UTC
