[ACTION-149] Dataset semantics VS Graph use cases

Warning: this email is very long.
What follows describes how I would address the UCs if the semantics 
of [1] were standardised. Several UCs require the semantics to be 
extended (that is, adding semantic constraints on the notion of 
interpretation given in [1]). When a dataset is exchanged between 
applications, it must be known which semantic extension is used. To do 
that, I rely on "meta-statements" that should be put in the dataset file 
(or accompany the file). These meta-statements can be provided in many 
ways (e.g., in a separate voiD description).

Note that a formal semantics does not *do* anything. So UCs always 
require something to be done that the formal semantics does not mandate. 
A reasoner does not magically understand what people want, no matter 
what statements are added to the data.

1st case: Simple report of different beliefs

:h2g2 { ex:truth  owl:sameAs  42 .}
:devil { ex:truth  owl:sameAs  666 .}

Here, we have different parties asserting things that may or may not be 
accepted. They express opinions, and opinions can contradict each other. 
The fact that the opinions contradict each other (the graph merge is 
inconsistent) does not mean that the report on the opinions is 
inconsistent (the dataset is still consistent).
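To make the distinction concrete, here is a minimal Python sketch. It 
assumes a toy encoding of quads as plain tuples and a hypothetical clash 
check that catches only one kind of inconsistency: an IRI owl:sameAs two 
distinct objects, taken to denote distinct values (as 42 and 666 do). It 
is not a reasoner; it only shows that checking each graph on its own can 
succeed where checking the merge fails.

```python
from collections import defaultdict

# Toy encoding (an assumption, not a real RDF API): quads are
# (subject, predicate, object, graph_label) tuples of strings.
DATASET = [
    ("ex:truth", "owl:sameAs", "42", ":h2g2"),
    ("ex:truth", "owl:sameAs", "666", ":devil"),
]


def has_clash(triples):
    """Flag one kind of clash only: an IRI owl:sameAs two distinct
    objects, assuming (as for the literals 42 and 666) that distinct
    objects denote distinct values."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:sameAs":
            values[s].add(o)
    return any(len(v) > 1 for v in values.values())


def merge_is_consistent(dataset):
    # Dropping the labels gives one graph, hence one interpretation
    # of ex:truth.
    return not has_clash([(s, p, o) for s, p, o, _g in dataset])


def dataset_is_consistent(dataset):
    # Each graph is checked on its own, so ex:truth may be
    # interpreted differently in :h2g2 and :devil.
    by_graph = defaultdict(list)
    for s, p, o, g in dataset:
        by_graph[g].append((s, p, o))
    return all(not has_clash(ts) for ts in by_graph.values())


print(merge_is_consistent(DATASET))    # False
print(dataset_is_consistent(DATASET))  # True
```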

2nd case: we additionally want to say something about the graph 
(endorsement, etc.):

This can be done by stating explicitly that the graph "labels" are meant 
to denote the graphs themselves. Some syntactic sugar should be added, 
something like this:

@graph-iris-denote-graph
:h2g2 :is :right .
:devil :is :wrong .
:h2g2 { ex:truth  owl:sameAs  42 .}
:devil { ex:truth  owl:sameAs  666 .}

Note that here, it is necessary that the IRI ex:truth be interpreted 
differently in the two graphs. The additional meta-statement 
@graph-iris-denote-graph simply helps a dataset application determine 
which convention is used. The formal semantics is still exactly the 
same, but the statement lets an application process the data differently 
from how it would if the graph IRIs denoted, e.g., the primary topic of 
the graph.
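A minimal sketch of how an application might act on such a 
meta-statement, assuming (hypothetically) that meta-statements are lines 
starting with "@" other than @prefix/@base, and that triples are plain 
tuples. The function names are illustrative only. Nothing in the 
semantics changes; only what the application does with triples whose 
subject is a graph label.

```python
# Hypothetical convention: a line starting with "@" (other than
# @prefix/@base) is a meta-statement, as in "@graph-iris-denote-graph".
def split_meta(lines):
    meta, data = set(), []
    for line in lines:
        text = line.strip()
        if text.startswith("@") and not text.startswith(("@prefix", "@base")):
            meta.add(text.rstrip(" ."))
        elif text:
            data.append(text)
    return meta, data


def statements_about_graphs(default_triples, graph_labels, meta):
    """Collect default-graph triples whose subject is a graph label, but
    only when the meta-statement says labels denote the graphs
    themselves. The formal semantics is the same either way; only the
    application's treatment of the labels changes."""
    if "@graph-iris-denote-graph" not in meta:
        return {}
    about = {}
    for s, p, o in default_triples:
        if s in graph_labels:
            about.setdefault(s, []).append((p, o))
    return about


meta, _ = split_meta(["@graph-iris-denote-graph", ":h2g2 :is :right ."])
default = [(":h2g2", ":is", ":right"), (":devil", ":is", ":wrong")]
print(statements_about_graphs(default, {":h2g2", ":devil"}, meta))
# {':h2g2': [(':is', ':right')], ':devil': [(':is', ':wrong')]}
```

Without the meta-statement, the same call returns an empty dict: the 
application makes no assumption about what the labels denote.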

3rd case: crawlers and similar stuff:


<http://ex.org/doc1.rdf> { ex:truth  owl:sameAs  42 .}
<http://ex.net/doc2.ttl> { ex:truth  owl:sameAs  666 .}

Again, there is no reason to force IRIs to denote exactly the same 
thing when found in different sources, since documents online can be 
wrong, contain mistakes, express beliefs, etc. Many applications will 
not have the means to determine which one is correct (if only one is).
The meta-statement simply helps an application decide what to do with 
this, but the formal semantics would not be affected.

As for archiving or versioning crawled data, I think each crawl should 
be put in a different file, and version data should be kept separately, 
using specific vocabularies for dataset metadata (e.g., voiD). I don't 
think the dataset semantics has to address how these metadata are described.

4th case: terminological axioms denote universal truth

An application may store ontologies in the default graph and expect that 
the axioms of the ontologies hold everywhere, all the time.


foaf:Person  owl:disjointWith  foaf:Organization .
foaf:Person  rdfs:subClassOf  foaf:Agent .
:g1 { ex:this  a  foaf:Person .}
:g2 { ex:this  a  foaf:Organization .}

The meta-statement induces a semantic restriction here. Formally, a 
dataset interpretation would satisfy this iff it "dataset-satisfies" the 
dataset (as defined in [1]) *and* for all graph "names" <g>, Con(<g>) 
satisfies the default graph (I use the new notation that Pat suggested).
In this case, we would infer that:

:g1 { ex:this  a  foaf:Agent .}

assuming RDFS or OWL semantics is used as a "local" semantics.
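The restriction can be approximated with a tiny forward chainer. This 
Python sketch is an illustration under assumptions, not the semantics 
itself: it implements only rdfs9, rdfs11 and an owl:disjointWith check, 
and it reads the restriction as "reason over each named graph together 
with the default graph, never with other named graphs". The names are 
hypothetical.

```python
SUB, TYPE, DISJ = "rdfs:subClassOf", "a", "owl:disjointWith"


def closure(triples):
    """Tiny forward chainer: rdfs9 (types follow subClassOf) and
    rdfs11 (subClassOf is transitive). Not a full RDFS reasoner."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUB}
        new = {(x, TYPE, d) for x, p, c in facts if p == TYPE
               for c2, d in sub if c2 == c}
        new |= {(a, SUB, c) for a, b in sub for b2, c in sub if b == b2}
        if new <= facts:
            return facts
        facts |= new


def clashes(facts):
    """Individuals typed with two classes declared owl:disjointWith."""
    disj = {(s, o) for s, p, o in facts if p == DISJ}
    disj |= {(o, s) for s, o in disj}
    types = {}
    for s, p, o in facts:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return {x for x, cs in types.items()
            for a in cs for b in cs if (a, b) in disj}


def with_default_everywhere(default, named):
    # Hypothetical reading of the 4th-case restriction: Con(<g>) must
    # satisfy the default graph, so each named graph is reasoned over
    # together with the default graph.
    return {g: closure(set(default) | set(ts)) for g, ts in named.items()}


default = [("foaf:Person", DISJ, "foaf:Organization"),
           ("foaf:Person", SUB, "foaf:Agent")]
named = {":g1": [("ex:this", TYPE, "foaf:Person")],
         ":g2": [("ex:this", TYPE, "foaf:Organization")]}
per_graph = with_default_everywhere(default, named)
print(("ex:this", TYPE, "foaf:Agent") in per_graph[":g1"])  # True
print([g for g, f in per_graph.items() if clashes(f)])      # []
```

By contrast, closing the merge of everything, 
closure(set(default) | set(named[":g1"]) | set(named[":g2"])), makes 
clashes(...) return {"ex:this"}, which is why the graphs are kept apart.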

5th case: default graph must be the merge of "named" graphs (this has 
been reported as common in real implementations)


:foaf { foaf:Person  rdfs:subClassOf  foaf:Agent .}
:g2 { ex:me  a  foaf:Person .}

The meta-statement would induce an additional restriction here: that the 
"local" interpretation of the default graph has to satisfy all RDF 
graphs inside the "named" graphs. So, in this case, the dataset would 
entail:

#default graph:
ex:me  a  foaf:Agent .
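A minimal Python sketch of this restriction, assuming the same toy tuple 
encoding and a hypothetical chainer that implements only rdfs9 and 
rdfs11. The default graph is reasoned over together with all named 
graphs, while each named graph on its own is left unchanged.

```python
def rdfs_closure(triples):
    """Tiny forward chainer: rdfs9 and rdfs11 only."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        new = {(x, "a", d) for x, p, c in facts if p == "a"
               for c2, d in sub if c2 == c}
        new |= {(a, "rdfs:subClassOf", c)
                for a, b in sub for b2, c in sub if b == b2}
        if new <= facts:
            return facts
        facts |= new


def default_graph_view(default, named):
    # Hypothetical reading of the 5th-case restriction: the default
    # graph's local interpretation satisfies every named graph, so we
    # reason over the default graph plus all named graphs.
    triples = set(default)
    for ts in named.values():
        triples |= set(ts)
    return rdfs_closure(triples)


named = {
    ":foaf": [("foaf:Person", "rdfs:subClassOf", "foaf:Agent")],
    ":g2": [("ex:me", "a", "foaf:Person")],
}
print(("ex:me", "a", "foaf:Agent") in default_graph_view([], named))  # True
print(("ex:me", "a", "foaf:Agent") in rdfs_closure(named[":g2"]))     # False
```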

By using both meta-statements (those of the 4th and 5th cases), one 
emulates the case where all graphs are merged.

6th case: Pat's case (IRIs must denote the same thing, but relationships 
between things may evolve across time/context)


:g2010-09-11 { ex:joe  foaf:worksFor  ex:ibm .}
:g2012-01-16 { ex:joe  foaf:worksFor  ex:cisco .}

The meta-statement enforces that, for all graph IRIs <g>, <g'> and all 
IRIs <u>, the interpretation of <u> in context <g> is the same as the 
interpretation of <u> in context <g'>. Formally, Con(<g>)(<u>) = 
Con(<g'>)(<u>). In this case, the following would be inconsistent:


:g2010-09-11 { ex:james  owl:sameAs  ex:jim .}
:g2012-01-16 { ex:james  owl:differentFrom ex:jim .}
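Under this restriction, identity no longer depends on the graph, so 
owl:sameAs and owl:differentFrom statements from all graphs constrain one 
shared denotation and can be pooled, while ordinary relations such as 
foaf:worksFor stay per graph. A minimal Python sketch of that check, 
assuming quads as tuples and using union-find; it covers only these two 
properties, and the function names are hypothetical.

```python
def find(parent, x):
    # Union-find lookup with path halving.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def rigid_iri_conflicts(dataset):
    """Hypothetical 6th-case check: since Con(g)(u) = Con(g')(u),
    owl:sameAs from any graph merges identities globally, and an
    owl:differentFrom between merged IRIs in any graph is a conflict."""
    parent = {}
    for s, p, o, _g in dataset:
        if p == "owl:sameAs":
            ra, rb = find(parent, s), find(parent, o)
            if ra != rb:
                parent[ra] = rb
    return [(s, o, g) for s, p, o, g in dataset
            if p == "owl:differentFrom" and find(parent, s) == find(parent, o)]


dataset = [
    ("ex:james", "owl:sameAs", "ex:jim", ":g2010-09-11"),
    ("ex:james", "owl:differentFrom", "ex:jim", ":g2012-01-16"),
]
print(rigid_iri_conflicts(dataset))
# [('ex:james', 'ex:jim', ':g2012-01-16')]
```

The foaf:worksFor example above yields no conflict: only the relationship 
changes between contexts, not what ex:joe denotes.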

7th case: temporal validity changes (in intervals)

This case is trickier, but I need to introduce it, as it better explains 
how we could do more complex reasoning with datasets (which then makes 
it easier to explain how to address "separation of inferences" in [2]). 
For this case, I would use literals in the fourth position rather than 
IRIs.


ex:chadhurley  a  ex:YoutubeEmployee . "[2005,2010]"^^interval
ex:YoutubeEmployee  rdfs:subClassOf  ex:GoogleEmployee . "[2006,2011]"^^interval

Here, the additional restriction on the semantics is that each literal 
in the datatype "interval" would be assigned a distinct interpretation.
Additionally, anything that is true in an interval [x,y] must be true in 
all subintervals. As a consequence, in the example above, the following 
quads would be inferred:

ex:chadhurley  a  ex:YoutubeEmployee . "[2006,2010]"^^interval
ex:YoutubeEmployee  rdfs:subClassOf  ex:GoogleEmployee . "[2006,2010]"^^interval

Since the fourth column is now identical for the two triples, the 
semantics of [1] says that all normal RDF(S) inferences hold, therefore, 
I can conclude:

ex:chadhurley  a  ex:GoogleEmployee . "[2006,2010]"^^interval
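A minimal Python sketch of this reasoning, assuming that the lexical form 
"[start,end]" denotes a closed interval of integer years. Combining a type 
quad and a subClassOf quad on the intersection of their intervals is one 
way to realise "true on all subintervals" followed by ordinary rdfs9 once 
the fourth column matches. It performs a single inference round, not a 
full closure, and the function names are hypothetical.

```python
from itertools import product


def parse_interval(lexical):
    """Assumed lexical form "[start,end]" with integer years; closed."""
    start, end = lexical.strip("[]").split(",")
    return int(start), int(end)


def included_in(a, b):
    """The 'included-in' comparison suggested as a facet: a is within b."""
    return b[0] <= a[0] and a[1] <= b[1]


def intersect(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def infer_types(quads):
    """One round of rdfs9 under the 7th-case restriction: both premises
    also hold on the intersection of their intervals, where the fourth
    column is identical, so the usual inference applies there."""
    out = set()
    for (x, p1, c, i1), (c1, p2, d, i2) in product(quads, repeat=2):
        if p1 == "a" and p2 == "rdfs:subClassOf" and c == c1:
            common = intersect(i1, i2)
            if common is not None:
                out.add((x, "a", d, common))
    return out


quads = {
    ("ex:chadhurley", "a", "ex:YoutubeEmployee", parse_interval("[2005,2010]")),
    ("ex:YoutubeEmployee", "rdfs:subClassOf", "ex:GoogleEmployee",
     parse_interval("[2006,2011]")),
}
print(infer_types(quads))
# {('ex:chadhurley', 'a', 'ex:GoogleEmployee', (2006, 2010))}
```

The derived interval [2006,2010] is included-in both [2005,2010] and 
[2006,2011], which is exactly the subinterval step described above.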

Note that, according to XSD, datatypes provide not only a lexical space, 
a value space and a lexical-to-value (L2V) mapping; they should normally 
also provide "facets", which are kinds of functions on datatypes. The 
"interval" datatype could provide, as a facet, the comparison 
"included-in".

8th case: generalisation of 7th case

Other kinds of annotations could be used. For instance, a simple trust 
measure for graphs (possibly calculated from PageRank-like algorithms). 
Instead of using the "included-in" relation to define the semantic 
restriction for satisfaction, the "less-than" relation would be used. It 
can be generalised further, assuming an order on the values (and some 
other restrictions).
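One possible reading of this generalisation (my assumption about what the 
"other restrictions" could be) is that the annotation values need a meet 
operation: the strongest value valid for both premises. The Python sketch 
below parameterises a single rdfs9 step by such a meet; the trust scores 
0.9 and 0.6 are hypothetical, and the interval case is recovered by using 
intersection as the meet.

```python
from itertools import product


def infer_types(quads, meet):
    """rdfs9 over annotated quads, parameterised by a 'meet' returning
    the strongest annotation valid for both premises (or None)."""
    out = set()
    for (x, p1, c, v1), (c1, p2, d, v2) in product(quads, repeat=2):
        if p1 == "a" and p2 == "rdfs:subClassOf" and c == c1:
            v = meet(v1, v2)
            if v is not None:
                out.add((x, "a", d, v))
    return out


def trust_meet(t1, t2):
    # 'less-than' restriction: a triple holding at trust t also holds at
    # any lower trust, so the conclusion gets the smaller value.
    return min(t1, t2)


def interval_meet(a, b):
    # The 7th case as an instance: 'included-in' order, meet = intersection.
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


trusted = {
    ("ex:chadhurley", "a", "ex:YoutubeEmployee", 0.9),
    ("ex:YoutubeEmployee", "rdfs:subClassOf", "ex:GoogleEmployee", 0.6),
}
print(infer_types(trusted, trust_meet))
# {('ex:chadhurley', 'a', 'ex:GoogleEmployee', 0.6)}
```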

One particular case in this generalisation is provenance annotations:


foaf:Person  rdfs:subClassOf  foaf:Agent . "foaf:"^^prov
ex:chadhurley  a  foaf:Person . "dbpedia:"^^prov

If provenance is written as a conjunction of URLs, one can infer:

ex:chadhurley  a  foaf:Agent . "foaf: \and dbpedia:"^^prov
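A minimal Python sketch of the provenance case, assuming frozensets of 
source prefixes stand in for the prov literals and that the conjunction 
of two provenances is the union of their sources (a conclusion depends 
on every source used to derive it). The helper names are hypothetical.

```python
from itertools import product


def prov_meet(a, b):
    # Conjunction of provenances = union of the source sets.
    return a | b


def render(prov):
    # Print in the email's notation, sources sorted alphabetically.
    return " \\and ".join(sorted(prov))


def infer_types(quads):
    """One round of rdfs9, annotating each conclusion with the
    conjunction of its premises' provenance."""
    out = set()
    for (x, p1, c, v1), (c1, p2, d, v2) in product(quads, repeat=2):
        if p1 == "a" and p2 == "rdfs:subClassOf" and c == c1:
            out.add((x, "a", d, prov_meet(v1, v2)))
    return out


quads = {
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent", frozenset({"foaf:"})),
    ("ex:chadhurley", "a", "foaf:Person", frozenset({"dbpedia:"})),
}
for s, p, o, prov in infer_types(quads):
    print(s, p, o, render(prov))
# ex:chadhurley a foaf:Agent dbpedia: \and foaf:
```

Since conjunction is commutative, "dbpedia: \and foaf:" and 
"foaf: \and dbpedia:" carry the same provenance.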

This partly addresses Sandro's UC on "separating inferences".

[1] RDF Datasets Proposal. 
[2] Why Graphs. http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66

Received on Tuesday, 6 March 2012 17:09:52 UTC