Datasets and contextual/temporal semantics

(I wrote this early today in the hotel – don't have wifi there and didn't have a chance to read Pat's long message yet – will be interesting to compare them!)


The big problem we face with the semantics of RDF datasets is this: people want to use RDF datasets to manage information with *different context*, such as different temporal validity. RDF Semantics is not designed to handle different contexts, and many of our problems stem from that.

I'll give examples.

   :G2010 {:alice :age 29.}
   :G2011 {:alice :age 30.}

Individually, each of those graphs is true (at a certain point in time). Taken together, they yield an inconsistency (assuming :age is a functional property):

   :alice :age 29, 30.

By merging the two graphs, we have discarded the contextual information. This shows that the graph merge operation is *not truth-preserving* – not *valid* in the formal sense – *if* the merged graphs have different contexts.
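A minimal sketch of the point above, assuming graphs are modelled as plain Python sets of (subject, predicate, object) tuples and that `:age` is functional, as the example assumes. `find_functional_clashes` and `FUNCTIONAL` are hypothetical names, not part of any RDF library. Each graph passes the check on its own, but their merge does not; since there are no blank nodes here, the merge is simply set union.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# Assumption taken from the example: :age is a functional property.
FUNCTIONAL = {":age"}

def find_functional_clashes(graph: set[Triple]) -> dict[tuple[str, str], list[str]]:
    """Return each (subject, property) pair that has more than one distinct
    value for a functional property. Literal values are compared as plain
    strings, which is enough for this example."""
    values: dict[tuple[str, str], set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p in FUNCTIONAL:
            values[(s, p)].add(o)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}

g2010 = {(":alice", ":age", "29")}
g2011 = {(":alice", ":age", "30")}

# With no blank nodes involved, merging two graphs is just set union.
merged = g2010 | g2011

print(find_functional_clashes(g2010))   # {}
print(find_functional_clashes(g2011))   # {}
print(find_functional_clashes(merged))  # {(':alice', ':age'): ['29', '30']}
```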

Another example:

   :G2010 {
     <person/2279> :worksFor <person/2279/employer>.
     <person/2279/employer> owl:sameAs companies:431.
   }
   :G2011 {
     <person/2279> :worksFor <person/2279/employer>.
     <person/2279/employer> owl:sameAs companies:998.
   }

Taking each graph individually, we have no reason to assume that they aren't true (at their respective times), and the modelling is perfectly sensible. But the person changed employers, so evidently <person/2279/employer> identifies two different resources in the two graphs (assuming companies:431 owl:differentFrom companies:998). Their merge would obviously be inconsistent.
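The same effect can be sketched for this second example. This is a deliberately narrow check, not an OWL reasoner: it computes the owl:sameAs closure with a union-find and reports a clash when two IRIs in the same equivalence class are also declared owl:differentFrom. The `background` triple encodes the assumption that the two companies differ; `is_consistent` is a hypothetical name.

```python
Triple = tuple[str, str, str]

SAME_AS = "owl:sameAs"
DIFFERENT_FROM = "owl:differentFrom"

def is_consistent(graph: set[Triple]) -> bool:
    """Detect one kind of clash only: IRIs linked (possibly transitively)
    by owl:sameAs that are also asserted to be owl:differentFrom."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # owl:sameAs is symmetric and transitive, so union-find yields its closure.
    for s, p, o in graph:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    return not any(p == DIFFERENT_FROM and find(s) == find(o) for s, p, o in graph)

# Assumption from the text: the two companies are distinct.
background = {("companies:431", DIFFERENT_FROM, "companies:998")}

g2010 = {
    ("<person/2279>", ":worksFor", "<person/2279/employer>"),
    ("<person/2279/employer>", SAME_AS, "companies:431"),
}
g2011 = {
    ("<person/2279>", ":worksFor", "<person/2279/employer>"),
    ("<person/2279/employer>", SAME_AS, "companies:998"),
}

print(is_consistent(g2010 | background))          # True
print(is_consistent(g2011 | background))          # True
print(is_consistent(g2010 | g2011 | background))  # False
```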

These examples were temporal, but there are non-temporal examples too:

   :G1 {
     <urn:uuid:123456789> a :Person;
       :mbox <mailto:bob@example.com>;
       :birthday "1984-02-05".
   }
   :G2 {
     <urn:uuid:123456789> a :Person;
       :mbox <mailto:alice@example.com>;
       :birthday "1981-10-21".
   }

This is a clash of identifiers, caused perhaps by bad luck, erroneous copy-and-pasting, or poor identifier management. Nevertheless, each graph on its own is reasonable and has to be considered true. Neither party has more right over the authority-less URN than the other. In the one context, the URN simply *does* denote Alice, and in the other context, it simply *does* denote Bob. But the merge of these two true graphs of different context is clearly false, as the contexts are incompatible.

So, RDF Semantics does not work if we mix triples that have different temporal validity or other contextual differences.

That's not really much of a problem, because it just means that you have to keep triples of different context apart in separate graphs. This is intuitive enough, and people seem to have no problem understanding that (except Pat, ironically!) There is an unspoken yet intuitive assumption that the triples in one graph share the same context. As long as contexts are kept apart, the entailments of RDF Semantics work and are useful.

Now, in the real world, applications *have* to deal with data of different contexts: different versions, different provenance, contradictory viewpoints, honest errors in identifier use, and so on. Many practitioners working with RDF already take for granted a model that accounts for context, even though the formal semantics does not.

That's why named graphs were invented in the first place – to make it possible to keep contexts apart, so that statements which are true in their respective context don't get all smushed together into one inseparable inconsistent mess. Named graphs exist to *escape* the entailments of RDF Semantics in situations where they are inappropriate. Named graphs exist to enable the storage and processing of contradictory and incompatible information with RDF tools.
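To make the contrast concrete, here is a minimal sketch, assuming a dataset is modelled as a plain Python dict from graph name to a set of (subject, predicate, object) tuples; the function name `objects` is hypothetical. Asking for Alice's age per named graph returns each answer paired with the graph it came from, roughly what a SPARQL `GRAPH ?g { ... }` pattern gives you. Flattening the dataset into one graph returns both ages with nothing to say which year each belongs to.

```python
Triple = tuple[str, str, str]
Dataset = dict[str, set[Triple]]

# Hypothetical in-memory model: graph name -> graph.
dataset: Dataset = {
    ":G2010": {(":alice", ":age", "29")},
    ":G2011": {(":alice", ":age", "30")},
}

def objects(dataset: Dataset, subject: str, predicate: str) -> list[tuple[str, str]]:
    """Return (graph name, object) pairs, so every answer keeps its context."""
    return sorted(
        (name, o)
        for name, graph in dataset.items()
        for s, p, o in graph
        if s == subject and p == predicate
    )

print(objects(dataset, ":alice", ":age"))  # [(':G2010', '29'), (':G2011', '30')]

# Flattening the dataset into a single graph is the step that drops the context.
flattened = set().union(*dataset.values())
print(sorted(o for s, p, o in flattened if s == ":alice" and p == ":age"))  # ['29', '30']
```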

So here are a couple of axioms that I believe are true and that one has to understand in order to apply RDF Semantics correctly, but that are unfortunately unstated in RDF Semantics:

1. RDF Semantics defines an entailment relationship between sets of triples, a.k.a. RDF graphs
2. This entailment relationship is only valid if all triples share the same context
3. Therefore, placing triples with incompatible context into a single graph is not seen as something useful, and we understand RDF graphs as only containing triples of compatible context
4. It follows that merging two graphs with incompatible contexts is not a valid operation
5. Whether two contexts are compatible or not is outside of the scope of RDF Semantics
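Axioms 4 and 5 suggest a division of labour: a merge operation can refuse to combine graphs whose contexts are incompatible, but the compatibility test itself has to come from the application, not from RDF Semantics. Here is a minimal sketch under that assumption, again modelling a dataset as a dict from graph name to a set of triples. `merge_graphs`, `same_year`, the `valid_year` table and the extra graph `:G2010b` are all hypothetical illustrations.

```python
from typing import Callable

Triple = tuple[str, str, str]
Dataset = dict[str, set[Triple]]
# The application, not RDF Semantics, decides whether two contexts are compatible.
Compatible = Callable[[str, str], bool]

def merge_graphs(dataset: Dataset, names: list[str], compatible: Compatible) -> set[Triple]:
    """Union the named graphs, but only if every pair of them is compatible."""
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if not compatible(a, b):
                raise ValueError(f"incompatible contexts: {a} and {b}")
    # No blank nodes in this sketch, so the merge is plain set union.
    return set().union(*(dataset[n] for n in names))

# Hypothetical policy: two graphs are compatible only if they describe the same year.
valid_year = {":G2010": 2010, ":G2010b": 2010, ":G2011": 2011}

def same_year(a: str, b: str) -> bool:
    return valid_year[a] == valid_year[b]

dataset: Dataset = {
    ":G2010": {(":alice", ":age", "29")},
    ":G2010b": {(":alice", ":mbox", "<mailto:alice@example.com>")},  # hypothetical extra graph
    ":G2011": {(":alice", ":age", "30")},
}

print(len(merge_graphs(dataset, [":G2010", ":G2010b"], same_year)))  # 2
try:
    merge_graphs(dataset, [":G2010", ":G2011"], same_year)
except ValueError as err:
    print(err)  # incompatible contexts: :G2010 and :G2011
```

Passing the predicate in as a parameter keeps the merge itself context-agnostic; replacing `same_year` with a provenance- or viewpoint-based test requires no change to `merge_graphs`.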

And now, the key addition for RDF 1.1:

6. RDF datasets provide a way of managing context by keeping triples with different context in different named graphs

I think it is absolutely essential to keep these points in mind when we're talking about the semantics of RDF datasets.

If we can't extend RDF Semantics into a proper temporal and contextual logic, then we have to make the axioms above explicit, and ensure that the semantics allow RDF datasets to hold incompatible information without entailing inconsistencies.

One consequence of this is that for any entailments that arise from the structure of RDF datasets, we must be clear about the context in which they arise. For example, if we want a graph name to denote a graph container, then we have to answer the question: *in which context* should it denote the container?

Best,
Richard

Received on Thursday, 13 October 2011 11:11:07 UTC