Use case wrt Dataset proposal (UC 1.3, 1st part)

*Beware*: this is about design solutions using the dataset proposal as a 
whole. It is not strictly related to the semantics. It explains 
concretely how one could store things in a dataset, possibly entail new 
things according to the dataset semantics of [2] and so on, such that 
eventually it addresses the use case. So it contains a number of things 
that applications should do to address the UCs, independently of the 
truth values of triples or "named" graphs.


UC 1.3: Graph Changes Over Time


This use case is much more complicated to deal with than UC 1.5 and 5.2.

First, there are different notions of temporal change:
  (1) it can refer to changes of conceptualisation, i.e., the graph 
changed because we realised it wasn't modelling reality accurately. This 
is related to versioning;
  (2) changes because the truth of statements changed (e.g., Joe works 
for XYZ in 2004, but he quits in 2005 and now works for ABC). This is 
validity time.

Then, there is the problem of what kind of temporal information is 
represented. Temporal information can be as simple as an xsd:dateTime 
value. But it can be a time interval. It could be a recurring time 
interval. It could be an arbitrary set (possibly infinite) of time 
points. It could be a variable constrained by temporal relationships 
like Allen's algebra. So I'll start with the simpler cases.
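
To make the difference between these kinds of temporal information 
concrete, here is a minimal Python sketch of the first three (a time 
point, an interval, a recurring interval). The class names and the 
contains() method are hypothetical, chosen only for illustration, and 
the recurring case assumes the interval is no longer than its period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Instant:
    """A single time point, e.g. the value of an xsd:dateTime literal."""
    at: datetime

    def contains(self, t: datetime) -> bool:
        return t == self.at


@dataclass(frozen=True)
class Interval:
    """A half-open time interval [start, end)."""
    start: datetime
    end: datetime

    def contains(self, t: datetime) -> bool:
        return self.start <= t < self.end


@dataclass(frozen=True)
class RecurringInterval:
    """An interval repeated every `period`, starting with `first`.

    Assumption: `first` is no longer than `period`. The set of time
    points covered is infinite, so it is tested rather than enumerated.
    """
    first: Interval
    period: timedelta

    def contains(self, t: datetime) -> bool:
        if t < self.first.start:
            return False
        # Position of t inside the current repetition.
        offset = (t - self.first.start) % self.period
        return offset < (self.first.end - self.first.start)


# Hypothetical example: Monday 2 January 2012, 09:00 to 17:00, weekly.
office_hours = RecurringInterval(
    Interval(datetime(2012, 1, 2, 9), datetime(2012, 1, 2, 17)),
    timedelta(days=7),
)
```

An arbitrary set of time points, or a variable constrained by Allen's 
relations, does not fit this simple shape, which is one reason to start 
with the simpler cases.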

I'll give a design solution for RDF graphs that are valid at a single 
time point (they can be valid at other time points, by the open world 
assumption, but the design specifies validity only at a finite set of 
time points). Let us assume that we have a company's data about people's 
employment. The dataset stores information like ":joe :worksfor 
:company" or ":joe :leads :teamA" etc. These facts evolve over time.

Whenever a triple is obsolete (e.g., Joe is not a team leader any 
longer), create a new "named" graph where the new information is 
provided. The question is "what name should be used for the graph?"
There are different solutions:
  1) at each time point, mint a new IRI, distinct from all IRIs 
appearing inside the graphs;
  2) use a literal of type xsd:dateTime or xsd:dateTimeStamp.
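
The two naming options can be sketched in Python as follows. This is 
only an illustration: the function names, the example.org base IRI and 
the use of random UUIDs are assumptions, not part of the proposal.

```python
import uuid
from datetime import datetime


def mint_graph_iri(used_iris, base="http://example.org/graph/"):
    """Solution (1): mint an opaque IRI that does not occur in `used_iris`.

    `used_iris` is the set of IRIs already appearing inside the graphs;
    `base` is a hypothetical namespace.
    """
    while True:
        candidate = base + uuid.uuid4().hex
        if candidate not in used_iris:
            return candidate


def datetime_graph_name(t):
    """Solution (2): use an xsd:dateTime literal as the graph "name".

    Returns the literal in Turtle-like syntax. SPARQL datasets do not
    accept this, since graph names must be IRIs there.
    """
    return '"%s"^^xsd:dateTime' % t.isoformat()


used = {"http://example.org/joe", "http://example.org/teamA"}
g_iri = mint_graph_iri(used)
g_lit = datetime_graph_name(datetime(2011, 10, 8, 10, 23, 42))
```

The IRI produced for solution (1) carries no temporal information by 
itself, whereas the literal of solution (2) says directly which time 
point is meant.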

Solution (2) is simpler but not in agreement with the definition of 
SPARQL datasets, which imposes that graph "names" are IRIs. However, it 
is clear and unambiguous what the graph "names" denote. Then, whatever 
is true at time t1 does not influence what is true at time t2 since the 
truth of the statement may have changed.

Solution (1) requires that additional information is given, as the graph 
IRI is supposed to be opaque. Moreover, the semantics does not allow 
anyone to assume that the IRI is denoting anything in particular.
So the idea would be to add some meta information about the dataset, 
which makes it clear how to understand it. However, even in the absence 
of the metainformation, the inferences provided by the semantics of [2] 
are in line with what one would expect of a temporally scoped 
representation: anything true in the graph labelled by X need not be 
true in the graph labelled by Y.
In the absence of metainformation, a system that parses and reasons with 
the dataset would not understand that statements are tied to a time 
point, but it would at least not allow inferences from one graph to 
influence knowledge in a different graph.
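
The isolation between graphs can be illustrated with a toy Python 
sketch. It is not an implementation of the semantics of [2]: the rule 
"whoever leads a team is a member of it", the property :memberOf and 
the function names are all hypothetical. The point is only that 
inference runs inside each graph separately.

```python
def close_graph(triples):
    """Apply the toy rule (?x :leads ?t) => (?x :memberOf ?t) in one graph."""
    inferred = {(s, ":memberOf", o) for (s, p, o) in triples if p == ":leads"}
    return set(triples) | inferred


def close_dataset(dataset):
    """Close every graph on its own; nothing flows between graphs."""
    return {name: close_graph(triples) for name, triples in dataset.items()}


# Graph names map to sets of (subject, predicate, object) triples.
dataset = {
    ":g1": {(":joe", ":worksfor", ":company"), (":joe", ":leads", ":teamA")},
    ":g2": {(":joe", ":worksfor", ":company")},
}
closed = close_dataset(dataset)
```

Here the inferred membership triple appears in :g1 only. Whether Joe 
still belongs to the team in :g2 is left open, as it should be if he 
stopped leading it in the meantime.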

Metainformation could be provided as a separate file (together with voiD 
annotations). We would need a vocabulary to say that the dataset is 
built according to a certain IRI scheme, where each graph "name" denotes 
the graph itself and is tied to a certain time point.

Something like:

<>   a void:Dataset ;
      ex:semantics  ex:GraphNamesDenoteGraph .
:g1  a  rdf:Graph ;
      ex:validAt  "2011-10-08T10:23:42"^^xsd:dateTime .
:g2  a  rdf:Graph ;
      ex:validAt "...."^^xsd:dateTime .

and the dataset itself contains:

:g1 { :bob :worksfor :company1 }
:g2 { :bob :worksfor :company2 }
...

When a dataset processor meets the statement:

  <>  ex:semantics  ex:GraphNamesDenoteGraph .

it would know that the following statements are meant to say something 
about the graphs themselves, which can be stated as additional semantic 
constraints.
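
A minimal Python sketch of such a processor, under the assumption that 
the metainformation has already been parsed into plain (subject, 
predicate, object) tuples. The function name is hypothetical, and 
ex:semantics and ex:validAt are the illustrative vocabulary used above, 
not existing terms.

```python
from datetime import datetime

DATASET = "<>"  # stands for the dataset itself, as in the snippet above


def graph_time_points(metadata):
    """Return {graph name: time point}, or {} if the scheme is not declared.

    The ex:validAt statements are read as statements about the graphs
    only when the dataset declares ex:GraphNamesDenoteGraph.
    """
    declared = (DATASET, "ex:semantics", "ex:GraphNamesDenoteGraph") in metadata
    if not declared:
        return {}
    return {s: o for (s, p, o) in metadata if p == "ex:validAt"}


metadata = {
    ("<>", "rdf:type", "void:Dataset"),
    ("<>", "ex:semantics", "ex:GraphNamesDenoteGraph"),
    (":g1", "rdf:type", "rdf:Graph"),
    (":g1", "ex:validAt", datetime(2011, 10, 8, 10, 23, 42)),
}
time_points = graph_time_points(metadata)
```

Without the ex:semantics statement the function returns an empty 
mapping, which mirrors a processor that sees only opaque graph IRIs.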

This may be sufficient when one simply wants to query what is true at a 
given time point, or just to have a kind of wayback machine for RDF. But 
it's certainly not satisfying for a lot of use cases.
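
The two simple usages can be sketched in Python as below. The function 
names are hypothetical, and the time point given to :g2 is made up for 
the example. Note that the "wayback" lookup adds an assumption that the 
design itself does not make, namely that the most recent graph at or 
before the requested time is the one to return.

```python
from datetime import datetime


def triples_valid_at(dataset, valid_at, t):
    """Triples of every graph whose declared time point is exactly `t`."""
    result = set()
    for name, triples in dataset.items():
        if valid_at.get(name) == t:
            result |= set(triples)
    return result


def snapshot_as_of(dataset, valid_at, t):
    """Wayback-style lookup: the latest graph declared at or before `t`.

    Assumption (not part of the design): the most recent snapshot is
    the one of interest. Returns an empty set if there is none.
    """
    candidates = [
        (ts, name) for name, ts in valid_at.items()
        if ts <= t and name in dataset
    ]
    if not candidates:
        return set()
    _, latest = max(candidates)
    return set(dataset[latest])


dataset = {
    ":g1": {(":bob", ":worksfor", ":company1")},
    ":g2": {(":bob", ":worksfor", ":company2")},
}
valid_at = {
    ":g1": datetime(2011, 10, 8, 10, 23, 42),
    ":g2": datetime(2012, 1, 15, 9, 0, 0),  # hypothetical time point
}
```

Asking triples_valid_at for a time point between the two graphs returns 
nothing, since the design only specifies validity at the listed time 
points. Asking snapshot_as_of for the same time point returns the 
content of :g1.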

More in future emails.

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/

Received on Wednesday, 29 February 2012 15:21:45 UTC