
Representing temporal data in RDF

From: Bob MacGregor <macgregor@ISI.EDU>
Date: Tue, 09 Sep 2003 10:41:55 -0700
Message-Id: <5.1.1.6.0.20030909103557.01a939f8@tnt.isi.edu>
To: www-rdf-interest@w3.org


I have yet to come across a system having a substantial user base that
does an adequate job of representing temporally situated data.  If you
think about it, most facts about the world are either true only within
a specific temporal interval, or they are things like events that have
a time component built in (mathematical facts are an exception).  In
other words, when it comes to representing day-to-day facts, there is
a large opportunity waiting.  Ideally, the Semantic Web tools could
fill this void.  Unfortunately, the tools that RDF provides us are
almost unbearably clumsy.  Below is an example.


Here is my example query:

"Retrieve freighters that visited Antwerp in
April 2003 whose cargo included aluminum pipes"


The best solution that I have come across for representing
this query uses quads.  A "quad" is a 4-tuple <?c ?s ?p ?o>
with roles context, subject, predicate, object, i.e., it's a
triple with an extra context field.

Here is the query in a RDQL variant that supports a quad
syntax instead of a triple syntax, with namespaces omitted:

SELECT ?f
WHERE  ((null ?f type Freighter),
	(?c ?f location antwerp),
	(?c ?f hasCargo ?cargo),
	(?c ?cargo consistsOf AluminumPipe),
	(null ?c beginDate ?begin),
	(null ?c endDate ?end),
	(?begin before "May 1 2003"),
	(?end after "March 31 2003"))
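To make the quad semantics concrete, here is a minimal sketch in Python
(not RDQL) that evaluates the same query over an in-memory set of
4-tuples.  The sample ships, cargoes, contexts and dates are invented
for illustration, and the function name is hypothetical; None plays the
role of "null" in the context slot.

```python
from datetime import date

# Hypothetical sample data: each quad is (context, subject, predicate, object).
# None in the context slot stands in for "null" in the query above.
QUADS = {
    (None, "ship1", "type", "Freighter"),
    (None, "ship2", "type", "Freighter"),
    (None, "ship3", "type", "Freighter"),
    # c1: ship1 in Antwerp in April with aluminum pipes (should match).
    ("c1", "ship1", "location", "antwerp"),
    ("c1", "ship1", "hasCargo", "cargo1"),
    ("c1", "cargo1", "consistsOf", "AluminumPipe"),
    (None, "c1", "beginDate", date(2003, 4, 10)),
    (None, "c1", "endDate", date(2003, 4, 12)),
    # c2: ship2 in Antwerp in April, but with a different cargo.
    ("c2", "ship2", "location", "antwerp"),
    ("c2", "ship2", "hasCargo", "cargo2"),
    ("c2", "cargo2", "consistsOf", "Bananas"),
    (None, "c2", "beginDate", date(2003, 4, 20)),
    (None, "c2", "endDate", date(2003, 4, 22)),
    # c3: ship3 in Antwerp with aluminum pipes, but in June.
    ("c3", "ship3", "location", "antwerp"),
    ("c3", "ship3", "hasCargo", "cargo3"),
    ("c3", "cargo3", "consistsOf", "AluminumPipe"),
    (None, "c3", "beginDate", date(2003, 6, 5)),
    (None, "c3", "endDate", date(2003, 6, 7)),
}


def objects(quads, c, s, p):
    """Return every object o such that (c, s, p, o) is in the store."""
    return [o for (qc, qs, qp, o) in quads if (qc, qs, qp) == (c, s, p)]


def freighters_with_pipes(quads):
    """Evaluate the example query; returns the set of bindings for ?f."""
    results = set()
    for (c, f, p, o) in quads:
        # (?c ?f location antwerp), with ?c a real context
        if c is None or p != "location" or o != "antwerp":
            continue
        # (null ?f type Freighter)
        if (None, f, "type", "Freighter") not in quads:
            continue
        # (?c ?f hasCargo ?cargo), (?c ?cargo consistsOf AluminumPipe)
        carries_pipes = any(
            (c, cargo, "consistsOf", "AluminumPipe") in quads
            for cargo in objects(quads, c, f, "hasCargo")
        )
        # (?begin before "May 1 2003"), (?end after "March 31 2003")
        begins_ok = any(b < date(2003, 5, 1)
                        for b in objects(quads, None, c, "beginDate"))
        ends_ok = any(e > date(2003, 3, 31)
                      for e in objects(quads, None, c, "endDate"))
        if carries_pipes and begins_ok and ends_ok:
            results.add(f)
    return results


result = freighters_with_pipes(QUADS)
```

Each pattern in the query maps to one membership test or lookup, and
the shared ?c variable is what ties the location, the cargo and the
dates to the same visit.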

This is actually quite a reasonable query.  It's fairly concise,
and fairly readable.  Unfortunately, I'm not aware of any system
that implements quads and has a significant user base.  For the
moment, quads are still a wish that has not come true.

Now, let's consider expressing this same query using only triples.  RDF
does not provide any officially-sanctioned way to do this, so we have
to improvise.  RDF provides the notion of a reified statement, but
there is more than one way to use reified statements.

One approach attaches dates and other metadata directly to reified
statements.  Anyone who has experimented with this long enough will
realize that this approach is a loser.

A second approach attaches metadata (like dates) to a "collection of
statements", where the collection might be an RDF bag or list.  This
is a big improvement over the previous approach, but we can do better.

The third and simplest approach invents a context object, and links
the context to the reified statements within it (or links the
statements to the context).  This approach is isomorphic to the second
approach, but is slightly cleaner and probably more efficient.
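As a rough illustration of the context approach, the Python sketch
below expands one temporally situated fact into plain triples: a
reified statement plus an inContext link.  The helper name reify and
the statement identifier are assumptions made for this example; the
predicate names are the namespace-free ones used in the queries in
this message.

```python
def reify(st_id, context, s, p, o):
    """Expand one situated fact into triples (hypothetical helper).

    st_id   -- identifier for the reified statement node, e.g. "st1"
    context -- identifier for the context object, e.g. "c1"
    """
    return [
        (st_id, "type", "Statement"),
        (st_id, "subject", s),
        (st_id, "predicate", p),
        (st_id, "object", o),
        # Point the statement at its context; the dates hang off the context.
        (st_id, "inContext", context),
    ]


triples = reify("st1", "c1", "ship1", "location", "antwerp")
# The context carries the temporal metadata once, for all its statements.
# The date strings are placeholder values for this example.
triples.append(("c1", "beginDate", "2003-04-10"))
triples.append(("c1", "endDate", "2003-04-12"))
```

One fact that was a single quad becomes five triples here, before
counting the two date triples that are shared by every statement in the
same context.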

Here we have rewritten the first query using only triples, and
contexts that include reified statements:

SELECT ?f
WHERE  ((?f type Freighter),
	(?st1 type Statement),
	(?st1 subject ?f),
	(?st1 predicate location),
	(?st1 object antwerp),
	(?st2 type Statement),
	(?st2 subject ?f),
	(?st2 predicate hasCargo),
	(?st2 object ?cargo),
	(?st3 type Statement),
	(?st3 subject ?cargo),
	(?st3 predicate consistsOf),
	(?st3 object AluminumPipe),
	(?st1 inContext ?c),
	(?st2 inContext ?c),
	(?st3 inContext ?c),
	(?c beginDate ?begin),
	(?c endDate ?end),
	(?begin before "May 1 2003"),
	(?end after "March 31 2003"))


This gets the job done, but it's really quite awful.  Not only is it
much less readable, but it is MUCH less efficient than the quad
representation.  Why is that?  First of all, the number of "joins" is
much larger.  Second, and probably more damaging, the optimizer now
has to optimize over predicates like "subject", "predicate" and
"object" that mix together extensions of many different predicates.  A
database consisting mostly of temporally-situated data will have to
reify the majority of its data, so these three predicates will likely
contain most of the data in the database. Not a pretty sight.
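For a sense of scale: the quad query above has 8 patterns, while the
triple version has 20.  The Python sketch below builds a small synthetic
triple store in which every fact is reified into a context, then counts
triples per predicate.  The sizes (20 contexts, 5 facts each) and all
names are invented for illustration; only the five-triples-per-fact
layout follows the query above.

```python
from collections import Counter


def reify(st_id, context, s, p, o):
    """Expand one situated fact into five triples (hypothetical helper)."""
    return [
        (st_id, "type", "Statement"),
        (st_id, "subject", s),
        (st_id, "predicate", p),
        (st_id, "object", o),
        (st_id, "inContext", context),
    ]


def predicate_histogram(n_contexts=20, facts_per_context=5):
    """Build a synthetic store and count how many triples use each predicate."""
    triples = []
    st = 0
    for ci in range(n_contexts):
        ctx = f"c{ci}"
        # Two date triples per context.
        triples.append((ctx, "beginDate", f"begin{ci}"))
        triples.append((ctx, "endDate", f"end{ci}"))
        # Every fact in the context is stored in reified form.
        for k in range(facts_per_context):
            st += 1
            triples.extend(
                reify(f"st{st}", ctx, f"ship{ci}", f"p{k}", f"o{ci}_{k}")
            )
    return Counter(p for (_, p, _) in triples)


hist = predicate_histogram()
total = sum(hist.values())
spo = hist["subject"] + hist["predicate"] + hist["object"]
share = spo / total  # fraction of all triples under the three predicates
```

In this synthetic store the three reification predicates together hold
more than half of all triples, and each one mixes together the
extensions of every original predicate, which leaves an optimizer little
to work with.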

In fact, it IS possible to stick to triples instead of quads, and
still produce a practical means for representing temporally situated
data.  I'm looking for examples of other RDF-based systems that have
successfully solved this problem.  Any takers?  In addition to a
reference, I would like to see how my query would be represented in
your system.

Cheers, Bob
Received on Tuesday, 9 September 2003 13:42:52 GMT
