- From: Bob MacGregor <macgregor@ISI.EDU>
- Date: Tue, 09 Sep 2003 10:41:55 -0700
- To: www-rdf-interest@w3.org
I have yet to come across a system having a substantial user base that does an adequate job of representing temporally situated data. If you think about it, most facts about the world are either true only within a specific temporal interval, or they are things like events that have a time component built in (mathematical facts are an exception). In other words, when it comes to representing day-to-day facts, there is a large opportunity waiting.

Ideally, the Semantic Web tools could fill this void. Unfortunately, the tools that RDF provides us are almost unbearably clumsy. Below is an example.

Here is my example query:

    "Retrieve freighters that visited Antwerp in April 2003 whose cargo included aluminum pipes"

The best solution that I have come across for representing this query uses quads. A "quad" is a 4-tuple <?c ?s ?p ?o> with roles context, subject, predicate, object, i.e., it's a triple with an extra context field. Here is the query in an RDQL variant that supports a quad syntax instead of a triple syntax, with namespaces omitted:

    SELECT ?f
    WHERE ((null ?f type Freighter),
           (?c ?f location antwerp),
           (?c ?f hasCargo ?cargo),
           (?c ?cargo consistsOf AluminumPipe),
           (null ?c beginDate ?begin),
           (null ?c endDate ?end),
           (?begin before "May 1 2003"),
           (?end after "March 31 2003"))

This is actually quite a reasonable query. It's fairly concise and fairly readable. Unfortunately, I'm not aware of any system that implements quads and has a significant user base. For the moment, quads are still a wish that has not come true.

Now, let's consider expressing this same query using only triples. RDF does not provide any officially sanctioned way to do this, so we have to improvise. RDF provides the notion of a reified statement, but there is more than one way to use reified statements.

One approach attaches dates and other metadata directly to reified statements. Anyone who has experimented with this long enough will realize that this approach is a loser.
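As an illustrative aside, the quad query above can be read as plain pattern matching over 4-tuples. The following is only a minimal Python sketch under stated assumptions: the sample data (`ship1`, `cargo1`, `c1`, and so on), the `match` helper, and the `ANY` wildcard are all hypothetical and do not come from any existing quad store. A context of `None` stands in for `null` in the query.

```python
from datetime import date

ANY = object()  # wildcard marker, distinct from None (None models the `null` context)

# Hypothetical sample data: (context, subject, predicate, object)
QUADS = [
    (None, "ship1", "type", "Freighter"),
    (None, "ship2", "type", "Freighter"),
    ("c1", "ship1", "location", "antwerp"),
    ("c1", "ship1", "hasCargo", "cargo1"),
    ("c1", "cargo1", "consistsOf", "AluminumPipe"),
    (None, "c1", "beginDate", date(2003, 4, 10)),
    (None, "c1", "endDate", date(2003, 4, 12)),
    ("c2", "ship2", "location", "antwerp"),
    ("c2", "ship2", "hasCargo", "cargo2"),
    ("c2", "cargo2", "consistsOf", "AluminumPipe"),
    (None, "c2", "beginDate", date(2002, 11, 1)),
    (None, "c2", "endDate", date(2002, 11, 3)),
]

def match(quads, c=ANY, s=ANY, p=ANY, o=ANY):
    """Yield every quad that agrees with the pattern; ANY matches anything."""
    pattern = (c, s, p, o)
    for quad in quads:
        if all(want is ANY or want == have for want, have in zip(pattern, quad)):
            yield quad

def freighters_with_pipes(quads, before, after):
    """Freighters located in antwerp, carrying AluminumPipe, in a context
    that begins before `before` and ends after `after`."""
    found = set()
    for _, f, _, _ in match(quads, c=None, p="type", o="Freighter"):
        for ctx, _, _, _ in match(quads, s=f, p="location", o="antwerp"):
            for _, _, _, cargo in match(quads, c=ctx, s=f, p="hasCargo"):
                if not any(match(quads, c=ctx, s=cargo,
                                 p="consistsOf", o="AluminumPipe")):
                    continue
                begins = [o for *_, o in match(quads, c=None, s=ctx, p="beginDate")]
                ends = [o for *_, o in match(quads, c=None, s=ctx, p="endDate")]
                if any(b < before for b in begins) and any(e > after for e in ends):
                    found.add(f)
    return found

result = freighters_with_pipes(QUADS, before=date(2003, 5, 1), after=date(2003, 3, 31))
```

With this made-up data, only `ship1` qualifies: the visit by `ship2` ended in November 2002, so its context fails the `endDate` test. The shared variable `?c` in the query corresponds to reusing `ctx` across the three contextual patterns.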
A second approach attaches metadata (like dates) to a "collection of statements", where the collection might be an RDF bag or list. This is a big improvement over the previous approach, but we can do better.

The third and simplest approach invents a context object and points the context to the reified statements within it (or points the statements to the context). This approach is isomorphic to the second approach, but is slightly cleaner and probably more efficient. Here we have rewritten the first query using only triples, and contexts that include reified statements:

    SELECT ?f
    WHERE ((?f type Freighter),
           (?st1 type Statement),
           (?st1 subject ?f),
           (?st1 predicate location),
           (?st1 object antwerp),
           (?st2 type Statement),
           (?st2 subject ?f),
           (?st2 predicate hasCargo),
           (?st2 object ?cargo),
           (?st3 type Statement),
           (?st3 subject ?cargo),
           (?st3 predicate consistsOf),
           (?st3 object AluminumPipe),
           (?st1 inContext ?c),
           (?st2 inContext ?c),
           (?st3 inContext ?c),
           (?c beginDate ?begin),
           (?c endDate ?end),
           (?begin before "May 1 2003"),
           (?end after "March 31 2003"))

This gets the job done, but it's really quite awful. Not only is it much less readable, but it is MUCH less efficient than the quad representation. Why is that? First of all, the number of "joins" is much larger. Second, and probably more damaging, the optimizer now has to optimize over predicates like "subject", "predicate" and "object" that mix together the extensions of many different predicates. A database consisting mostly of temporally situated data will have to reify the majority of its data, so these three predicates will likely contain most of the data in the database. Not a pretty sight.

In fact, it IS possible to stick to triples instead of quads, and still produce a practical means for representing temporally situated data. I'm looking for examples of other RDF-based systems that have successfully solved this problem.
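To make the cost argument concrete: the quad query has 8 patterns, while the triple version has 20. The same blow-up shows up in the stored data. The sketch below is a rough Python illustration, not a description of any real system: the `reify` function, the `stN` statement identifiers, and the sample quads are all hypothetical. It rewrites contextual quads as reified triples linked to their context by `inContext`, then counts how many triples land under each predicate.

```python
from collections import Counter

# Hypothetical sample data: (context, subject, predicate, object).
# A context of None means the statement is not temporally situated.
quads = [
    (None, "ship1", "type", "Freighter"),
    ("c1", "ship1", "location", "antwerp"),
    ("c1", "ship1", "hasCargo", "cargo1"),
    ("c1", "cargo1", "consistsOf", "AluminumPipe"),
    (None, "c1", "beginDate", "2003-04-10"),
    (None, "c1", "endDate", "2003-04-12"),
]

def reify(quads):
    """Rewrite quads as plain triples. Context-free quads become ordinary
    triples; contextual quads become reified statements with an
    `inContext` link to their context."""
    triples = []
    for i, (c, s, p, o) in enumerate(quads):
        if c is None:
            triples.append((s, p, o))
            continue
        st = f"st{i}"  # made-up identifier for the reified statement
        triples += [
            (st, "type", "Statement"),
            (st, "subject", s),
            (st, "predicate", p),
            (st, "object", o),
            (st, "inContext", c),
        ]
    return triples

triples = reify(quads)
counts = Counter(p for _, p, _ in triples)
generic = counts["subject"] + counts["predicate"] + counts["object"]
```

Here 6 quads become 18 triples, and 9 of them sit under the three generic predicates `subject`, `predicate` and `object`. The original predicates `location`, `hasCargo` and `consistsOf` no longer appear as predicates at all, which is why an optimizer can no longer use per-predicate statistics for them.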
Any takers? In addition to a reference, I would like to see how my query would be represented in your system.

Cheers, Bob
Received on Tuesday, 9 September 2003 13:42:52 UTC