Bob --

At last, someone who is concerned about style, readability, and plain usability!

It seems clear that RDF is going to be at best the assembly language of any usable future Semantic Web. So the question arises: what user-friendly front end will we have so that we can specify and run questions?

A related issue is that RDF reasoning is severely limited, and OWL etc. are programmer-friendly rather than end-user-friendly. As argued elsewhere on the RDF lists, something based on datalog plus negation-as-failure looks like a good practical trade-off between reasoning capability and efficiency.
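
To make "negation-as-failure" concrete, here is a minimal Python sketch of a single datalog-style rule. It is only an illustration, not how our system works: the facts, the ship names, and the function name are all hypothetical. The rule concludes that a ship is uninspected simply because no "inspected" fact can be found for it.

```python
# Hypothetical toy database: each relation is a set of tuples.
visited = {("ship1", "antwerp"), ("ship2", "antwerp"), ("ship3", "rotterdam")}
inspected = {("ship1",)}


def uninspected_visitors(port):
    """Datalog-style rule with negation as failure:

        uninspected(S) :- visited(S, port), not inspected(S).

    "not inspected(S)" succeeds when the fact is absent from the database.
    """
    return sorted(
        ship
        for (ship, visited_port) in visited
        if visited_port == port and (ship,) not in inspected
    )


result = uninspected_visitors("antwerp")
```

The negation is evaluated only after the positive facts are fully known, which is the simplest case of the stratification that keeps datalog with negation efficient.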

We have a system that allows one to write rules in English, and then run them as a program. At the moment, the system either reasons over flat files, or generates and runs SQL over Oracle and the like. But it could equally well generate RDQL or even the triples in your example below.

I hope this may be of interest. The system is on the Web at the URL below, and non-commercial writing and running of rules is free. There's an example called PhoneBilling1 that does a spot of temporal reasoning about what plan a customer is on, and another called SemanticWebOntology1 that traverses a hierarchy of relations.

If it sounds useful, I can put up on the Web a generalized, running version of "Retrieve freighters that visited Antwerp on April 2003 whose cargo included aluminum pipes" using (a) triples and (b) something that can be compiled down to triples. Let me know.

Thanks in advance, -- Adrian

                           INTERNET BUSINESS LOGIC

Business Rule Applications in English, Using Your Oracle Database

                            www.reengineeringllc.com

Adrian Walker
Reengineering LLC
PO Box 1412
Bristol
CT 06011-1412 USA

Phone: USA 860 583 9677
Cell:    USA 860 830 2085
Fax:    USA 860 314 1029

Bob MacGregor wrote:

I have yet to come across a system having a substantial user base that
does an adequate job of representing temporally situated data. If you
think about it, most facts about the world are either true only within
a specific temporal interval, or they are things like events that have
a time component built in (mathematical facts are an exception). In
other words, when it comes to representing day-to-day facts, there is
a large opportunity waiting. Ideally, the Semantic Web tools could
fill this void. Unfortunately, the tools that RDF provides us are
almost unbearably clumsy. Below is an example.

Here is my example query:

"Retrieve freighters that visited Antwerp on
April 2003 whose cargo included aluminum pipes"

The best solution that I have come across for representing
this query uses quads. A "quad" is a 4-tuple <?c ?s ?p ?o>
with roles context, subject, predicate, object, i.e., it's a
triple with an extra context field.

Here is the query in an RDQL variant that supports a quad
syntax instead of a triple syntax, with namespaces omitted:

    SELECT ?f
    WHERE ((null ?f type Freighter),
           (?c ?f location antwerp),
           (?c ?f hasCargo ?cargo),
           (?c ?cargo consistsOf AluminumPipe),
           (null ?c beginDate ?begin),
           (null ?c endDate ?end),
           (?begin before "May 1 2003"),
           (?end after "March 31 2003"))

This is actually quite a reasonable query. It's fairly concise,
and fairly readable. Unfortunately, I'm not aware of any system
that implements quads and has a significant user base. For the
moment, quads are still a wish that has not come true.
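
As a rough illustration of how the quad query above could be evaluated, here is a small Python sketch over an in-memory list of quads. It does not correspond to any existing quad store: the sample data, the `ANY` wildcard, and the function names are assumptions made up for this example, and dates are modeled as `datetime.date` values rather than strings.

```python
from datetime import date

ANY = object()  # wildcard sentinel; None is reserved for the "null" context

# Hypothetical sample data: (context, subject, predicate, object).
QUADS = [
    (None, "f1", "type", "Freighter"),
    (None, "f2", "type", "Freighter"),
    ("c1", "f1", "location", "antwerp"),
    ("c1", "f1", "hasCargo", "cargo1"),
    ("c1", "cargo1", "consistsOf", "AluminumPipe"),
    (None, "c1", "beginDate", date(2003, 4, 10)),
    (None, "c1", "endDate", date(2003, 4, 12)),
    ("c2", "f2", "location", "antwerp"),
    ("c2", "f2", "hasCargo", "cargo2"),
    ("c2", "cargo2", "consistsOf", "AluminumPipe"),
    (None, "c2", "beginDate", date(2003, 6, 1)),
    (None, "c2", "endDate", date(2003, 6, 3)),
]


def match(quads, c=ANY, s=ANY, p=ANY, o=ANY):
    """Yield every quad that agrees with the pattern on all non-wildcard fields."""
    pattern = (c, s, p, o)
    for quad in quads:
        if all(want is ANY or want == got for want, got in zip(pattern, quad)):
            yield quad


def freighters_with_pipes_in_antwerp(quads):
    """Nested-loop evaluation of the quad query, one loop per pattern."""
    found = set()
    for (_, f, _, _) in match(quads, c=None, p="type", o="Freighter"):
        for (c, _, _, _) in match(quads, s=f, p="location", o="antwerp"):
            for (_, _, _, cargo) in match(quads, c=c, s=f, p="hasCargo"):
                if not any(match(quads, c=c, s=cargo,
                                 p="consistsOf", o="AluminumPipe")):
                    continue
                for (_, _, _, begin) in match(quads, c=None, s=c, p="beginDate"):
                    for (_, _, _, end) in match(quads, c=None, s=c, p="endDate"):
                        # The context interval must overlap April 2003.
                        if begin < date(2003, 5, 1) and end > date(2003, 3, 31):
                            found.add(f)
    return sorted(found)


result = freighters_with_pipes_in_antwerp(QUADS)
```

In this sample data only the context `c1` overlaps April 2003, so only freighter `f1` is returned. Note that each pattern still filters on its real predicate (`location`, `hasCargo`, and so on).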
Now, let's consider expressing this same query using only triples. RDF
does not provide any officially-sanctioned way to do this, so we have
to improvise. RDF provides the notion of a reified statement, but
there is more than one way to use reified statements.

One approach attaches dates and other metadata directly to reified
statements. Anyone who has experimented with this long enough will
realize that this approach is a loser.

A second approach attaches metadata (like dates) to a "collection of
statements", where the collection might be an RDF bag or list. This
is a big improvement over the previous approach, but we can do better.

The simplest approach invents a context object, and points a context
to the reified statements within it (or points the statements to the
context). This approach is isomorphic to the second approach, but is
slightly cleaner and probably more efficient.

Here we have rewritten the first query using only triples, and
contexts that include reified statements:

    SELECT ?f
    WHERE ((?f type Freighter),
           (?st1 type Statement),
           (?st1 subject ?f),
           (?st1 predicate location),
           (?st1 object antwerp),
           (?st2 type Statement),
           (?st2 subject ?f),
           (?st2 predicate hasCargo),
           (?st2 object ?cargo),
           (?st3 type Statement),
           (?st3 subject ?cargo),
           (?st3 predicate consistsOf),
           (?st3 object AluminumPipe),
           (?st1 inContext ?c),
           (?st2 inContext ?c),
           (?st3 inContext ?c),
           (?c beginDate ?begin),
           (?c endDate ?end),
           (?begin before "May 1 2003"),
           (?end after "March 31 2003"))

This gets the job done, but it's really quite awful. Not only is it
much less readable, but it is MUCH less efficient than the quad
representation. Why is that? First of all, the number of "joins" is
much larger. Second, and probably more damaging, the optimizer now
has to optimize over predicates like "subject", "predicate" and
"object" that mix together extensions of many different predicates. A
database consisting mostly of temporally-situated data will have to
reify the majority of its data, so these three predicates will likely
contain most of the data in the database. Not a pretty sight.
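
The blow-up described above can be seen in a small Python sketch that reifies context-tagged statements into plain triples and then counts triples per predicate. The `reify` helper, the generated statement identifiers, and the sample data are hypothetical. The predicate names (`type`, `subject`, `predicate`, `object`, `inContext`) follow the triples-only query.

```python
from collections import Counter


def reify(quads):
    """Rewrite (context, s, p, o) quads as triples.

    Quads with a null (None) context stay ordinary triples; every other quad
    becomes five triples describing a reified statement in its context.
    """
    triples = []
    for i, (c, s, p, o) in enumerate(quads, start=1):
        if c is None:
            triples.append((s, p, o))
            continue
        st = f"st{i}"  # hypothetical identifier for the reified statement
        triples += [
            (st, "type", "Statement"),
            (st, "subject", s),
            (st, "predicate", p),
            (st, "object", o),
            (st, "inContext", c),
        ]
    return triples


# Hypothetical sample data: one freighter visit with its time interval.
quads = [
    (None, "f1", "type", "Freighter"),
    ("c1", "f1", "location", "antwerp"),
    ("c1", "f1", "hasCargo", "cargo1"),
    ("c1", "cargo1", "consistsOf", "AluminumPipe"),
    (None, "c1", "beginDate", "2003-04-10"),
    (None, "c1", "endDate", "2003-04-12"),
]

triples = reify(quads)
by_predicate = Counter(p for (_, p, _) in triples)
```

Six quads become eighteen triples here. After reification, `location`, `hasCargo` and `consistsOf` no longer appear in the predicate position at all. They show up only as objects of `predicate` triples, so an index on the predicate column can no longer tell them apart.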

In fact, it IS possible to stick to triples instead of quads, and
still produce a practical means for representing temporally situated
data. I'm looking for examples of other RDF-based systems that have
successfully solved this problem. Any takers (in addition to a reference,
I would like to see how my query would be represented in your system)?

Cheers, Bob