Requirement 3.5: subgraph results from Rob Shearer on 2004-05-04 (public-rdf-dawg@w3.org from April to June 2004)

From: Rob Shearer <Rob.Shearer@networkinference.com>
Date: Tue, 4 May 2004 12:55:30 -0700
To: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-ID: <CFE388CECDDB1E43AB1F60136BEB49730280A1@rome.ad.networkinference.com>

I very strongly object to this requirement. There are a number of
different reasons.

First, the suggestion that a query should return a subgraph of the
original RDF which can then be the subject of the same query (and return
the exact same thing) is a MAJOR MAJOR constraint on what the query
language can express. It eliminates any chance of using any information
other than that encoded in triples to answer the query; in fact it
codifies that any such extension is explicitly illegal. Queries along
the lines of "must these two nodes be related via either of these two
properties?" suddenly become impossible to answer when that answer is
derived via inference, or rules, or some higher-level semantic language.
The answer is simply "yes"; producing an RDF subgraph to justify that
answer is impossible: the reasons have no canonical RDF representation.
We should keep in mind that this requirement could completely dictate
many other aspects of the language.

Second, just what use case is driving this requirement? If you really
think this requirement is crucial, then feel free to contrive some
examples. The explicit use case codified in the requirement--that of
performing exactly the same query against its own result--is obviously
useless and irrelevant to any sane user.

Third, it seems pretty clear that functionality similar to what is
described here is easily built by client code, and to the exact level of
detail the user truly desires. When users perform a query, they know
what kinds of triples they consider relevant, and as long as some kind of
node-binding result format is available they can assemble the
explicit triples that model the data they were after. Why are we
bloating the core query language with a feature that will be used
seldom, and almost certainly won't offer the flexibility users need when
they really do use it?
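
To illustrate the client-side approach in the third point, here is a minimal Python sketch. It assumes a hypothetical result format that returns one dictionary of variable-to-node bindings per solution; the function name `instantiate`, the `?var` pattern syntax, and the `ex:`/`foaf:` sample nodes are illustrative assumptions, not part of any proposed DAWG format. The client supplies only the triple patterns it considers relevant and rebuilds explicit triples from the bindings.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a triple is (subject, predicate, object); a binding row
# maps variable names (without the leading '?') to node labels.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def instantiate(patterns: List[Triple], bindings: List[Binding]) -> List[Triple]:
    """Substitute each binding row into each pattern.

    Terms starting with '?' are treated as variables. Patterns whose
    variables are not all bound in a row are skipped. Duplicate triples
    are dropped while preserving first-seen order.
    """
    result: List[Triple] = []
    seen = set()
    for row in bindings:
        for pattern in patterns:
            terms = []
            complete = True
            for term in pattern:
                if term.startswith("?"):
                    name = term[1:]
                    if name not in row:
                        complete = False  # unbound variable: skip this pattern
                        break
                    terms.append(row[name])
                else:
                    terms.append(term)  # constant term, copied as-is
            if complete:
                triple = (terms[0], terms[1], terms[2])
                if triple not in seen:
                    seen.add(triple)
                    result.append(triple)
    return result


if __name__ == "__main__":
    rows = [
        {"person": "ex:alice", "name": '"Alice"'},
        {"person": "ex:bob", "name": '"Bob"'},
    ]
    wanted = [("?person", "foaf:name", "?name")]
    for t in instantiate(wanted, rows):
        print(t)
```

The design choice here is that the client, not the query language, decides which triples count as the "answer"; a different application could pass different patterns against the same bindings.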

Finally, the whole notion of returning RDF subgraphs just strikes me as
philosophically wrong. RDF navel-gazing is a fascinating academic
pursuit, but real users are NOT interested in just screwing around with
RDF. They want real answers to real questions. Like it or not, XSLT is
at best a peripheral XML technology, and it's very clear that it is not
a viable general-purpose "query language". The most widely deployed XML
query language is: SAX. Or maybe DOM. Then comes XPath. None of these
things generates XML--they traverse the XML bits to get at the core data
underneath. If the only thing you could ever do with XML was turn it
into another bit of XML then the technology would be completely useless.
If we do nothing but change RDF into more RDF then we haven't addressed
the core problem which prevents RDF from being used in practice. We
still haven't provided a standard way to actually get data *out* of it.

Received on Tuesday, 4 May 2004 15:56:58 UTC