RE: Requirement 3.5: subgraph results from Seaborne, Andy on 2004-05-05 (public-rdf-dawg@w3.org from April to June 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 5 May 2004 13:19:47 +0100
To: Rob Shearer <Rob.Shearer@networkinference.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E808028A3406@0-mail-br1.hpl.hp.com>
-------- Original Message --------
> From: public-rdf-dawg-request@w3.org <>
> Date: 4 May 2004 20:57
> 
> I very strongly object to this requirement. There are a number of
> different reasons. 
> 
> First, the suggestion that a query should return a subgraph of the
> original RDF which can then be the subject of the same query (and return
> the exact same thing) is a MAJOR MAJOR constraint on what the query
> language can express. 

This requirement does not mean that it is the only way of getting at
results.  It may be one of several.  I see this requirement as the
complement to the requirement "3.2 Variable Binding Results".
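
To make the two result forms concrete, here is a minimal sketch in Python.
It is illustrative only: the triple store, the "?x" variable syntax, the
property names and the function names are all assumptions made for this
example, not anything defined by the WG.  It covers plain conjunctive
triple patterns with no inference, and shows variable bindings and a
subgraph result computed from the same query, with the subgraph giving
back the same subgraph when queried again.

```python
# Illustrative only: triples are (subject, predicate, object) tuples and a
# pattern term beginning with "?" is a variable.

def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def solve(patterns, graph, bindings=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in graph:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from solve(patterns[1:], graph, extended)

def subgraph(patterns, graph):
    """Return the triples of graph that take part in at least one solution."""
    result = set()
    for solution in solve(patterns, graph):
        for pattern in patterns:
            # Substitute bound values for variables; constants pass through.
            result.add(tuple(solution.get(term, term) for term in pattern))
    return result

# Hypothetical data and query.
graph = {
    ("item1", "rdf:type", "rss:item"),
    ("item1", "rss:title", "Evening news"),
    ("item2", "rdf:type", "rss:item"),
    ("item2", "rss:title", "Weather"),
    ("channel", "rss:title", "All programmes"),
}
query = [("?x", "rdf:type", "rss:item"), ("?x", "rss:title", "?t")]

bindings = list(solve(query, graph))   # variable binding results
sub = subgraph(query, graph)           # subgraph results
again = subgraph(query, sub)           # same query against its own result
```

In this sketch the subgraph is assembled from the bindings, so for simple
patterns the two forms carry the same information; the subgraph form is
simply already RDF when it is handed to the next system.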

> It eliminates any chance of using any information
> other than that encoded in triples to answer the query; in fact it
> codifies that any such extension is explicitly illegal. Queries along
> the lines of "must these two nodes be related via either of these two
> properties?" suddenly become impossible to answer when that answer is
> derived via inference, or rules, or some higher-level
> semantic language.
> The answer is simply "yes"; producing an RDF subgraph to justify that
> answer is impossible: the reasons have no canonical RDF representation.
> We should keep in mind that this requirement could completely dictate
> many other aspects of the language.
> 
> Second, just what use case is driving this requirement? If you really
> think this requirement is crucial, then feel free to contrive some
> examples. The explicit use case codified in the requirement--that of
> performing exactly the same query against its own result--is obviously
> useless and relevant to no sane user.

The problem spaces described in use cases 2.4 (Monitoring News Events) and
2.5 (Avoiding Traffic Jams) could have a query being executed regularly by
some device and the results passed onto a 3rd system for further processing.

Example from 2.4 (Monitoring News Events): the query is regularly executed
by some piece of software that collects the results as RSS items from a
large RSS database and creates a customized RSS feed.  This is turned into
XHTML for the index page when asked (maybe it shows items in the next hour,
maybe it shows all of today's items - it does not know until display time).
The same customized RSS feed could also program a digital recorder to
actually record the program - a different end-application usage.
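
A rough sketch of that arrangement in Python, with the intermediate
result held as triples and two independent consumers reading it.  The
property names ("title", "start"), the sample items and the function
names are hypothetical, chosen only for this illustration.

```python
from datetime import datetime, timedelta
from html import escape

# Hypothetical intermediate result: triples selected by the regular query.
feed = {
    ("item1", "title", "Evening news"),
    ("item1", "start", "2004-05-05T18:00:00"),
    ("item2", "title", "Late film"),
    ("item2", "start", "2004-05-05T23:30:00"),
}

def items(triples):
    """Group triples by subject into {subject: {property: value}}."""
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, {})[p] = o
    return grouped

def render_index(triples, now, window):
    """Consumer 1: XHTML list of items starting within `window` of `now`."""
    rows = []
    for item in sorted(items(triples).values(), key=lambda i: i["start"]):
        start = datetime.fromisoformat(item["start"])
        if now <= start < now + window:
            rows.append("<li>%s</li>" % escape(item["title"]))
    return "<ul>%s</ul>" % "".join(rows)

def recording_schedule(triples):
    """Consumer 2: (start, title) pairs that a recorder could act on."""
    return sorted((i["start"], i["title"]) for i in items(triples).values())

# The display window is only chosen at display time.
now = datetime(2004, 5, 5, 17, 30)
next_hour = render_index(feed, now, timedelta(hours=1))
today = render_index(feed, now, timedelta(hours=24))
schedule = recording_schedule(feed)
```

Neither consumer knows about the other or about the query that produced
the feed; both depend only on the shared triples.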

Having the intermediate information encoded in RDF, rather than in a
device-specific format, enables reuse of software, amongst other things.
That is the "web" in semantic web.

> 
> Third, it seems pretty clear that similar functionality to what is
> described here is easily built by client code, and to the exact level of
> detail the user truly desires. When a user performs a query, they know
> what kinds of triples they consider relevant, and provided some kind of
> node-binding result format is available they can put together the
> explicit triples which model the data they were after. Why are we
> bloating the core query language with a feature that will be used
> seldom, and almost certainly won't offer the flexibility users need when
> they really do use it?
> 
> Finally, the whole notion of returning RDF subgraphs just strikes me as
> philosophically wrong. RDF navel-gazing is a fascinating academic
> pursuit, but real users are NOT interested in just screwing around with
> RDF. They want real answers to real questions. Like it or not, XSLT is
> at best a peripheral XML technology, and it's very clear that it is not
> a viable general-purpose "query language". The most widely deployed XML
> query language is: SAX. Or maybe DOM. Then comes XPath. None of these
> things generates XML--they traverse the XML bits to get at the core data
> underneath. If the only thing you could ever do with XML was turn it
> into another bit of XML then the technology would be completely useless.
> If we do nothing but change RDF into more RDF then we haven't addressed
> the core problem which prevents RDF from being used in practice. We
> still haven't provided a standard way to actually get data *out* of it.

We do need a way to get data out but it is not the only need.  I think we
should cover the "get data out" case.  It could be argued that the WG should
provide only a building block for this, some kind of RDF=>RDF process -
I am unconvinced by that argument and I haven't heard it argued.  But
"get data out" is not the only usage we should cover - extracting and
passing on to other systems also matters.

	Andy
Received on Wednesday, 5 May 2004 08:20:12 UTC