Distributed Query (was RDFStyles) from Sandro Hawke on 2003-10-22 (www-rdf-interest@w3.org from October 2003)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 22 Oct 2003 09:47:47 -0400
To: www-rdf-interest@w3.org
Message-Id: <200310221347.h9MDlmMN004662@roke.hawke.org>

> One more aspect of RDF that I notice is often forgotten is that it's
> supposed to be distributed (see more below).

I'm not sure it's "supposed to" be distributed.   That's how you and I
want to use it, since we want to use it for the Semantic Web.  Is that
all RDF is good for?   Maybe....   Should there be separate RDF and
Semantic Web lists?   Is the fact that the W3C "Metadata Activity" is
gone and RDF is now part of the W3C "Semantic Web Activity" sufficient
to prove RDF is really just for the Semantic Web?

Anyway, I don't think it's accidentally forgotten, just postponed while
people try to figure out the presumably simpler local issues.

> My favorite screw in need for screwdriver is RDF query (as opposed to
> RDF transformation): since RDF is really distributed, you are not
> supposed to be able to process the whole problem domain in-memory and on
> a single host, you're rather supposed to _query_ different remote
> knowledge bases and process _results_ of these queries.
> 
> Fetching whole of WordNet, Wikipedia, and DMoz and running an XSLT
> transform on the combined result doesn't fit into the original vision of
> Semantic Web as I understand it.

I think this is a hard but wonderful problem.  Each RDF document has
lots of URIs you can use as links to find more information.  Most
queries you use will also contain URIs you could use.  There are two
problems: (1) if you follow them all, recursively, you might soon end
up with a billion pages [this is the "performance" question], and (2)
not all of the information will be true [the "trust" question].
There's some talk of this on the esw wiki under "Follow Links For More
Information" [1]; I encourage you to contribute.
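
To make the "performance" question concrete, here is a minimal sketch
of link following with a fetch budget.  Everything in it is an
assumption for illustration: the WEB dictionary stands in for real
HTTP fetches, the example.org URIs are made up, and breadth-first
order with a hard cap is just one possible policy.

```python
from collections import deque

# Hypothetical in-memory "web": document URI -> triples found in that document.
WEB = {
    "http://example.org/a": [
        ("http://example.org/a#x", "http://example.org/vocab#knows", "http://example.org/b#y"),
    ],
    "http://example.org/b": [
        ("http://example.org/b#y", "http://example.org/vocab#knows", "http://example.org/c#z"),
    ],
    "http://example.org/c": [
        ("http://example.org/c#z", "http://example.org/vocab#knows", "http://example.org/a#x"),
    ],
}


def doc_uri(uri):
    """Drop the fragment, leaving the document one would actually fetch."""
    return uri.split("#", 1)[0]


def follow_links(start, max_docs):
    """Breadth-first link following that stops after max_docs documents.

    Returns the set of documents read and the list of triples collected.
    """
    seen = set()
    triples = []
    queue = deque([doc_uri(start)])
    while queue and len(seen) < max_docs:
        doc = queue.popleft()
        if doc in seen or doc not in WEB:
            continue  # already read, or nothing retrievable there
        seen.add(doc)
        for s, p, o in WEB[doc]:
            triples.append((s, p, o))
            # Every URI in a triple is a potential link to more information.
            for term in (s, p, o):
                if term.startswith("http://"):
                    queue.append(doc_uri(term))
    return seen, triples
```

Calling follow_links("http://example.org/a", 2) reads only two of the
three documents; without the cap, the same loop on the real web is
where the billion pages come from.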

I was recently exploring this in the context of my OWL Test Results
page [2], trying to express in RDF which links the report generator
should follow [3].  The idea is that one CAN follow any link, but
metadata about what you'll find if you do will save you a lot of work.
The metadata which struck me as useful was: what are the classes of
the things named there and what are the properties used in the
statements there.  (Use the most-specific subclass and subproperty
which you know to be true.  Assume folks will follow links to the
ontology so they'll know this.)  I constructed that file (start.rdf)
by hand, but I'd expect it to be constructed by one agent to save all
the other agents a lot of work, kind of like how Google saves each of
us from having to read 3,307,998,701 web pages ourselves.
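
A rough sketch of how an agent might use that kind of metadata to
decide which links are worth following.  This is not the vocabulary
actually used in start.rdf; LinkHint, the ex: names, and the
single-parent subclass table are all hypothetical, and the sketch
assumes the agent has already read the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkHint:
    """Hypothetical metadata about what following a link would yield."""
    uri: str
    classes: frozenset      # classes of the things named in that document
    properties: frozenset   # properties used in the statements there


def descendants(cls, subclass_of):
    """Return cls plus every class that is transitively a subclass of it.

    subclass_of maps a child class to its parent (one parent per class,
    a simplification for this sketch).
    """
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def select_links(hints, wanted_class, subclass_of):
    """Keep only links advertising wanted_class or one of its subclasses."""
    acceptable = descendants(wanted_class, subclass_of)
    return [h.uri for h in hints if h.classes & acceptable]


# Hints advertise the most-specific class, so the agent expands its
# query class downward using the ontology before matching.
SUBCLASS_OF = {"ex:PassingRun": "ex:TestRun", "ex:FailingRun": "ex:TestRun"}
HINTS = [
    LinkHint("http://example.org/results-1",
             frozenset({"ex:PassingRun"}), frozenset({"ex:system"})),
    LinkHint("http://example.org/people",
             frozenset({"ex:Person"}), frozenset({"ex:name"})),
]
```

Here select_links(HINTS, "ex:TestRun", SUBCLASS_OF) keeps the results
link and skips the people link without fetching either one.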

Meanwhile, I consider the trust issue completely orthogonal.  I hope
to present all fetched results to users along with justification
information, which shows both what sources were used and what kind of
reasoning was used (a la Inference Web [4]).  If an actual contradiction is
detected, I expect to make some sort of truth maintenance decision and
discard one source, with a warning to the user that the truth
maintenance decision was just a guess.
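
One very simplified way this could look, as a sketch only.  It
assumes a contradiction means two different values for a property
declared functional, it records only which sources back each
statement (not what kind of reasoning was used), and the rule "discard
the source fetched later" is an arbitrary stand-in for a real truth
maintenance decision.  All names are hypothetical.

```python
def resolve(statements, functional):
    """Merge statements, keeping per-statement source justifications.

    statements: list of (subject, property, value, source) in fetch order.
    functional: set of properties allowed only one value per subject.
    Returns (justified, warnings), where justified maps each surviving
    (subject, property, value) to the set of sources asserting it.
    """
    discarded = set()
    warnings = []
    while True:
        first_value = {}   # (subject, property) -> (value, source)
        conflict = None
        for s, p, v, src in statements:
            if src in discarded or p not in functional:
                continue
            prior = first_value.setdefault((s, p), (v, src))
            if prior[0] != v:
                conflict = (s, p, prior, (v, src))
                break
        if conflict is None:
            break
        s, p, (v1, src1), (v2, src2) = conflict
        # Guess: drop the later source entirely, and say so.
        discarded.add(src2)
        warnings.append(
            f"{src2} discarded: says {s} {p} {v2}, but {src1} says {v1} "
            "(this choice was just a guess)"
        )
    justified = {}
    for s, p, v, src in statements:
        if src not in discarded:
            justified.setdefault((s, p, v), set()).add(src)
    return justified, warnings


STATEMENTS = [
    ("ex:test1", "ex:status", "pass", "srcA"),
    ("ex:test1", "ex:runBy", "ex:sysX", "srcB"),
    ("ex:test1", "ex:status", "fail", "srcB"),
    ("ex:test1", "ex:status", "pass", "srcC"),
]
```

With resolve(STATEMENTS, {"ex:status"}), srcB is dropped as a whole,
the "pass" result is shown as justified by srcA and srcC, and the user
gets one warning that the choice was a guess.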

      -- sandro


[1] http://esw.w3.org/topic/FollowLinksForMoreInformation
[2] http://www.w3.org/2003/08/owl-systems/test-results-out
[3] http://www.w3.org/2003/08/owl-systems/start.rdf
[4] http://www.ksl.stanford.edu/software/IW/
Received on Wednesday, 22 October 2003 09:51:09 UTC