W3C home > Mailing lists > Public > www-rdf-interest@w3.org > October 2003

RE: Distributed Query (was RDFStyles)

From: Leo Sauermann <leo@gnowsis.com>
Date: Wed, 22 Oct 2003 18:40:44 +0200
To: "'Sandro Hawke'" <sandro@w3.org>, <www-rdf-interest@w3.org>
Message-ID: <000c01c398bb$3d598890$0501a8c0@ZION>




Actually, I am working on distributed queries.

So to say: distributed on a single host, but the idea is the same for N
hosts.

The principle I chose for my approach is that the host that has some
metadata about a URI can be identified by parsing the URI itself.

For example, if my URI is:

http://leo.gnowsis.com/~user/leo

I assumed that the host was leo.gnowsis.com.
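A minimal sketch of this host-extraction step (in Python, with a hypothetical `responsible_host` helper; the gnowsis system itself is not shown here):

```python
from urllib.parse import urlparse

def responsible_host(uri):
    """Hypothetical helper: identify the host assumed to hold
    metadata about a URI by parsing the URI itself."""
    netloc = urlparse(uri).netloc
    # file: URIs often have an empty authority part; treat those as local
    return netloc or "localhost"

print(responsible_host("http://leo.gnowsis.com/~user/leo"))
# leo.gnowsis.com
```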

Then the host at leo.gnowsis.com is contacted via a new protocol (e.g.
URIQA or Joseki) and PARSES the URL; plain HTTP is not used.
After parsing, the host knows which application/database to contact
for triples about the resource.


This approach is like an Apache server that has modules that can do .php
or .aspx includes, or a Jetty servlet server that has a web.xml file
where some servlets are registered to handle certain URL patterns.

This is exactly what I transferred to the gnowsis system
(www.gnowsis.com).

For example, there is an adapter that can handle MP3 file metadata.
When I want to know something about
file://leo.gnowsis.com/media/songs/u2-one.mp3
the server finds that the URL
- is on localhost
- has the "file" scheme
- ends with ".mp3"

and therefore passes the metadata request on to the MP3 metadata
adapter.
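The dispatch step can be sketched the same way: a registry of (scheme, path suffix) patterns, much like the servlet mappings in web.xml. The registry entries and adapter names below are made up for illustration:

```python
from urllib.parse import urlparse

# Hypothetical adapter registry, analogous to servlet mappings in
# web.xml: each entry maps a (scheme, path suffix) pattern to an adapter.
ADAPTERS = [
    ("file", ".mp3", "mp3-metadata-adapter"),
    ("http", "", "generic-http-adapter"),  # empty suffix: fallback for http
]

def pick_adapter(uri):
    """Return the adapter responsible for metadata requests about
    this URI, or None if no pattern matches."""
    parsed = urlparse(uri)
    for scheme, suffix, adapter in ADAPTERS:
        if parsed.scheme == scheme and parsed.path.endswith(suffix):
            return adapter
    return None

print(pick_adapter("file://leo.gnowsis.com/media/songs/u2-one.mp3"))
# mp3-metadata-adapter
```

First match wins, so more specific patterns go first, just as with servlet URL mappings.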

Voilà, with this approach I have shown how to build a distributed query
system very easily.

And yes, you are free to program a crawler robot that follows all links
in the Semantic Web and indexes them, Google-style.

This will be published in January 2004.

greetings
Leo Sauermann
www.gnowsis.com

> -----Original Message-----
> From: www-rdf-interest-request@w3.org 
> [mailto:www-rdf-interest-request@w3.org] On Behalf Of Sandro Hawke
> Sent: Wednesday, October 22, 2003 3:48 PM
> To: www-rdf-interest@w3.org
> Subject: Distributed Query (was RDFStyles)
> 
> 
> 
> 
> > One more aspect of RDF that I notice is often forgotten is that it's
> > supposed to be distributed (see more below).
> 
> I'm not sure it's "supposed to" be distributed.   That's how you and I
> want to use it, since we want to use it for the Semantic Web.  Is that
> all RDF is good for?   Maybe....   Should there be separate RDF and
> Semantic Web lists?   Is the fact that the W3C "Metadata Activity" is
> gone and RDF is now part of the W3C "Semantic Web Activity" sufficient
> to prove RDF is really just for the Semantic Web?
> 
> Anyway, I don't think it's accidentally forgotten, just 
> postponed while
> people try to figure out the presumably simpler local issues.
> 
> > My favorite screw in need for screwdriver is RDF query (as 
> opposed to
> > RDF transformation): since RDF is really distributed, you are not
> > supposed to be able to process the whole problem domain 
> in-memory and on
> > a single host, you're rather supposed to _query_ different remote
> > knowledge bases and process _results_ of these queries.
> > 
> > Fetching whole of WordNet, Wikipedia, and DMoz and running an XSLT
> > transform on the combined result doesn't fit into the 
> original vision of
> > Semantic Web as I understand it.
> 
> I think this is a hard but wonderful problem.  Each RDF document has
> lots of URIs you can use as links to find more information.  Most
> queries you use will also contain URIs you could use.  There are two
> problems: (1) if you follow them all, recursively, you might soon end
> up with a billion pages [this is the "performance" question], and (2)
> not all of the information will be true [the "trust" question].
> There's some talk of this on the esw wiki under "Follow Links For More
> Information" [1]; I encourage you to contribute.
> 
> I was recently exploring this in the context of my OWL Test Results
> page [2], trying to express in RDF which links the report generator
> should follow [3].  The idea is that one CAN follow any link, but
> metadata about what you'll find if you do will save you a lot of work.
> The metadata which struck me as useful was: what are the classes of
> the things named there and what are the properties used in the
> statements there.  (Use the most-specific subclass and subproperty
> which you know to be true.  Assume folks will follow links to the
> ontology so they'll know this.)  I constructed that file (start.rdf)
> by hand, but I'd expect it to be constructed by one agent to save all
> the other agents a lot of work, kind of like how Google saves each of
> us from having to read 3,307,998,701 web pages ourselves.
> 
> Meanwhile, I consider the trust issue completely orthogonal.  I hope
> to present all fetched results to users along with justification
> information, which shows both what sources were used and what kind of
> reasoning was used (a la inferenceWeb).  If an actual contradiction is
> detected, I expect to make some sort of truth maintenance decision and
> discard one source, with a warning to the user that the truth
> maintenance decision was just a guess.
> 
>       -- sandro
> 
> 
> [1] http://esw.w3.org/topic/FollowLinksForMoreInformation
> [2] http://www.w3.org/2003/08/owl-systems/test-results-out
> [3] http://www.w3.org/2003/08/owl-systems/start.rdf
> [4] http://www.ksl.stanford.edu/software/IW/
> 
Received on Wednesday, 22 October 2003 12:35:45 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 7 December 2009 10:52:02 GMT