bootstrapping decentralized sparql from Sandro Hawke on 2009-05-16 (public-lod@w3.org from May 2009)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 15 May 2009 23:14:49 -0400
To: Kingsley Idehen <kidehen@openlinksw.com>
cc: "semantic-web@w3.org" <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>, David Huynh <dfhuynh@alum.mit.edu>
Message-ID: <5508.1242443689@ubehebe>

> > The interesting question is: can we have stateless SPARQL servers that
> > distribute the query to other SPARQL servers, and what metadata do they
> > need to do that well?  
> > I guess voiD is supposed to address that; I don't
> > know how well it does it, etc.  (I haven't had a chance to follow this
> > work much recently.)
> >   
> Yes, VoiD graphs cover that. The thing we need to do is standardize the 
> auto-discovery patterns so that smart federated SPARQL is feasible :-)
> 
> Example of a VoiD graph: http://lod.openlinksw.com/void/Dataset .

Thanks.  Yeah, I looked at VoiD, briefly, after we talked about it
Tuesday, although I don't fully understand it.

But I think I'm picturing something a little different.  (I think.)  The
key part I'm imagining is back-links (or track-backs).  I think folks
who publish ontologies ought, generally, to keep track (on a voluntary,
automatic, delegated basis) of who is using them.

For example, I suggest the RDF graph at "http://xmlns.com/foaf/0.1/"
(which introduces all the FOAF terms) should include some triples like:
      <> rx:tracker <http://tracker.example.com> .
      <> rx:tracker <http://tracker-2.example.com> .

... and those two trackers should be (REST) services where folks can
report a graph which uses a (specific) FOAF term and can also query
for graphs which use a (specific) FOAF term.  It's a bit like
PingTheSemanticWeb or Sindice, but decentralized based on the ontologies
used.
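
To make the tracker idea a bit more concrete, here is a minimal in-memory
sketch in Python.  The class name, the method names (report, query) and
the example graph URLs are all made up for illustration; a real tracker
would be a REST service with persistent storage, and nothing here is an
existing API.

```python
from collections import defaultdict


class Tracker:
    """Hypothetical in-memory tracker: maps a term URI to the graphs using it."""

    def __init__(self):
        self._graphs_by_term = defaultdict(set)

    def report(self, term, graph):
        """Record that `graph` uses `term` (reporting twice is harmless)."""
        self._graphs_by_term[term].add(graph)

    def query(self, term):
        """Return a sorted list of graphs reported as using `term`."""
        return sorted(self._graphs_by_term.get(term, set()))


FOAF = "http://xmlns.com/foaf/0.1/"

tracker = Tracker()
# Example graph URLs are placeholders, not real documents.
tracker.report(FOAF + "firstName", "http://alice.example.org/foaf.rdf")
tracker.report(FOAF + "firstName", "http://bob.example.org/foaf.rdf")
tracker.report(FOAF + "Person", "http://alice.example.org/foaf.rdf")

first_name_graphs = tracker.query(FOAF + "firstName")
```

Keying the index by term rather than by graph is what lets each ontology
run (or delegate) its own tracker.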

Obviously there are some scaling details to work out, but my sense is
it's generally doable.  It may be that some terms (like rdf:type) are
too common to be worth indexing.  And some sites will have complex,
dynamic graph structures and will want to make sure they are registered
properly.  (For instance, livejournal should probably register one
SPARQL endpoint instead of its 10+ million dynamically-generated foaf
files.)

The result here will be that a query for a foaf:Person with a
foaf:firstName of "Sandro" can be *complete*, at least across all graphs
which choose to register themselves as having data about instances of
the foaf:Person class and triples using the foaf:firstName property.
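
A rough sketch, in Python, of how a query processor might pick which
graphs to consult for that query.  It assumes the tracker lookups have
already been collected into a plain dict (term -> set of graphs), and it
makes the simplifying assumption that the rdf:type triple and the
foaf:firstName triple live in the same graph, so it intersects the two
sets; if they can be split across graphs, a union would be needed
instead.  The function name and the example URLs are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical registrations, as trackers might return them: term -> graphs.
registrations = {
    FOAF + "Person": {
        "http://a.example.org/foaf.rdf",
        "http://b.example.org/sparql",
    },
    FOAF + "firstName": {
        "http://b.example.org/sparql",
        "http://c.example.org/foaf.rdf",
    },
}


def candidate_graphs(registrations, terms):
    """Return the graphs registered for every term in `terms`."""
    graph_sets = [set(registrations.get(term, ())) for term in terms]
    if not graph_sets:
        return set()
    return set.intersection(*graph_sets)


# Graphs worth sending the "foaf:Person named Sandro" query to.
to_query = candidate_graphs(registrations, [FOAF + "Person", FOAF + "firstName"])
```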

I think running the tracker for an ontology should fundamentally be the
responsibility of the ontology hoster/maintainers (e.g. Dan and Libby for
FOAF), although I would expect there to be public tracking services, so
all they really have to do is sign up with one or more and point at them
with some rx:tracker triples.

(My apologies if someone has already proposed this, or even built it.  I
can't come close to following everything going on.)

      -- Sandro

Received on Saturday, 16 May 2009 03:15:24 UTC