Re: bootstrapping decentralized sparql from Kingsley Idehen on 2009-05-16 (semantic-web@w3.org from May 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 16 May 2009 07:05:45 -0400
To: Sandro Hawke <sandro@w3.org>
CC: "semantic-web@w3.org" <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>, David Huynh <dfhuynh@alum.mit.edu>
Message-ID: <4A0E9E09.2040405@openlinksw.com>

Sandro Hawke wrote:
>>> The interesting question is: can we have stateless SPARQL servers that
>>> distribute the query to other SPARQL servers, and what metadata do they
>>> need to do that well?  
>>> I guess voiD is supposed to address that; I don't
>>> know how well it does it, etc.  (I haven't had a chance to follow this
>>> work much recently.)
>>>   
>>>       
>> Yes, VoiD graphs cover that. The thing we need to do is standardize the 
>> auto-discovery patterns so that smart federated SPARQL is feasible :-)
>>
>> Example of a VoiD graph: http://lod.openlinksw.com/void/Dataset .
>>     
>
> Thanks.  Yeah, I looked at VoiD, briefly, after we talked about it
> Tuesday, although I don't fully understand it.
>
> But I think I'm picturing something a little different.  (I think.)  The
> key part I'm imagining is back-links (or track-backs).  I think folks
> who publish ontologies ought, generally, to keep track (on a voluntary,
> automatic, delegated basis) of who is using them.
>
> For example, I suggest the RDF graph at "http://xmlns.com/foaf/0.1/"
> (which introduces all the FOAF terms) should include some triples like:
>       <> rx:tracker <http://tracker.example.com>
>       <> rx:tracker <http://tracker-2.example.com>
>
> ... and those two trackers should be (REST) services where folks can
> report a graph which uses a (specific) FOAF term and folks also query
> for graphs which use a (specific) FOAF term.  It's a bit like
> PingTheSemanticWeb or Sindice, but decentralized based on the ontologies
> used.
>
> Obviously there are some scaling details to work out, but my sense is
> it's generally doable.  It may be that some terms (like rdf:type) are
> too common to be worth indexing.  And some sites will have complex,
> dynamic graph structures and will want to make sure they are registered
> properly.  (For instance, livejournal should probably register one
> SPARQL endpoint instead of its 10+ million dynamically-generated foaf
> files.)
>
> The result here will be that a query for a foaf:Person with a
> foaf:firstName of "Sandro" can be *complete*, at least across all graphs
> which choose to register themselves as having data about instances of
> the foaf:Person class and triples using the foaf:firstName property.
>
> I think running the tracker for an ontology should fundamentally be the
> responsibility of the ontology hoster/maintainers (eg Dan and Libby for
> FOAF), although I would expect there to be public tracking services, so
> all they really have to do is sign up with one or more and point at them
> with some rx:tracker triples.
>
> (My apologies if someone has already proposed this, or even built it.  I
> can't come close to following everything going on.)
>   
What you suggest and where this is ultimately heading are in sync. We 
just need to make a loose federation of SPARQL endpoints that expose 
stats about what they have, as part of the eventual solution. From this, 
we can build a federation of lookup and sync services (the RDFSync protocol 
has been lying in wait for a while now).  Thus, rest assured that what 
you describe above will be part of the final solution, before the 
standardization process commences :-)
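
To make the tracker idea quoted above a little more concrete, here is a 
minimal in-memory sketch in Python. The class name TermTracker and its 
methods report() and graphs_using() are hypothetical, not part of any 
existing service or of the rx: vocabulary; a real tracker would sit 
behind a REST interface and persist its index. The example.org and 
example.net URIs are placeholders.

```python
from collections import defaultdict


class TermTracker:
    """Hypothetical in-memory index: ontology term URI -> graphs that use it."""

    def __init__(self, ignored_terms=()):
        # Terms judged too common to be worth indexing (e.g. rdf:type).
        self.ignored_terms = set(ignored_terms)
        self._index = defaultdict(set)

    def report(self, graph_uri, term_uris):
        """Register that graph_uri (a document or SPARQL endpoint) uses term_uris."""
        for term in term_uris:
            if term not in self.ignored_terms:
                self._index[term].add(graph_uri)

    def graphs_using(self, *term_uris):
        """Return, sorted, the graphs registered for every one of term_uris."""
        if not term_uris:
            return []
        sets = [self._index.get(term, set()) for term in term_uris]
        return sorted(set.intersection(*sets))


FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

tracker = TermTracker(ignored_terms=[RDF_TYPE])
# A large site registers one endpoint rather than millions of files.
tracker.report("http://example.org/sparql",
               [FOAF + "Person", FOAF + "firstName", RDF_TYPE])
tracker.report("http://example.net/people.rdf", [FOAF + "Person"])

# Graphs worth querying for a foaf:Person with a foaf:firstName.
candidates = tracker.graphs_using(FOAF + "Person", FOAF + "firstName")
```

The answer is only complete across graphs that chose to register, which 
is the same caveat as in the quoted proposal.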

Also note SPARQL endpoints can be discovered via DNS [1].  We need to be 
able to discover, describe, and then sync stats across data spaces.
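
As a rough illustration of how exposed stats could drive federation, the 
Python sketch below picks which endpoints to send a query to. It assumes 
each endpoint publishes per-predicate triple counts; the function name 
select_endpoints, the shape of the stats dictionary, the endpoint URIs, 
and the numbers are all made up for illustration and are not taken from 
VoiD or any deployed service.

```python
def select_endpoints(stats, predicates):
    """Return endpoints that report triples for all predicates.

    stats maps endpoint URI -> {predicate URI: triple count}.
    Endpoints are ordered by their smallest matching count, largest first,
    with ties broken by endpoint URI.
    """
    if not predicates:
        return []
    matches = []
    for endpoint, counts in stats.items():
        if all(counts.get(p, 0) > 0 for p in predicates):
            weakest = min(counts[p] for p in predicates)
            matches.append((weakest, endpoint))
    ordered = sorted(matches, key=lambda m: (-m[0], m[1]))
    return [endpoint for _, endpoint in ordered]


FOAF = "http://xmlns.com/foaf/0.1/"

# Illustrative stats only; the counts are invented.
stats = {
    "http://a.example.org/sparql": {FOAF + "firstName": 500, FOAF + "knows": 20},
    "http://b.example.org/sparql": {FOAF + "firstName": 9000},
    "http://c.example.org/sparql": {FOAF + "knows": 300},
}

targets = select_endpoints(stats, [FOAF + "firstName"])
```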

Links:

1. http://blogs.talis.com/nodalities/2009/04/discovering-sparql.php

Kingsley
>       -- Sandro
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Saturday, 16 May 2009 11:06:25 UTC