RE: Compiling information from several different triplestores

I have successfully used Geoff's SDK to do federation with rules across multiple linked data endpoints.

I created a .NET object that extended the SDK to support interaction with standards-based SPARQL endpoints and was able to plug it into the Intellidimension Semantics SDK.  I think what I have described/achieved is exactly what Nicolas is looking for... being able to federate standard SPARQL queries across standard SPARQL endpoints.
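
To sketch the general idea (this is not the endpoint object itself; it is
plain SPARQL using the SERVICE federation extension that some engines
support, and the endpoint URLs and the ex: namespace are placeholders),
a query like the following federates across two standard endpoints:

	prefix owl: <http://www.w3.org/2002/07/owl#>
	prefix foaf: <http://xmlns.com/foaf/0.1/>
	prefix ex: <http://example.org/>

	select ?f
	where {
		# ask one endpoint for sameAs links pointing at ex:Anthony
		service <http://www.someotherplace.org/sparql> {
			?tony owl:sameAs ex:Anthony
		}
		# ask the other endpoint for that resource's friends
		service <http://www.someplace.org/sparql> {
			?tony foaf:knows ?f
		}
	}

Here the sameAs mapping has to be joined explicitly in the query itself,
whereas the rulebase approach Geoff shows below pushes it into a rule.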

The biggest challenge I had was getting some reasonable statistics for the target endpoints.

This functionality exists today and can be implemented with minimal complexity.  If anyone wants the SPARQL endpoint object that I developed for the Semantics SDK, let me know and I will provide it.

Thanks,
Eric Schoonover

-----Original Message-----
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On Behalf Of Geoff Chappell
Sent: Tuesday, May 05, 2009 8:09 AM
To: 'Nicolas Raoul'; semantic-web@w3.org
Subject: RE: Compiling information from several different triplestores

Hi Nicolas,

This sounds like a typical federation plus rules problem. A federating query
processor will decompose the query into smaller queries against the
specified graphs -- effectively querying against the union of the graphs
without performing an actual costly union. Rules provide any necessary
ontology mapping. Assuming the rules are processed in a backwards chaining
manner, they will only be evaluated as necessary to answer the query so as
to minimize expense.

Here's an example of what I'm talking about using our Semantics.SDK for
.NET[1]:

	prefix owl: <...>
	prefix ex: <...>
	prefix foaf: <...>

	#sparql extension to support rules
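	# the rule copies every statement about ?x onto ?s when ?x owl:sameAs ?s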
	rulebase (
		construct {?s ?p ?o} from {?x owl:sameAs ?s. ?x ?p ?o}
	)

	select ?f 
	from <http://www.someplace.org/data>
	from <http://www.someotherplace.org/data>
	where { ex:Anthony foaf:knows ?f }


To do this efficiently, the query processor will need statistics for the
data sources used. For remote graphs (e.g. SPARQL endpoints) this means that
they either need to publish stats in a reasonable form, or the query
processor would have to generate and cache its own based upon queries
against the graph. 
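
To make that concrete (just a sketch; aggregates are not part of SPARQL
1.0, though many stores support them as an extension), a processor could
harvest rough per-predicate counts from an endpoint with something like:

	select ?p (count(*) as ?triples)
	where {?s ?p ?o}
	group by ?p

and cache the results to guide source selection and join ordering.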

Hope that helps,

Geoff

[1] http://www.intellidimension.com/products/semantics-sdk/

-----Original Message-----
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
Behalf Of Nicolas Raoul
Sent: Tuesday, May 05, 2009 6:33 AM
To: semantic-web@w3.org
Subject: Compiling information from several different triplestores

Hello all,

How can I run a query over several different triplestores?

For instance, I want to get a list of Anthony's friends.
Triplestore1 says Jack is Tony's friend.
Triplestore2 says Tony sameAs Anthony.
What clever mechanism would understand that Jack is Anthony's friend?
Do I have to copy all information from both triplestores into my own
triplestore, or is there something smarter to do?

Copying all information from external triplestores seems awkward, and
in some cases might prove impossible (frequent updates, size, load on
servers).
Is there an easy solution that I am not aware of?
Can any triplestore implementation be configured to complement its
information with information from external triplestores?

Thank you!
Nicolas Raoul
http://nrw.free.fr
