Re: Compiling information from several different triplestores from Peter Ansell on 2009-05-08 (semantic-web@w3.org from May 2009)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Fri, 8 May 2009 19:51:41 +1000
To: Nicolas Raoul <nicolas.raoul.lists@gmail.com>
Cc: Paul Gearon <gearon@ieee.org>, semantic-web@w3.org
Message-ID: <a1be7e0e0905080251v49523880r9d17024372993e7b@mail.gmail.com>

2009/5/5 Nicolas Raoul <nicolas.raoul.lists@gmail.com>:
> Thanks for the detailed answer!
>
> However, I am not supposed to know in advance that triplestore1
> contains hasFriend, and triplestore2 contains sameAs, it could be the
> other way, or even two sameAs over 3 triplestores. I would be really
> glad if there were a more general mechanism.
>
> My dream is:
>
> 1) I configure my "sparqldream" software to use dbpedia, freebase, and
> various big and frequently updated triplestores.
> 2) I run any SPARQL query on sparqldream.
> 3) sparqldream does whatever it needs to, and returns the result of my
> query, based on the most up-to-date information found in the
> configured triplestores, as if I had instantly copied all of them into
> a single local triplestore.
>
> Does any such software exist?
> Or anything a bit similar?

The Bio2RDF software does part of what you want [1]. It doesn't run
your SPARQL queries directly; instead it interprets URLs to create
SPARQL queries, which are then executed in parallel on different
endpoints. When you combine that with the embedded DERI Pipes engine,
the results can be collated and fed into other queries. Alternatively,
you can run the SPARQL query you want on the collated data locally, as
long as the relevant information fits in local memory, I guess.
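
As a rough illustration of the "same query, several endpoints, collate
the results" idea, here is a minimal Python sketch. It is not Bio2RDF's
code. It assumes each endpoint speaks the standard SPARQL protocol over
HTTP GET and can return SPARQL JSON results; the names fetch_bindings
and query_all are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_bindings(endpoint, query, timeout=30):
    """Send a SELECT query to one endpoint and return its result rows.

    Assumes the endpoint accepts the query as a GET parameter and can
    answer with the SPARQL JSON results format.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


def query_all(endpoints, query, fetch=fetch_bindings):
    """Run the same query on every endpoint in parallel and merge the rows.

    Rows that are identical across endpoints are kept only once. The
    fetch argument is replaceable so the merge logic can be tried
    without touching the network.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        per_endpoint = list(pool.map(lambda ep: fetch(ep, query), endpoints))

    merged = []
    seen = set()
    for rows in per_endpoint:
        for row in rows:
            key = json.dumps(row, sort_keys=True)  # canonical form of a row
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```

Calling query_all with a list of endpoint URLs and a SELECT query
returns one merged list of bindings, which could then be loaded into a
local in-memory store and queried again. Error handling (an endpoint
being down or slow) is left out to keep the sketch short.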

In its current configuration, it doesn't distribute queries based on
predicates as you desire; it distributes queries based on URI
prefixes. The "prefixes" are not always simple prefixes; they can be
transformed as needed. If it knows that the desired types of
information exist in more than one endpoint, it sends the same query
to each of them. The datasets I am using didn't lend themselves to
basing things on predicates, but you could manually create queries
which utilise predicates without changing the model it is based on.
You would then just distribute the queries containing those
predicates to the relevant endpoints. Either way it works from
preconfiguration, so you would have to do some exploring to discover
the relevant parts, or create something automagic to pick out the
pieces you want.
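
To make the prefix-based distribution concrete, here is a minimal
Python sketch of a preconfigured routing table. It is only an
illustration of the idea, not how Bio2RDF is implemented. The endpoint
URLs, the ROUTES table, the example URI patterns, and the function
names are all hypothetical.

```python
# Hypothetical preconfiguration: URI prefix -> endpoints known to hold
# data about URIs starting with that prefix.
ROUTES = {
    "http://bio2rdf.org/geneid:": ["http://example.org/geneid/sparql"],
    "http://bio2rdf.org/go:": [
        "http://example.org/go/sparql",
        "http://example.org/mirror/sparql",
    ],
}


def endpoints_for(uri, routes=ROUTES):
    """Return the endpoints for the longest configured prefix of uri."""
    matches = [prefix for prefix in routes if uri.startswith(prefix)]
    if not matches:
        return []
    return routes[max(matches, key=len)]


def build_query(uri):
    """Build a query describing the resource identified by uri."""
    return "CONSTRUCT { <%s> ?p ?o } WHERE { <%s> ?p ?o }" % (uri, uri)


def plan(uri, routes=ROUTES):
    """Pair the generated query with every endpoint that should get it."""
    query = build_query(uri)
    return [(endpoint, query) for endpoint in endpoints_for(uri, routes)]
```

With this table, plan("http://bio2rdf.org/go:0008150") yields the same
query twice, once per endpoint, while an unknown prefix yields an empty
plan. A predicate-based variant could keep the same shape and key the
table on predicate URIs instead of subject prefixes.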

By the way, it can be configured to use another host name, so it is
not hardcoded to "http://bio2rdf.org/".

Cheers,

Peter

[1] https://sourceforge.net/project/platformdownload.php?group_id=142631

Received on Friday, 8 May 2009 10:02:47 UTC