- From: Nicolas Chauvat <nicolas.chauvat@logilab.fr>
- Date: Fri, 8 May 2009 11:06:44 +0200
- To: Nicolas Raoul <nicolas.raoul.lists@gmail.com>
- Cc: Paul Gearon <gearon@ieee.org>, semantic-web@w3.org
Hi Nicolas,

On Tue, May 05, 2009 at 10:27:43PM +0900, Nicolas Raoul wrote:
> My dream is:
>
> 1) I configure my "sparqldream" software to use dbpedia, freebase, and
> various big and frequently updated triplestores.
> 2) I run any SPARQL query on sparqldream.
> 3) sparqldream does whatever it needs to, and returns the result of my
> query, based on the most up-to-date information found in the
> configured triplestores, as if I had instantly copied all of them into
> a single local triplestore.
>
> Does any such software exist?
> Or anything a bit similar?

I will describe what we have with http://www.cubicweb.org and let you
decide whether it is similar to your dream or not.

The CubicWeb framework is made of two parts, the data engine and the
web engine, which communicate via RQL[1].

The data engine wraps data sources that can be of different types,
including SQL, LDAP, RQL, subversion and mercurial. Links can cross
source boundaries. For example, a user stored in LDAP can be linked to
a document stored in subversion (the link itself is stored in the
primary SQL source, which is required). You could then run
'Any P,D WHERE P author_of D' with the data for P and D being stored in
different sources.

A source can be another cubicweb data engine queryable via RQL. The
configuration of the source defines a "window" on the data.

For example, http://www.logilab.org is our external forge. We also have
an internal forge on our intranet. This internal forge views the
external forge as a source of projects and versions. Other entities
present in the external forge do not appear in the internal forge.
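To picture the "window" idea, here is a toy sketch in plain Python. It
is not CubicWeb's API: the Entity and WindowedSource classes, the
query_all function and the "Ticket" entity type are all hypothetical
names made up for illustration. It only shows the behaviour described
above, where a client asks for all projects and gets them from every
source, while entity types outside a source's window stay invisible.

```python
# Toy illustration only: plain Python, NOT CubicWeb's actual API.
# All class, function and entity names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    etype: str   # entity type, e.g. "Project"
    name: str
    source: str  # which source the entity lives in


class WindowedSource:
    """Wraps a list of entities and exposes only the supported types."""

    def __init__(self, entities, supported_types):
        self._entities = list(entities)
        self._supported = set(supported_types)

    def entities_of_type(self, etype):
        # Types outside the window are invisible to the caller.
        if etype not in self._supported:
            return []
        return [e for e in self._entities if e.etype == etype]


def query_all(sources, etype):
    """Stand-in for a query such as 'Any P WHERE P is Project'."""
    results = []
    for source in sources:
        results.extend(source.entities_of_type(etype))
    return results


# The external forge is seen through a window of projects and versions.
external = WindowedSource(
    [
        Entity("Project", "public-lib", "external"),
        Entity("Version", "public-lib 1.0", "external"),
        Entity("Ticket", "public bug report", "external"),
    ],
    supported_types={"Project", "Version"},
)
# The internal forge exposes everything it stores.
internal = WindowedSource(
    [
        Entity("Project", "intranet-tool", "internal"),
        Entity("Ticket", "internal task", "internal"),
    ],
    supported_types={"Project", "Version", "Ticket"},
)

projects = query_all([internal, external], "Project")
tickets = query_all([internal, external], "Ticket")
print([p.name for p in projects])  # projects from both forges
print([t.name for t in tickets])   # only the internal ticket
```

The caller of query_all never names a source, which is the point: the
window is a property of how each source is configured, not of the
query.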
Here is an excerpt of the internal forge config file named sources:

    [external-forge]
    adapter=pyrorql
    pyro-ns-id=logilaborg
    pyro-ns-host=dmzserver
    mapping-file=mapping_internal_dmz.py
    cubicweb-user=someuser
    cubicweb-password=itspassword
    base-url=http://www.logilab.org/

and the mapping_internal_dmz file:

    support_entities = {'Project': True, 'Version': True, 'State': True}
    support_relations = {'in_state': True, 'version_of': True}
    dont_cross_relations = set(('concerns', 'done_in'))

The data engine then takes care of the rest: discovering new objects,
removing references to old objects, caching, etc.

The client querying the data engine is not aware of the sources. It can
send a query like "Any P WHERE P is Project" and get the list of all
projects. Thanks to the base-url parameter above, each project still
keeps its canonical URL.

Adding other types of sources, SPARQL included of course, is currently
on both our todo-list and our vaporware-list.

I stop here in order not to spam the list. Please ask for more details
if you are interested.

[1] Back when we started the cubicweb project in 2001, there was no
such thing as SPARQL. RQL is very similar to SPARQL and we are in the
process of implementing SPARQL in CubicWeb. Hopefully it will work
before summer.

-- 
Nicolas Chauvat

logilab.fr - services en informatique scientifique et gestion de connaissances
Received on Friday, 8 May 2009 09:07:20 UTC