Re: cubicweb and data.bnf.fr from Nicolas Chauvat on 2011-07-11 (public-lld@w3.org from July 2011)

From: Nicolas Chauvat <nicolas.chauvat@logilab.fr>
Date: Mon, 11 Jul 2011 09:42:05 +0200
To: William Waites <ww-dated-1310713199.58e9c2@styx.org>
Cc: public-lld@w3.org
Message-ID: <20110711074205.GB9242@volans.logilab.fr>

Hi,

On Sun, Jul 10, 2011 at 08:59:58AM +0200, William Waites wrote:
> I would be interested in reading a bit more about the implementation
> details, is the cubicweb system running on top of an existing database
> or is the data taken and transformed and stored (cached, more or less)
> in a dedicated database for cubicweb? If the latter, how are updates
> to the original data handled?

Thank you for your interest.

CubicWeb_ is made of two parts: the repository and the web engine.

The web engine queries the repository using RQL_ to get information
about the ontology and to get the data. A query returns a result set,
which the web engine displays by applying a view. Some views generate
HTML, others produce RDF/XML or RDF/N3, others do PDF, PNG, JSON, CSV,
Excel and <you implement it>.
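
To make the "query, result set, view" idea concrete, here is a minimal
sketch in plain Python. It does not use the CubicWeb API: the ResultSet
class, the VIEWS registry and the render() helper are hypothetical names
made up for this illustration, and the RQL query in the comment assumes
an entity type and attributes that may not exist in any real schema.

```python
import csv
import html
import io
import json


class ResultSet:
    """Hypothetical stand-in for a result set: column names plus rows."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(row) for row in rows]


# Registry of views, keyed by output format name.
VIEWS = {}


def view(name):
    """Decorator that registers a rendering function under a format name."""
    def register(func):
        VIEWS[name] = func
        return func
    return register


@view("html")
def html_view(rset):
    # Render the result set as a simple HTML table.
    head = "".join("<th>%s</th>" % html.escape(c) for c in rset.columns)
    body = "".join(
        "<tr>%s</tr>" % "".join(
            "<td>%s</td>" % html.escape(str(cell)) for cell in row)
        for row in rset.rows)
    return "<table><tr>%s</tr>%s</table>" % (head, body)


@view("csv")
def csv_view(rset):
    # Render the result set as CSV with a header line.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(rset.columns)
    writer.writerows(rset.rows)
    return out.getvalue()


@view("json")
def json_view(rset):
    # Render the result set as a JSON list of objects.
    return json.dumps([dict(zip(rset.columns, row)) for row in rset.rows])


def render(rset, fmt):
    """Apply the view registered for `fmt` to the result set."""
    return VIEWS[fmt](rset)


# Rows such as an RQL query like
#   Any N, B WHERE P is Person, P name N, P birth_year B
# might return (Person, name and birth_year are assumed names).
rset = ResultSet(["name", "born"], [("Victor Hugo", 1802)])

if __name__ == "__main__":
    for fmt in ("html", "csv", "json"):
        print(render(rset, fmt))
```

The point of the sketch is only that the same result set can be handed
to any registered view, so adding an output format means adding one
more function.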

The repository reads/writes the data from/to at least one SQL database
to which it delegates the execution of queries. It also has a form of
database federation that can transparently merge multiple sources of
data: LDAP, the Mercurial version control system, other CubicWeb
instances. That federation feature is not very efficient at this point,
since it does not cache anything and keeps doing the same remote
queries over and over again. We prefer PostgreSQL, but we know of
production sites running CubicWeb on top of SQL Server and MySQL.
SQLite is only good for running the test suite.
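
The cost of federating without a cache can be shown with a toy model.
This is not how CubicWeb implements federation: CountingSource and
FederatedRepository are hypothetical classes, and the "sql" and "ldap"
sources are just in-memory lists standing in for real backends.

```python
class CountingSource:
    """Hypothetical data source that counts how often it is queried."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.calls = 0

    def query(self, entity_type):
        # Each call stands for one round trip to the remote source.
        self.calls += 1
        return [row for row in self.rows if row["type"] == entity_type]


class FederatedRepository:
    """Merges results from several sources, with no caching at all."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, entity_type):
        merged = []
        for source in self.sources:
            for row in source.query(entity_type):
                # Tag each row with the source it came from.
                merged.append(dict(row, source=source.name))
        return merged


sql = CountingSource("sql", [{"type": "User", "login": "alice"}])
ldap = CountingSource("ldap", [{"type": "User", "login": "bob"}])
repo = FederatedRepository([sql, ldap])

first = repo.query("User")
second = repo.query("User")  # same question, every source is asked again

if __name__ == "__main__":
    print([row["login"] for row in first])
    print("ldap round trips:", ldap.calls)
```

The caller sees one merged answer, but asking the same question twice
costs two round trips to every source.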

Exporting data to various formats is easy. Current work focuses on
ways to map_ data models and to facilitate exporting/importing data
to/from RDF without writing more than a few lines to map internal data
models to well-known ontologies.
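
As a rough idea of what "a few lines of mapping" can look like, here is
a self-contained sketch. It does not use the API of the xy.py module
linked below; PREFIXES, EQUIVALENCES, expand() and to_ntriples() are
hypothetical names, and the Person entity type with its name attribute
is an assumed internal model. Only the FOAF and RDF URIs are real.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Namespace prefixes used by the mapping.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

# Internal model -> ontology terms. A bare entity type maps to a class,
# an (entity type, attribute) pair maps to a property.
EQUIVALENCES = {
    "Person": "foaf:Person",
    ("Person", "name"): "foaf:name",
}


def expand(curie):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def escape_literal(value):
    # Only backslashes and double quotes are escaped in this sketch.
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(uri, etype, attributes):
    """Export one entity as a list of N-Triples lines."""
    lines = ["<%s> <%s> <%s> ." % (uri, RDF_TYPE, expand(EQUIVALENCES[etype]))]
    for attr, value in sorted(attributes.items()):
        key = (etype, attr)
        if key not in EQUIVALENCES:
            continue  # attributes without a mapping are not exported
        lines.append('<%s> <%s> "%s" .' % (
            uri, expand(EQUIVALENCES[key]), escape_literal(value)))
    return lines


triples = to_ntriples(
    "http://example.org/person/1",
    "Person",
    {"name": "Victor Hugo", "internal_id": "42"},
)

if __name__ == "__main__":
    print("\n".join(triples))
```

Here the whole mapping lives in the EQUIVALENCES table, so supporting
another ontology term means adding one entry rather than writing a new
exporter.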

In the case of data.bnf.fr, the data is extracted from the existing
SQL database that stores the catalog, then transformed and stored in a
dedicated database for CubicWeb.

I will let the people who ran the project at the French national
library comment more on this as they wish, since a lot of interesting
details could be added on the data transformation process and the
setup that makes it possible to scale and serve lots of data fast.

.. _CubicWeb: http://docs.cubicweb.org/intro/concepts.html
.. _RQL: http://docs.cubicweb.org/annexes/rql/language
.. _map: http://hg.logilab.org/cubicweb/file/64eee2a83bfa/xy.py

Hope this helps. Any comments are welcome.

-- 
Nicolas Chauvat

logilab.fr - services en informatique scientifique et gestion de connaissances

Received on Monday, 11 July 2011 07:42:31 UTC