Summary of Virtuoso Presentation Points (fwd) from hhalpin@w3.org on 2009-10-28 (public-rdb2rdf-wg@w3.org from October 2009)

From: <hhalpin@w3.org>
Date: Wed, 28 Oct 2009 19:43:05 -0400 (EDT)
To: public-rdb2rdf-wg@w3.org
Message-ID: <Pine.LNX.4.64.0910281941440.14279@homer.w3.org>

Sent on Orri's behalf.

---------- Forwarded message ----------
Date: Mon, 26 Oct 2009 11:08:46 +0000
From: Orri Erling <erling@xs4all.nl>
To: public-rdb2rdf-wg@w3.org
Subject: [Moderator Action] Summary of Virtuoso Presentation Points

Colleagues

To complement my presentation in the previous teleconference, I'd like to
summarize a few points.

For business intelligence over large warehouses, extracting the data and
loading it as RDF is prohibitively expensive.  Mapping on the fly is needed
for capitalizing on existing assets.

RDF databasing will evolve in terms of data compression and query
throughput, thus the relative cost of RDF compared to relational column
stores will come down.  Still, where there is a relational warehouse with
all related processes, duplication of data should not be the first choice.
Mapping on the fly should be the approach of choice when:

- The number of integrable databases is limited and there is good
connectivity to these.

- The inference required on top of the relational data is not very complex,
i.e. is limited to terminology mapping, subclasses and subproperties.

Right now, most RDF-based integration extracts and loads RDF data into an
RDF store.

Making mappings for ETL and making mappings for on-the-fly SPARQL-to-SQL
transformation present different requirements.

The reason for this is that mapping a row into triples is simple, but mapping
from the resulting URIs back to the row may not be so simple.  In the most
basic case, a row becomes one triple per column, so that the primary key
gives the subject, each column value gives an object and each column name
gives a predicate.  Then, if multiple mappings may produce identical-looking
URIs, the logic needs to construct unions.  This may be avoided by
inferring that two mappings cannot produce overlapping URIs.  Such
inference sometimes requires extra annotations in the mapping.  Supporting
extraction only is a small fraction of the implementation of generic
on-the-fly SPARQL-to-SQL mapping.
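
As a rough illustration of the asymmetry described above, here is a minimal
Python sketch.  It is not how Virtuoso or any other product implements the
mapping.  The namespace http://example.org/, the function names and the URI
patterns are all assumptions made up for this example, and skipping NULL
columns is likewise an assumption.  The forward direction is a few lines;
the reverse direction has to test a URI against every declared pattern, and
more than one match means a union over the matching tables.

```python
import re

BASE = "http://example.org/"  # hypothetical namespace, not from the original


def row_to_triples(table, key_column, row):
    """Forward direction: one triple per non-NULL column of the row."""
    subject = f"{BASE}{table}/{row[key_column]}"   # primary key -> subject
    return [
        (subject, f"{BASE}{table}#{column}", value)  # column name -> predicate
        for column, value in row.items()
        if value is not None                         # assumption: skip NULLs
    ]


def tables_for_uri(uri, patterns):
    """Reverse direction: find every table whose URI pattern matches.

    Returns a list of (table, key) pairs.  More than one pair means a
    SPARQL-to-SQL translator would need a UNION over those tables.
    """
    matches = []
    for table, pattern in patterns.items():
        m = re.fullmatch(pattern, uri)
        if m:
            matches.append((table, m.group(1)))
    return matches


# Patterns that provably cannot overlap: at most one table per URI.
DISJOINT = {
    "customer": r"http://example\.org/customer/(\d+)",
    "supplier": r"http://example\.org/supplier/(\d+)",
}

# Two mappings producing identical-looking URIs: the reverse step is ambiguous.
OVERLAPPING = {
    "customer": r"http://example\.org/party/(\d+)",
    "supplier": r"http://example\.org/party/(\d+)",
}

if __name__ == "__main__":
    print(row_to_triples("customer", "c_custkey",
                         {"c_custkey": 7, "c_name": "Alice"}))
    print(tables_for_uri("http://example.org/customer/7", DISJOINT))
    print(tables_for_uri("http://example.org/party/7", OVERLAPPING))
```

With the disjoint patterns the lookup returns a single table; with the
overlapping ones it returns both, which is the case where the generated SQL
needs a union unless an annotation rules one of them out.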

In the case of both primary and foreign keys, there is a mapping of one or
more column values into a URI.  Declaring such a mapping only once should be
supported.
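
One way to picture declaring the key-to-URI mapping once is the sketch
below.  It is only an illustration under stated assumptions: the class
IriTemplate, the http://example.org/ prefixes and the helper functions are
hypothetical, and the column names are borrowed from TPC-H purely as sample
data.  The same declaration is used for the subject of a customer row and
for the object generated from the foreign key in an orders row, so the two
cannot drift apart, and the same object can also parse a URI back into key
values.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote


@dataclass(frozen=True)
class IriTemplate:
    """Hypothetical one-time declaration of how key values form a URI."""
    prefix: str
    arity: int  # number of key columns (1 for a simple key)

    def make(self, *values):
        """Key column values -> URI."""
        if len(values) != self.arity:
            raise ValueError(f"expected {self.arity} key value(s)")
        # Percent-encode each value so "/" inside a value stays unambiguous.
        return self.prefix + "/".join(quote(str(v), safe="") for v in values)

    def parse(self, uri):
        """URI -> tuple of key values as strings, or None if no match."""
        if not uri.startswith(self.prefix):
            return None
        parts = uri[len(self.prefix):].split("/")
        if len(parts) != self.arity:
            return None
        return tuple(unquote(p) for p in parts)


# Declared once each.
CUSTOMER = IriTemplate("http://example.org/customer/", 1)
ORDER = IriTemplate("http://example.org/orders/", 1)
LINEITEM = IriTemplate("http://example.org/lineitem/", 2)  # composite key


def customer_triples(row):
    subject = CUSTOMER.make(row["c_custkey"])          # primary key use
    return [(subject, "http://example.org/customer#c_name", row["c_name"])]


def order_triples(row):
    subject = ORDER.make(row["o_orderkey"])
    customer = CUSTOMER.make(row["o_custkey"])         # foreign key use
    return [(subject, "http://example.org/orders#o_custkey", customer)]


if __name__ == "__main__":
    print(customer_triples({"c_custkey": 42, "c_name": "Alice"}))
    print(order_triples({"o_orderkey": 9, "o_custkey": 42}))
    print(LINEITEM.parse(LINEITEM.make(1, 3)))
```

Because both functions call CUSTOMER.make, the object of the order triple is
the same URI as the subject of the customer triple, which is what lets a
SPARQL join between the two be translated into a SQL join on the key columns.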

If these criteria can be met inside a W3C recommendation and we can take a
standard relational use case like TPC-H and map it to RDF and then query
with SPARQL 1.1, covering the 22 queries of TPC-H, with at least two
interoperable implementations, the WG is a success.  This should be
possible, since this level of functionality is already implemented in
Virtuoso.  The questions to resolve in the WG will have to do with syntax
and selection of validation use cases.  There will also be a need for
extensibility mechanisms for supporting things like full text and spatial
operations present in many SQL implementations.

I am looking forward to getting into the details of all this.

Regards

Orri

Received on Wednesday, 28 October 2009 23:43:14 UTC