RE: Summary of Virtuoso Presentation Points (fwd)

Orri,

So far so good.  I would like to see a performance goal as an additional consideration in the on-demand scenario.

Ahmed

-----Original Message-----
From: public-rdb2rdf-wg-request@w3.org [mailto:public-rdb2rdf-wg-request@w3.org] On Behalf Of hhalpin@w3.org
Sent: Wednesday, October 28, 2009 4:43 PM
To: public-rdb2rdf-wg@w3.org
Subject: Summary of Virtuoso Presentation Points (fwd)

Sent on Orri's behalf.

---------- Forwarded message ----------
Date: Mon, 26 Oct 2009 11:08:46 +0000
From: Orri Erling <erling@xs4all.nl>
To: public-rdb2rdf-wg@w3.org
Subject: [Moderator Action] Summary of Virtuoso Presentation Points

Colleagues

To complement my presentation in the previous telco, I'd like to summarize a few points.

For business intelligence over large warehouses, extracting the data and loading it as RDF is prohibitively expensive.  Mapping on the fly is needed for capitalizing on existing assets.

RDF databasing will evolve in terms of data compression and query throughput, thus the relative cost of RDF compared to relational column stores will come down.  Still, where there is a relational warehouse with all related processes, duplication of data should not be the first choice.

Mapping on the fly should be the approach of choice when:

- The number of integrable databases is limited and there is good connectivity to these.

- The inference required on top of the relational data is not very complex, i.e., it is limited to terminology mapping, subclasses, and subproperties (see the small rewriting sketch below).
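
To make the subclass point concrete, here is a rough Python sketch of what such rewriting amounts to; the terminology (:Party, :Customer, :Supplier), the tables, and the class-to-table mapping are illustrative assumptions only, not a proposal:

    # Rough sketch of subclass expansion during SPARQL-to-SQL rewriting.
    # The terminology and the class-to-table mapping are hypothetical.

    SUBCLASS_OF = [(":Customer", ":Party"), (":Supplier", ":Party")]
    CLASS_TO_TABLE = {":Customer": "customer", ":Supplier": "supplier"}

    def expand_class(cls, subclass_of):
        """Return cls together with all of its (transitive) subclasses."""
        out = {cls}
        frontier = [cls]
        while frontier:
            c = frontier.pop()
            for sub, sup in subclass_of:
                if sup == c and sub not in out:
                    out.add(sub)
                    frontier.append(sub)
        return out

    def sql_for_class(cls):
        """Rewrite the pattern '?x rdf:type cls' into a UNION over mapped tables."""
        tables = sorted(CLASS_TO_TABLE[c]
                        for c in expand_class(cls, SUBCLASS_OF)
                        if c in CLASS_TO_TABLE)
        return "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)

    print(sql_for_class(":Party"))   # two branches: customer and supplier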

Right now, most RDF-based integration extracts and loads RDF data into an RDF store.

Making mappings for ETL and making mappings for on-the-fly SPARQL-to-SQL transformation present different requirements.

The reason for this is that mapping a row into triples is simple, but mapping from the resulting URIs back to the row may not be.  In the most basic case, a row becomes one triple per column: the primary key gives the subject, each column value gives an object, and each column name gives a predicate.  Then, if multiple mappings may produce identical-looking URIs, the logic needs to construct unions.  This can be avoided by inferring that two mappings cannot produce overlapping URIs; such inference sometimes requires extra annotations in the mapping.  Supporting extraction only is a small fraction of the implementation of generic on-the-fly SPARQL-to-SQL mapping.
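
To illustrate the basic direction, here is a small Python sketch of the row-to-triples case just described; the base URI, table name, and columns are invented for illustration only:

    # Basic row-to-triples mapping: the primary key gives the subject,
    # each column name a predicate, each column value an object.
    # Base URI, table and column names are hypothetical.

    BASE = "http://example.com/data/"

    def row_to_triples(table, pk_cols, row):
        """Map one relational row (a dict) to a list of RDF triples."""
        pk_part = "/".join(str(row[c]) for c in pk_cols)
        subject = f"{BASE}{table}/{pk_part}"
        triples = []
        for column, value in row.items():
            if value is None:
                continue                      # NULLs produce no triple
            predicate = f"{BASE}{table}#{column}"
            triples.append((subject, predicate, value))
        return triples

    row = {"c_custkey": 42, "c_name": "Customer#000000042", "c_nationkey": 7}
    for t in row_to_triples("customer", ["c_custkey"], row):
        print(t)

Going the other way, i.e. recognizing which table and key such a URI came from during SPARQL-to-SQL rewriting, is where the union and overlap questions above come in.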

For both primary and foreign keys, one or more column values are mapped into a URI.  It should be possible to declare such a mapping only once and reuse it in both positions.
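
As a rough illustration of declaring the URI rule once, a Python sketch follows; the template, table, and column names are hypothetical:

    # One URI rule declared once and reused wherever the customer key occurs,
    # whether as the primary key of CUSTOMER (subject position) or as a
    # foreign key such as o_custkey in ORDERS (object position).
    # The template and names are hypothetical.

    CUSTOMER_URI = "http://example.com/data/customer/{key}"

    def customer_uri(key):
        """Forward direction: key value to URI."""
        return CUSTOMER_URI.format(key=key)

    def parse_customer_uri(uri):
        """Inverse direction: recover the key so SPARQL-to-SQL rewriting can join on it."""
        prefix = CUSTOMER_URI.split("{key}")[0]
        return uri[len(prefix):] if uri.startswith(prefix) else None

    print(customer_uri(42))
    print(parse_customer_uri("http://example.com/data/customer/42"))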

If these criteria can be met inside a W3C Recommendation, and we can take a standard relational use case like TPC-H, map it to RDF, and then query it with SPARQL 1.1, covering the 22 queries of TPC-H with at least two interoperable implementations, the WG will be a success.  This should be possible, since this level of functionality is already implemented in Virtuoso.  The questions to resolve in the WG will have to do with syntax and the selection of validation use cases.  There will also be a need for extensibility mechanisms for supporting features like the full-text and spatial operations present in many SQL implementations.

I am looking forward to getting into the details of all this.

Regards

Orri
