Re: Best practice for exposing proprietary databases or services as SPARQL endpoints from Mr. Yrjana Rankka on 2010-12-16 (public-lod@w3.org from December 2010)

From: Mr. Yrjana Rankka <ghard@openlinksw.com>
Date: Thu, 16 Dec 2010 02:26:09 +0100
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
CC: public-lod@w3.org, semantic-web@w3.org
Message-ID: <4D096AB1.2030900@openlinksw.com>
On 12/15/10 02:21 , Martin Hepp wrote:
> Dear all:
> Are there really no experiences beyond academic research regarding 
> this task? I had assumed it was a pretty standard requirement...
>
Martin,

We do have quite a lot of experience with this.

We've had RDF views over native Virtuoso tables for a few years already. 
In addition to that, you can use the virtual (federated) database 
functionality to:

1. Attach tables from various RDBMS (basically, any SQL DB one can 
connect to using ODBC, JDBC, etc.) into Virtuoso (non-open source feature)
2. Create RDF views, producing graphs from combined data sources, which 
are then exposed through our SPARQL endpoint

One project I was involved with was for an automobile 
manufacturer. It combined three CRM databases, two on DB2 and one on 
Oracle, from which the required tables were attached to Virtuoso.

I created SQL views over the attached tables within Virtuoso to reconcile 
differences in schema and data representation between the CRM systems. 
These views were then used to define a Virtuoso RDF view, which exposed 
the data as linked data and allowed SPARQL queries to span these 3 
databases with 2 different schemas transparently.
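
To illustrate the idea of reconciling schemas with a SQL view before 
mapping to RDF, here is a minimal sketch. It uses Python's sqlite3 as a 
stand-in for the attached tables and is not Virtuoso syntax. The table 
names, columns, sample rows and the example.org URIs are all made up for 
illustration; only the FOAF name property is a real vocabulary term.

```python
import sqlite3

# Hypothetical URI prefix for the generated customer resources.
BASE = "http://example.org/customer/"


def build_demo_db():
    """Create two differently shaped CRM tables and one view unifying them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- CRM "a" stores a full name and an ISO country code.
        CREATE TABLE crm_a_customer (cust_id INTEGER, full_name TEXT, country_code TEXT);
        -- CRM "b" splits the name and stores the country as free text.
        CREATE TABLE crm_b_client (id INTEGER, first_name TEXT, last_name TEXT, country TEXT);

        INSERT INTO crm_a_customer VALUES (1, 'Ada Lovelace', 'GB');
        INSERT INTO crm_b_client VALUES (7, 'Alan', 'Turing', 'United Kingdom');

        -- The view hides both the schema and the representation differences.
        CREATE VIEW unified_customer AS
            SELECT 'a' AS source, cust_id AS id, full_name AS name,
                   country_code AS country
            FROM crm_a_customer
            UNION ALL
            SELECT 'b', id, first_name || ' ' || last_name,
                   CASE country WHEN 'United Kingdom' THEN 'GB' ELSE country END
            FROM crm_b_client;
    """)
    return conn


def customer_triples(conn):
    """Yield (subject, predicate, object) triples from the unified view."""
    rows = conn.execute(
        "SELECT source, id, name, country FROM unified_customer ORDER BY source, id"
    )
    for source, cid, name, country in rows:
        subject = f"<{BASE}{source}-{cid}>"
        yield (subject, "<http://xmlns.com/foaf/0.1/name>", f'"{name}"')
        # Hypothetical predicate, used only for this sketch.
        yield (subject, "<http://example.org/schema#country>", f'"{country}"')


if __name__ == "__main__":
    for triple in customer_triples(build_demo_db()):
        print(" ".join(triple), ".")
```

In the real setup the triples are not generated by a script like this; 
the RDF view is defined inside Virtuoso and SPARQL queries are rewritten 
to SQL against the views.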

No data was ETL'd to intermediate storage. All data remained in its 
respective DBMS and was queried automatically at run time. For certain 
types of queries, though, some of these views could, and should, have 
been materialized in the RDF store for performance reasons.

Virtuoso has a feature that allows setting resource limits on SPARQL 
queries. A partial result of a query can be returned, allowing exposure 
of large datasets without the risk of queries that run forever DoSing 
your store. So you can let some users (or the general public) sample 
your data while allowing users with sufficient authorization to use more 
system resources.
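
The general pattern of returning a partial result under a per-user 
budget can be sketched as follows. This is a generic Python 
illustration of the concept, not how Virtuoso implements it; the 
function name, the PartialResult type and the budget values are 
assumptions made up for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PartialResult:
    rows: list = field(default_factory=list)
    complete: bool = True  # False when a budget cut the query short


# Hypothetical budgets per user class: (max seconds, max rows).
BUDGETS = {
    "anonymous": (0.5, 100),
    "trusted": (30.0, 100_000),
}


def run_with_budget(row_source, max_seconds, max_rows, clock=time.monotonic):
    """Consume rows until a time or row budget is exhausted.

    The budget is checked between rows only, so a single slow row is not
    interrupted; a real engine would enforce the limit inside the executor.
    """
    deadline = clock() + max_seconds
    result = PartialResult()
    for row in row_source:
        if len(result.rows) >= max_rows or clock() >= deadline:
            result.complete = False  # return what we have so far
            break
        result.rows.append(row)
    return result


if __name__ == "__main__":
    seconds, rows = BUDGETS["anonymous"]
    sample = run_with_budget(iter(range(1_000_000)), seconds, rows)
    print(len(sample.rows), sample.complete)
```

The caller can inspect the complete flag to tell a sampled answer from a 
full one, and pick the budget from the authenticated user's class.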

Best regards,

Yrjänä

> Best
>
> Martin
>
> On 11.12.2010, at 09:33, Martin Hepp wrote:
>
>> Dear all:
>>
>> There are many different ways of exposing existing relational 
>> databases as SPARQL, e.g. as summarized by [1], namely Virtuoso's RDF 
>> Views, D2RQ, and Triplify.
>>
>> I am looking for best practices / recommendations for the following 
>> scenario:
>>
>> 1. There is a large and highly dynamic product or services database; 
>> part of the data (e.g. prices) may even come from external Web 
>> services (think of airfare, hotel prices).
>> 2. I want to make this accessible as a SPARQL endpoint using 
>> GoodRelations and FOAF.
>> 3. The mapping from the original data structures to the proper RDF 
>> must be hand-crafted anyway, so automation of this process is not 
>> important
>> 4. Creating RDF dumps is not feasible due to
>>
>> - the dynamics of the data
>> - the combinatorial complexity (not all combinations may be 
>> materialized in the database; think of product variants).
>>
>> Key requirements for me are:
>>
>> 1. Maturity of the software (alpha / beta releases are no option)
>> 2. Scalability - the SPARQL endpoint must handle tens of thousands of 
>> requests per hour
>> 3. Resource management for the endpoint - it must be possible to 
>> protect the SPARQL endpoint from costly queries and return just a 
>> subset or refuse a query
>> 4. Resource management for the underlying RDBMS or Web services - it 
>> must be possible to protect the original RDBMS and involved Web 
>> services from excessive traffic, both willful ("Semantic DDoS") and 
>> unintentional (PhD students' Python scripts gone wild).
>>
>> What would you recommend? My main point is really: Which tools / 
>> architecture would you recommend if failure is not an option?
>>
>> Thanks for any opinions!
>>
>>
>> Best
>>
>> Martin
>>
>> [1] A Survey of Current Approaches for Mapping of Relational 
>> Databases to RDF (PDF), Satya S. Sahoo, Wolfgang Halb, Sebastian 
>> Hellmann, Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan 
>> Sequeda, Ahmed Ezzat, 2009-01-31.
>> http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
>>
>> --------------------------------------------------------
>> martin hepp
>> e-business & web science research group
>> universitaet der bundeswehr muenchen
>>
>> e-mail:  hepp@ebusiness-unibw.org
>> phone:   +49-(0)89-6004-4217
>> fax:     +49-(0)89-6004-4620
>> www:     http://www.unibw.de/ebusiness/ (group)
>>         http://www.heppnetz.de/ (personal)
>> skype:   mfhepp
>> twitter: mfhepp
>>
>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>> =================================================================
>> * Project Main Page: http://purl.org/goodrelations/
>> * Quickstart Guide for Developers: http://bit.ly/quickstart4gr
>> * Vocabulary Reference: http://purl.org/goodrelations/v1
>> * Developer's Wiki: http://www.ebusiness-unibw.org/wiki/GoodRelations
>> * Examples: http://bit.ly/cookbook4gr
>> * Presentations: http://bit.ly/grtalks
>> * Videos: http://bit.ly/grvideos
>>
>>
>
>


-- 
Mr. Yrjana Rankka        | ghard@openlinksw.com
Developer, Virtuoso Team | http://www.openlinksw.com
                          | Making Technology Work For You
Received on Thursday, 16 December 2010 01:27:54 UTC