Re: Best practice for exposing proprietary databases or services as SPARQL endpoints from Kingsley Idehen on 2010-12-16 (public-lod@w3.org from December 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 15 Dec 2010 21:09:31 -0500
To: public-lod@w3.org
CC: semantic-web@w3.org, "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
Message-ID: <4D0974DB.4080905@openlinksw.com>
On 12/15/10 8:26 PM, Mr. Yrjana Rankka wrote:
> On 12/15/10 02:21 , Martin Hepp wrote:
>> Dear all:
>> Are there really no experiences beyond academic research regarding 
>> this task? I had assumed it was a pretty standard requirement...
>>
> Martin,
>
> We do have quite a lot of experience with this.
>
> We've had RDF views over native Virtuoso tables for a few years 
> already. In addition to that, you can use the virtual (federated) 
> database functionality to:
>
> 1. Attach tables from various RDBMS (basically, any SQL DB one can 
> connect to using ODBC, JDBC, etc.) into Virtuoso (non-open source 
> feature)
> 2. Create RDF views, producing graphs from combined data sources, 
> which are then exposed through our SPARQL endpoint
>
> One project I was involved with was for an automobile manufacturer.
> It combined two DB2-based and one Oracle-based CRM databases, from
> which the required tables were attached to Virtuoso.
>
> I made SQL views of the attached tables within Virtuoso, to translate 
> differences in schema and data representation of different CRM 
> systems. These views were then used to produce a Virtuoso RDF view, 
> making linked data and allowing SPARQL queries combining these 3 
> databases with 2 different schemas transparently.
>
> No data was ETL'd to intermediate storage. All data remained in its
> respective DBMS and was queried automatically at run time. For
> certain types of queries, though, some of these views could, and
> should, have been materialized in the RDF store for performance reasons.
>
> Virtuoso has a feature that allows setting resource limits on SPARQL
> queries. A partial result of a query can be returned, which allows
> exposing large datasets without the risk of queries that run
> forever DoSing your store. So you can let some users (or the general
> public) sample your data while allowing users with sufficient
> authorization to use more system resources.
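
A minimal client-side sketch of how such a resource-limited query might be issued and a truncated answer detected. The endpoint URL is hypothetical, and both the `timeout` request parameter (in milliseconds) and the `X-SQL-State` response header are assumptions about the server's interface that should be checked against the Virtuoso documentation; neither is part of the SPARQL Protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"


def build_query_url(endpoint, query, timeout_ms=None):
    """Build a SPARQL Protocol GET URL, optionally with a time budget."""
    params = {"query": query}
    if timeout_ms is not None:
        # Assumption: the server honours a 'timeout' parameter given in
        # milliseconds and returns whatever it has found when time runs out.
        params["timeout"] = str(timeout_ms)
    return endpoint + "?" + urlencode(params)


def is_partial(headers):
    """Return True if the response headers signal a truncated result.

    Assumption: the server marks a partial answer with an 'X-SQL-State'
    header. Header names are compared case-insensitively.
    """
    lowered = {name.lower() for name in headers}
    return "x-sql-state" in lowered
```

A caller would fetch the URL with any HTTP client, pass the response headers to `is_partial`, and treat a flagged result set as a sample rather than a complete answer.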

In addition to the above, which describes transient views, we also have
fully materialized views that handle change-sensitivity via delta syncs
between the native quad store and one or more ODBC- or JDBC-accessible
data sources. Thus, we also offer full faceted navigation atop
virtual RDBMS data sources.
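
To illustrate the general idea of a delta sync (this is a generic sketch, not Virtuoso's actual mechanism), the following compares two snapshots of a relational table and derives the triples to add to and remove from a materialized graph. The `http://example.org/` URI scheme, the function names, and the snapshot layout (a dict of rows keyed by primary key) are all hypothetical.

```python
def row_to_triples(table, key, row):
    """Map one relational row to a set of (subject, predicate, object) triples."""
    subject = f"http://example.org/{table}/{key}"
    return {
        (subject, f"http://example.org/schema/{column}", str(value))
        for column, value in row.items()
        if value is not None  # NULL columns produce no triple
    }


def compute_delta(table, old_rows, new_rows):
    """Return (to_insert, to_delete) between two snapshots of a table.

    Each snapshot maps a primary key to a dict of column values.
    """
    old_triples = set()
    for key, row in old_rows.items():
        old_triples |= row_to_triples(table, key, row)
    new_triples = set()
    for key, row in new_rows.items():
        new_triples |= row_to_triples(table, key, row)
    # Only the differences need to be applied to the materialized graph.
    return new_triples - old_triples, old_triples - new_triples
```

Applying only the two difference sets keeps the materialized graph current without regenerating it from scratch on every change.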

Naturally, you can use Pivot Viewer as a high-level tool for
sophisticated drill-down style interaction with Virtuoso's transient or
materialized RDF views.

Kingsley
>
> Best regards,
>
> Yrjänä
>
>> Best
>>
>> Martin
>>
>> On 11.12.2010, at 09:33, Martin Hepp wrote:
>>
>>> Dear all:
>>>
>>> There are many different ways of exposing existing relational 
>>> databases as SPARQL, e.g. as summarized by [1], namely Virtuoso's 
>>> RDF Views, D2RQ, and Triplify.
>>>
>>> I am looking for best practices / recommendations for the following 
>>> scenario:
>>>
>>> 1. There is a large and highly dynamic product or services database; 
>>> part of the data (e.g. prices) may even come from external Web 
>>> services (think of airfare, hotel prices).
>>> 2. I want to make this accessible as a SPARQL endpoint using 
>>> GoodRelations and FOAF.
>>> 3. The mapping from the original data structures to the proper RDF 
>>> must be hand-crafted anyway, so automation of this process is not 
>>> important
>>> 4. Creating RDF dumps is not feasible due to
>>>
>>> - the dynamics of the data
>>> - the combinatorial complexity (not all combinations may be 
>>> materialized in the database; think of product variants).
>>>
>>> Key requirements for me are:
>>>
>>> 1. Maturity of the software (alpha / beta releases are no option)
>>> 2. Scalability - the SPARQL endpoint must handle tens of thousands
>>> of requests per hour
>>> 3. Resource management for the endpoint - it must be possible to 
>>> protect the SPARQL endpoint from costly queries and return just a 
>>> subset or refuse a query
>>> 4. Resource management for the underlying RDBMS or Web services - it
>>> must be possible to protect the original RDBMS and involved Web
>>> services from excessive traffic, both willful ("Semantic DDoS") and
>>> unintentional (PhD students' Python scripts gone wild).
>>>
>>> What would you recommend? My main point is really: Which tools / 
>>> architecture would you recommend if failure is not an option?
>>>
>>> Thanks for any opinions!
>>>
>>>
>>> Best
>>>
>>> Martin
>>>
>>> [1] A Survey of Current Approaches for Mapping of Relational
>>> Databases to RDF (PDF), Satya S. Sahoo, Wolfgang Halb, Sebastian 
>>> Hellmann, Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan 
>>> Sequeda, Ahmed Ezzat, 2009-01-31.
>>> http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
>>>
>>> --------------------------------------------------------
>>> martin hepp
>>> e-business & web science research group
>>> universitaet der bundeswehr muenchen
>>>
>>> e-mail:  hepp@ebusiness-unibw.org
>>> phone:   +49-(0)89-6004-4217
>>> fax:     +49-(0)89-6004-4620
>>> www:     http://www.unibw.de/ebusiness/ (group)
>>>         http://www.heppnetz.de/ (personal)
>>> skype:   mfhepp
>>> twitter: mfhepp
>>>
>>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>>> =================================================================
>>> * Project Main Page: http://purl.org/goodrelations/
>>> * Quickstart Guide for Developers: http://bit.ly/quickstart4gr
>>> * Vocabulary Reference: http://purl.org/goodrelations/v1
>>> * Developer's Wiki: http://www.ebusiness-unibw.org/wiki/GoodRelations
>>> * Examples: http://bit.ly/cookbook4gr
>>> * Presentations: http://bit.ly/grtalks
>>> * Videos: http://bit.ly/grvideos
>>>
>>>
>>
>>
>
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Thursday, 16 December 2010 02:11:23 UTC