Re: Follow up on our conference call on 7/11... from Kingsley Idehen on 2008-07-18 (public-xg-rdb2rdf@w3.org from July 2008)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 18 Jul 2008 09:02:23 -0400
To: "Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>
CC: "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
Message-ID: <4880945F.9010405@openlinksw.com>
Ezzat, Ahmed wrote:
>
> Hello Ashok,
>
> I agree with what you shared & suggested, and thanks to all earlier responses.
>
> The one thing still remaining is a feel for response time from the user's point of view, including domain/local ontology mapping and SPARQL query execution. Clearly it depends on how many data sources are involved and the nature of the query, but do we have published numbers or white papers for the existing implementations?
> Thanks,
>
> Ahmed
>   

Ahmed,

On our part, we will soon publish TPC-H numbers for Virtuoso covering SQL 
vs SQL-RDF vs RDF. In the meantime, we have live examples of TPC-H to RDF 
mappings (in Linked Data form) built on Virtuoso RDF Views [1]. We also 
have a live database instance, available for a very long time, that anyone 
can play around with [2].

TPC-H is (as you know) an industry standard benchmark with an openly 
accessible SQL Schema. Thus, from our report you will get 3 for 1:

1) RDB to RDF mappings for the TPC-H SQL Schema
2) RDF Views (or Covers)
3) Data to extrapolate from
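
To make item 1 concrete, here is a minimal sketch of what a simple
RDB to RDF mapping over one TPC-H table could look like. It is not Virtuoso's
RDF Views syntax and not the mapping behind the links below. The base IRI
(http://example.org/tpch/), the IRI layout, and the helper names term() and
row_to_triples() are all assumptions made for illustration; only the table
and column names (CUSTOMER, C_CUSTKEY, C_NAME, C_NATIONKEY) come from the
TPC-H schema, and only a subset of the columns is used.

```python
import sqlite3

BASE = "http://example.org/tpch/"  # hypothetical base IRI, chosen for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def term(value):
    """Render a Python value as an N-Triples literal."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = (
        str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def row_to_triples(table, key_col, row, foreign_keys=None):
    """Map one row (a dict) to N-Triples lines.

    One subject IRI per row, one predicate per column. Columns listed in
    foreign_keys become links to the referenced table's row IRI instead of
    literals, which is what gives the output its Linked Data shape.
    """
    foreign_keys = foreign_keys or {}
    subject = f"<{BASE}{table}/{row[key_col]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."]
    for col, value in row.items():
        if value is None:
            continue  # a NULL simply produces no triple
        predicate = f"<{BASE}{table}#{col}>"
        if col in foreign_keys:
            obj = f"<{BASE}{foreign_keys[col]}/{value}>"
        else:
            obj = term(value)
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


# Tiny in-memory stand-in for the TPC-H CUSTOMER table (subset of columns).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE CUSTOMER ("
    "C_CUSTKEY INTEGER PRIMARY KEY, C_NAME TEXT, C_NATIONKEY INTEGER)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Customer#000000001', 15)")

triples = []
for row in conn.execute("SELECT * FROM CUSTOMER"):
    triples.extend(
        row_to_triples(
            "CUSTOMER", "C_CUSTKEY", dict(row), {"C_NATIONKEY": "NATION"}
        )
    )

print("\n".join(triples))
```

This sketch materializes the triples. An RDF View (or cover) would instead
keep the rows in the SQL tables and apply an equivalent mapping at query
time.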

Links:

1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFViewTPCH
2. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSTPCHLinkedData


Kingsley
>
> -----Original Message-----
> From: ashok malhotra [mailto:ashok.malhotra@oracle.com]
> Sent: Thursday, July 17, 2008 9:35 AM
> To: Ezzat, Ahmed
> Cc: public-xg-rdb2rdf@w3.org
> Subject: Re: Follow up on our conference call on 7/11...
>
> Hello Ahmed:
> Thank you for starting this thread.
>
> In my view, there are situations where you want to translate the data to
> RDF and then store it and query it
> but, if the data is very large and/or changes frequently a better
> approach is to leave the data in the native
> database and create a virtual RDF representation for it -- which I call
> a semantic cover. The semantic
> cover can then be queried with SPARQL and the SPARQL queries translated
> to queries over the native
> databases.
>
> It should also be possible to enrich the semantic cover with additional
> semantics but exactly how this would
> be done needs to be worked out.
>
> A recommendation that our XG may want to make to the W3C is to start
> work on a language that would map
> relational data to RDF. The mapping may be used to translate the data to
> RDF and store it in an RDF database
> or it could be used to create a virtual mapping as discussed above.
>
> We have heard a number of presentations on quick default mappings of
> Relational data to RDF. But we also
> need the ability to customize these mappings and add additional semantics.
>
> This approach starts with the Relational database schema. An alternative
> approach may be to create an ontology
> first and then create (distributed) SQL queries to answer questions
> about the ontologies.
>
> Ahmed, does that cover what you had in mind?
>
> All, please respond to this note so we can start coming to a shared
> understanding as to what we should
> recommend to the W3C.
>
> All the best, Ashok
>
>
> Ezzat, Ahmed wrote:
>   
>> Hello,
>> This is a question that I would be interested in hearing your reaction
>> and views about.
>> In a multiple data sources environment where some of them are huge
>> like data warehouses, it seems like transforming all data sources into
>> RDF then querying that RDF store using SPARQL is going to put too much
>> pressure on the RDF store beyond reasonable. In addition all changes
>> in these data sources need to be reflected in the RDF store as soon as
>> possible. In the above paragraph I am ignoring the notion of local and
>> domain Ontologies.
>> An alternative I am exploring is to decompose the user query into a set
>> of subqueries (SQL and Search operations) sent to the relevant data
>> sources (i.e., context) -> transform the results into RDF using local
>> Ontologies, then resolve differences using the domain ontology -> apply
>> the SPARQL query on the union of the RDF graphs after reconciliation.
>> Even though this approach is far better from an RDF storage point of
>> view (i.e., scalability), it seems like response time can be less than
>> desirable?
>> Comments and thoughts including additional alternatives...
>> Regards,
>> Ahmed
>> Ahmed K. Ezzat, Ph.D.
>> HP Fellow, Business Intelligence Software Division
>> Hewlett-Packard Corporation
>> 19333 Vallco Parkway, MS 4502, Cupertino, CA 95014-2599
>> Office: Email: Ahmed.Ezzat@hp.com
>> Tel: 408-285-6022 Fax: 408-285-1430
>> Personal: Email: AhmedEzzat@aol.com
>> Tel: 408-253-5062 Fax: 408-253-6271


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Friday, 18 July 2008 13:03:05 UTC