RE: Follow up on our conference call on 7/11... from Ezzat, Ahmed on 2008-07-17 (public-xg-rdb2rdf@w3.org from July 2008)

From: Ezzat, Ahmed <Ahmed.Ezzat@hp.com>
Date: Thu, 17 Jul 2008 06:44:18 +0000
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
Message-ID: <3B7AE9BA67C72B4891EF21842246A21C358513877D@GVW1097EXB.americas.hpqcorp.net>
Do you have papers describing the Virtuoso Cluster Edition RDF Views of SQL data, and your use of SQL optimization heuristics to deliver high-performance, scalable RDF Views of SQL data?

It is not clear to me what you mean by creating RDF Views of the SQL data. It sounds like you are materializing RDF in the SQL engine? Are you creating views over the results of SQL query execution? If you are, then it looks like a flavor of my second proposition. I agree that materializing the whole warehouse is not the first choice. If not, then I would like to read more about your approach.

In your environment, do you support multiple data sources, and do you go through local/domain ontology reconciliation and then apply SPARQL? If you do all of that, how is the performance in terms of response time? Do you have experiments in which a user query is executed against multiple data sources, with response time numbers?
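
For illustration only, the flow asked about here (per-source subquery results, local-ontology mapping, domain-ontology reconciliation, then a query over the union) could be sketched roughly as below. Every name, predicate, and data value is hypothetical, and the small pattern matcher merely stands in for a SPARQL engine; it is not how Virtuoso or any other product works.

```python
# Hypothetical sketch: all sources, predicates, and rows are invented.

# Rows as they might come back from per-source subqueries.
CRM_ROWS = [{"id": 1, "cust_name": "Acme"}]
BILLING_ROWS = [{"id": 1, "client": "Acme", "balance": 250}]

# Local ontologies: column name -> source-specific predicate.
LOCAL = {
    "crm": {"cust_name": "crm:customerName"},
    "billing": {"client": "bill:clientName", "balance": "bill:balance"},
}

# Domain ontology: source-specific predicate -> shared predicate.
DOMAIN = {
    "crm:customerName": "dom:name",
    "bill:clientName": "dom:name",
    "bill:balance": "dom:balance",
}


def rows_to_triples(source, rows):
    """Map result rows to (subject, predicate, object) triples via the local ontology."""
    triples = set()
    for row in rows:
        subject = f"{source}:{row['id']}"
        for column, predicate in LOCAL[source].items():
            if column in row:
                triples.add((subject, predicate, row[column]))
    return triples


def reconcile(triples):
    """Rewrite source-specific predicates to domain predicates where a mapping exists."""
    return {(s, DOMAIN.get(p, p), o) for s, p, o in triples}


def match(graph, pattern):
    """Return triples matching a pattern; None acts as a wildcard (SPARQL stand-in)."""
    hits = [t for t in graph if all(q is None or q == v for q, v in zip(pattern, t))]
    return sorted(hits, key=lambda t: (t[0], t[1], str(t[2])))


def answer(pattern):
    """Build the reconciled union graph on demand and evaluate the pattern over it."""
    graph = set()
    for source, rows in (("crm", CRM_ROWS), ("billing", BILLING_ROWS)):
        graph |= reconcile(rows_to_triples(source, rows))
    return match(graph, pattern)
```

Calling answer((None, "dom:name", None)) returns one triple per source, both using the shared dom:name predicate. The response time concern is visible even in this toy: the union graph is rebuilt for every query.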

It would be great if you could send some papers on your approach and on its performance; I can follow up with you after reading the material. Thanks in advance.
Regards,

Ahmed



-----Original Message-----
From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
Sent: Wednesday, July 16, 2008 9:50 PM
To: Ezzat, Ahmed
Cc: public-xg-rdb2rdf@w3.org
Subject: Re: Follow up on our conference call on 7/11...

Ezzat, Ahmed wrote:
> I am not up to speed on what Virtuoso does, i.e., I do not know whether what Virtuoso does will work in my scenario.
>
> But a data warehouse in our environment is 100+ TB, and it would be considered just one data source in the enterprise. Do you see converting that size of data into RDF (i.e., as described in my first approach) as viable?
>
It can be converted; if warehousing is the ultimate solution, this
becomes a data center matter. But I wouldn't take the warehousing route
if I can create RDF Views of the SQL data :-)  Our RDB to RDF mapping is
all about using SQL optimization heuristics to deliver high-performance
and scalable RDF Views of SQL data.

I am confident that, with an appropriately configured data center plus
Virtuoso Cluster Edition, using either RDF Views or RDF warehousing,
your challenge is addressable. In our tests with the TPC-H benchmark,
we've been able to get RDF Views to outperform RDF warehousing, so
warehousing is a last-resort option at best.
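
As a rough illustration of the difference between the two routes (not Virtuoso's actual mechanism or syntax), the toy below contrasts copying every mapped cell into a triple set up front with rewriting a single triple pattern into SQL at query time. The table, the predicate names, and the "order:<id>" subject scheme are all assumptions made up for this sketch.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, column). Identifiers come from
# this trusted constant, never from user input.
MAPPING = {"ex:status": ("orders", "status"), "ex:total": ("orders", "total")}


def make_db():
    """Create a small in-memory relational source."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "open", 10.0), (2, "shipped", 25.5), (3, "open", 7.25)],
    )
    return db


def warehouse(db):
    """Warehousing route: materialize every mapped cell as a triple up front."""
    triples = set()
    for predicate, (table, column) in MAPPING.items():
        for row_id, value in db.execute(f"SELECT id, {column} FROM {table}"):
            triples.add((f"order:{row_id}", predicate, value))
    return triples


def view_subjects(db, predicate, obj):
    """View route: rewrite the pattern (?s predicate obj) into SQL at query time,
    so the SQL engine does the filtering and nothing is copied."""
    table, column = MAPPING[predicate]
    rows = db.execute(f"SELECT id FROM {table} WHERE {column} = ? ORDER BY id", (obj,))
    return [f"order:{row_id}" for (row_id,) in rows]
```

Both routes give the same subjects for a pattern such as (?s ex:status "open"); the view route leaves the data in place and hands the filter to the SQL engine, while the warehouse route must first copy, and then keep in sync, every mapped value.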

Kingsley
> Ahmed
>
> -----Original Message-----
> From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
> Sent: Wednesday, July 16, 2008 7:16 PM
> To: Ezzat, Ahmed
> Cc: public-xg-rdb2rdf@w3.org
> Subject: Re: Follow up on our conference call on 7/11...
>
> Ezzat, Ahmed wrote:
>
>> Hello,
>> This is a question that I would be interested in hearing your reaction
>> and views about.
>> In a multiple data sources environment where some of them are huge
>> like data warehouses, it seems like transforming all data sources into
>> RDF then querying that RDF store using SPARQL is going to put too much
>> pressure on the RDF store beyond reasonable. In addition all changes
>> in these data sources need to be reflected in the RDF store as soon as
>> possible. In the above paragraph I am ignoring the notion of local and
>> domain Ontologies.
>> An alternative I am exploring is to decompose the user query into a set
>> of subqueries (SQL and Search operations) against the relevant data
>> sources (i.e., context) -> transform the results into RDF using local
>> ontologies, then resolve differences using the domain ontology -> apply
>> the SPARQL query on the union of the RDF graphs after reconciliation.
>> Even though this approach is far better from an RDF storage point of
>> view (i.e., scalability), it seems like response time can be less than
>> desirable?
>> Comments and thoughts including additional alternatives...
>>
> Ezzat,
>
> All I can say without additional detail is that we shouldn't jump to
> conclusions about the scalability of RDF engines re. the warehousing
> approach or the sophistication of SQL optimizers when injected into the
> SQL-RDF mapping realm.
>
> Virtuoso offers solutions for the RDF warehousing and RDF Views
> approaches. I am certainly happy to be proven wrong via experimentation
> re. Virtuoso's ability to handle either approach without compromising
> performance or scalability.
>
> Virtuoso has been designed and engineered to handle heavy duty RDF data
> management (physical or virtual) from the get go.
>
> Please provide me with additional details about database counts,
> sizes, etc.
>
>
> Kingsley
>
>
>> Regards,
>> Ahmed
>> Ahmed K. Ezzat, Ph.D.
>> HP Fellow, Business Intelligence Software Division
>> Hewlett-Packard Corporation
>> 19333 Vallco Parkway, MS 4502, Cupertino, CA 95014-2599
>> Office: Email: Ahmed.Ezzat@hp.com <mailto:Ahmed.Ezzat@hp.com>
>> Tel: 408-285-6022 Fax: 408-285-1430
>> Personal: Email: AhmedEzzat@aol.com <mailto:AhmedEzzat@aol.com>
>> Tel: 408-253-5062 Fax: 408-253-6271
>>
>> ------------------------------------------------------------------------
>>
>>
>
>
> --
>
>
> Regards,
>
> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> President & CEO
> OpenLink Software     Web: http://www.openlinksw.com
>


--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com
Received on Thursday, 17 July 2008 06:45:57 UTC