RE: Follow up on our conference call on 7/11... from Ezzat, Ahmed on 2008-07-17 (public-xg-rdb2rdf@w3.org from July 2008)

From: Ezzat, Ahmed <Ahmed.Ezzat@hp.com>
Date: Thu, 17 Jul 2008 04:31:22 +0000
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
Message-ID: <3B7AE9BA67C72B4891EF21842246A21C358513876E@GVW1097EXB.americas.hpqcorp.net>

I am not up to speed on what Virtuoso does; i.e., I do not know whether Virtuoso's approach will work in my scenario.

But a single data warehouse in our environment is 100+ TB, and it would count as just one data source in the enterprise. Do you see converting data of that size into RDF (i.e., the first approach I described) as viable?

Ahmed

-----Original Message-----
From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
Sent: Wednesday, July 16, 2008 7:16 PM
To: Ezzat, Ahmed
Cc: public-xg-rdb2rdf@w3.org
Subject: Re: Follow up on our conference call on 7/11...

Ezzat, Ahmed wrote:
> Hello,
> This is a question on which I would be interested in hearing your
> reactions and views.
> In an environment with multiple data sources, some of them huge (such
> as data warehouses), transforming all data sources into RDF and then
> querying that RDF store with SPARQL seems likely to put unreasonable
> pressure on the RDF store. In addition, all changes in these data
> sources need to be reflected in the RDF store as soon as possible. In
> the above paragraph I am ignoring the notion of local and domain
> ontologies.
> An alternative I am exploring is to decompose the user query into a
> set of subqueries (SQL and search operations) against the relevant
> data sources (i.e., the context) → transform the results into RDF
> using local ontologies → resolve differences using the domain ontology
> → apply the SPARQL query to the union of the RDF graphs after
> reconciliation.
> Even though this approach is far better from an RDF storage point of
> view (i.e., scalability), won't the response time be less than
> desirable?
> Comments and thoughts, including additional alternatives, are welcome.
Ezzat,

All I can say without additional detail is that you shouldn't jump to
conclusions about the scalability of RDF engines with regard to the
warehousing approach, or about the sophistication of SQL optimizers when
applied to SQL-to-RDF mapping.

Virtuoso offers solutions for the RDF warehousing and RDF Views
approaches. I am certainly happy to be proven wrong via experimentation
re. Virtuoso's ability to handle either approach without compromising
performance or scalability.

Virtuoso has been designed and engineered to handle heavy duty RDF data
management (physical or virtual) from the get go.

Please provide me with additional details such as database counts,
sizes, etc.

Kingsley

> Regards,
> Ahmed
> Ahmed K. Ezzat, Ph.D.
> HP Fellow, Business Intelligence Software Division
> Hewlett-Packard Corporation
> 19333 Vallco Parkway, MS 4502, Cupertino, CA 95014-2599
> Office: Email: Ahmed.Ezzat@hp.com
> Tel: 408-285-6022  Fax: 408-285-1430
> Personal: Email: AhmedEzzat@aol.com
> Tel: 408-253-5062  Fax: 408-253-6271
>
> ------------------------------------------------------------------------
>

--

Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com

Received on Thursday, 17 July 2008 04:33:18 UTC