Re: Follow up on our conference call on 7/11...

The main problem here, as Jenny said, is updates. If the dataset rarely
changes, it can be extracted into RDF, but if it is updated frequently, it
is better to query the database directly. One proposed solution is therefore
a SPARQL-to-SQL translation. My 2 cents.
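
To make the SPARQL-to-SQL idea concrete, here is a minimal sketch and not a real translator. It assumes a hypothetical mapping from predicate IRIs to columns of a single table, and only handles triple patterns that share one subject variable. The table, the IRIs, the data, and the translate() helper are all made up for illustration. Because the query is rewritten and run against the live table, updates to the database are visible immediately, with no RDF copy to refresh.

```python
import sqlite3

# Hypothetical mapping, invented for this sketch:
# predicate IRI -> (table, key column, value column)
NAME = "http://example.org/name"
POPULATION = "http://example.org/population"
MAPPING = {
    NAME: ("place", "id", "name"),
    POPULATION: ("place", "id", "population"),
}


def translate(patterns):
    """Translate triple patterns sharing one subject variable into one SELECT.

    Each pattern is (subject_var, predicate_iri, obj), where obj is either a
    variable such as "?name" or a constant value to match.
    """
    subjects = {s for s, _, _ in patterns}
    tables = {MAPPING[p][0] for _, p, _ in patterns}
    if len(subjects) != 1 or len(tables) != 1:
        raise ValueError("sketch handles one subject variable over one table")
    subject = subjects.pop()
    table = tables.pop()
    key = MAPPING[patterns[0][1]][1]
    select = [f"{key} AS {subject[1:]}"]
    where, params = [], []
    for _, pred, obj in patterns:
        col = MAPPING[pred][2]
        if isinstance(obj, str) and obj.startswith("?"):
            # Variable object: project the column, skip rows with no value.
            select.append(f"{col} AS {obj[1:]}")
            where.append(f"{col} IS NOT NULL")
        else:
            # Constant object: becomes a bound equality filter.
            where.append(f"{col} = ?")
            params.append(obj)
    sql = f"SELECT {', '.join(select)} FROM {table} WHERE {' AND '.join(where)}"
    return sql, params


# Toy database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO place VALUES (?, ?, ?)",
    [(1, "Alpha", 250000), (2, "Beta", 15000)],
)

# Roughly: SELECT ?s ?name WHERE { ?s ex:name ?name . ?s ex:population 15000 }
sql, params = translate([("?s", NAME, "?name"), ("?s", POPULATION, 15000)])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

A real translator would also need joins across tables, OPTIONAL, FILTER, and IRI generation for subjects. This only shows the basic shape of rewriting a graph pattern into a query the SQL optimizer can handle.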

Juan Sequeda, Ph.D. Student

Research Assistant
Dept. of Computer Sciences
The University of Texas at Austin
http://www.cs.utexas.edu/~jsequeda
jsequeda@cs.utexas.edu

Semantic Web in Austin: http://juansequeda.blogspot.com/

On Thu, Jul 17, 2008 at 9:54 AM, Jenny Green <Jenny.Green@ordnancesurvey.co.uk> wrote:

>  Ahmed,
>
> What you are talking about is creating 'virtual RDF' on top of a database,
> which to my knowledge is what a lot of the presenters to the RDB2RDF group,
> including Virtuoso, have been describing. I think the difference in your
> proposed technique, if I understand it correctly, is that you are analysing
> the query to obtain the correct sources within the database rather than
> decomposing the query into its SQL equivalent. I would be very interested to
> see how this technique performed as I can see its application to OWL being
> very useful. The only reservation that I would have is whether the query can
> be decomposed into subqueries that would not return large amounts of
> unnecessary data.
>
> With regard to converting huge amounts of data into RDF, I think that size
> is not the only prohibitive factor. As you say, changes in the data need to
> be reflected in the store. Our databases get 5000 changes every day, so even
> if the amount of data held were small enough to fit in a triple store, the
> infrastructure to keep the data up to date each day is not available.
> However, some of our products, by their nature, do not change very frequently
> and are small enough to be held as RDF: http://os.rkbexplorer.com/
>
> *Jennifer Green*
> *Research Scientist*
> *Research*, *Ordnance Survey*
> C530, Romsey Road, SOUTHAMPTON, United Kingdom, SO16 4GU
> Phone: +44 (0) 23 8030 5717
> www.ordnancesurvey.co.uk | jenny.green@ordnancesurvey.co.uk
>
>
> -----Original Message-----
> From: public-xg-rdb2rdf-request@w3.org [mailto:public-xg-rdb2rdf-request@w3.org] On Behalf Of Ezzat, Ahmed
> Sent: 17 July 2008 07:44
> To: Kingsley Idehen
> Cc: public-xg-rdb2rdf@w3.org
> Subject: RE: Follow up on our conference call on 7/11...
>
>
>
> Do you have papers describing your Virtuoso Cluster Edition RDF views of
> SQL data and your use of SQL optimization heuristics to deliver
> high-performance and scalable RDF Views of SQL Data?
>
> It is not clear what you mean by creating RDF views of the SQL data.  It
> sounds like you are materializing RDF in the SQL engine? Are you creating
> views over the results of the SQL query execution?  If you do, then it looks
> like a flavor of my 2nd proposition.  I agree materializing the whole
> warehouse is not the first choice.  If not, then I would like to read more
> about your approach....
>
> In your environment, do you support multiple data sources, and do you go
> through the local/domain ontology reconciliation and apply SPARQL?  If you
> do all of that, how is the performance in terms of response time? Do
> you have experiments in which the user query is executed against multiple
> data sources, with response time numbers?
>
> It would be great if you could send some papers on your approach and
> on its performance, and I can follow up with you after reading the
> material. Thanks in advance...
>
> Regards,
>
> Ahmed
>
>
>
> -----Original Message-----
> From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
> Sent: Wednesday, July 16, 2008 9:50 PM
> To: Ezzat, Ahmed
> Cc: public-xg-rdb2rdf@w3.org
> Subject: Re: Follow up on our conference call on 7/11...
>
> Ezzat, Ahmed wrote:
> > I am not up to speed on what Virtuoso does, i.e., I do not know if what
> > Virtuoso does will work in my scenario.
> >
> > But a data warehouse in our environment is 100+ TB, which would be
> > considered one data source in the enterprise. Do you see converting that
> > size of data into RDF (i.e., as described in my first approach) as viable?
> >
> It can be converted; this is a data center matter if warehousing is the
> ultimate solution. But, I wouldn't take the warehousing route if I can
> create RDF Views of the SQL Data :-)  Our RDB to RDF mapping is all about
> using SQL optimization heuristics to deliver high-performance and scalable
> RDF Views of SQL Data.
>
> I am confident that, with an appropriately configured data center plus
> Virtuoso Cluster Edition using RDF Views or RDF warehousing, your challenge
> is addressable. In our tests with the TPC-H benchmark, we've been able to get
> RDF Views to outperform RDF warehousing, so warehousing is purely a
> last-resort option at best.
>
> Kingsley
> > Ahmed
> >
> > -----Original Message-----
> > From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
> > Sent: Wednesday, July 16, 2008 7:16 PM
> > To: Ezzat, Ahmed
> > Cc: public-xg-rdb2rdf@w3.org
> > Subject: Re: Follow up on our conference call on 7/11...
> >
> > Ezzat, Ahmed wrote:
> >
> >> Hello,
> >> This is a question that I would be interested in hearing your
> >> reaction and views about.
> >> In a multiple data sources environment where some of them are huge
> >> like data warehouses, it seems like transforming all data sources
> >> into RDF then querying that RDF store using SPARQL is going to put
> >> too much pressure on the RDF store beyond reasonable. In addition all
> >> changes in these data sources need to be reflected in the RDF store
> >> as soon as possible. In the above paragraph I am ignoring the notion
> >> of local and domain Ontologies.
> >> An alternative I am exploring is to decompose the user query into a set
> >> of subqueries (SQL and Search operations) to the relevant data
> >> sources (i.e., context) -> transform the results into RDF using local
> >> Ontologies, then resolve differences using the domain ontology -> apply
> >> the SPARQL query on the union of the RDF graphs after reconciliation.
> >> Even though this approach is far better from an RDF storage point of view
> >> (i.e., scalability), it seems like response time can be less than
> >> desirable?
> >> Comments and thoughts including additional alternatives...
> >>
> > Ezzat,
> >
> > All I can say without additional detail is that you shouldn't jump to
> > conclusions about the scalability of RDF engines re. the warehousing
> > approach or the sophistication of SQL optimizers when injected into
> > the SQL-RDF mapping realm.
> >
> > Virtuoso offers solutions for the RDF warehousing and RDF Views
> > approaches. I am certainly happy to be proven wrong via
> > experimentation re. Virtuoso's ability to handle either approach
> > without compromising performance or scalability.
> >
> > Virtuoso has been designed and engineered to handle heavy duty RDF
> > data management (physical or virtual) from the get go.
> >
> > Please provide me with additional details about database counts and
> > sizes etc..
> >
> >
> > Kingsley
> >
> >
> >> Regards,
> >> Ahmed
> >> Ahmed K. Ezzat, Ph.D.
> >> HP Fellow, Business Intelligence Software Division
> >> Hewlett-Packard Corporation
> >> 19333 Vallco Parkway, MS 4502, Cupertino, CA 95014-2599
> >> Office: Email: Ahmed.Ezzat@hp.com
> >> Tel: 408-285-6022 Fax: 408-285-1430
> >> Personal: Email: AhmedEzzat@aol.com
> >> Tel: 408-253-5062 Fax: 408-253-6271
> >>
> >> ------------------------------------------------------------------------
> >>
> >>
> >
> >
> > --
> >
> >
> > Regards,
> >
> > Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> > President & CEO
> > OpenLink Software     Web: http://www.openlinksw.com
> >
> >
> >
> >
> >
> >
>
>
> --
>
>
> Regards,
>
> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> President & CEO
> OpenLink Software     Web: http://www.openlinksw.com

Received on Thursday, 17 July 2008 08:36:45 UTC