- From: Susie Stephens <susie.stephens@oracle.com>
- Date: Thu, 06 Apr 2006 18:28:50 -0400
- To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
- CC: "'public-semweb-lifesci'" <public-semweb-lifesci@w3.org>
I've embedded answers to your questions below.

Susie

Cutler, Roger (RogerCutler) wrote:

>No problem. Getting back to the main subject of the thread, I'm a
>little curious whether you've got some Oracle perspective on this issue.
>I understand that new Oracle databases are putting RDF into some sort of
>triple-store, but I don't know much about the details. Some questions
>that occur to me, but maybe not exactly the right questions:
>
>- Does the RDF just go in as-is or is it compressed in some way? If
>there is a size factor of something like 15 from the data itself, are
>these RDF stores tending to be real bulky?
>

RDF data is compressed - repeated node and link values are stored only once, and when a value repeats in the data only a reference to the already stored value is stored. There is no factor in Oracle RDF that adds to the size of the data.

RDF is stored in the Oracle Database in an object-relational implementation, allowing users to manipulate RDF triples as objects. The RDF Data Model can take advantage of the scalability and performance features in the database, e.g. indexing, parallelization, memory management, Real Application Clusters (RAC), etc. It can also work with our image and text management capability, and the security features.

As some parsing is needed when the data is initially loaded, there might be slower performance on loading compared to some other systems. However, in return for that, we have fast query performance.

>- Is there some sort of indexing and related join-like function? If so,
>what are the performance characteristics?
>

There are several indexes built on the internal storage structures. We do perform joins but these are highly optimized. Our performance figures show how our design has resulted in very good performance. We have also extended SQL to enable SPARQL-like query capabilities, so the user does not have to be aware that data is held in different tables internally.
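As a rough illustration of the "stored only once" idea above (repeated node and link values kept once, with references thereafter), here is a minimal Python sketch. It is not Oracle's storage layout; the `TripleStore` class, its method names, and the `ex:` example values are all hypothetical and chosen only to show the principle.

```python
from typing import Dict, List, Tuple


class TripleStore:
    """Toy store: each distinct value is kept once; triples hold integer ids."""

    def __init__(self) -> None:
        self.value_to_id: Dict[str, int] = {}          # value -> id lookup
        self.values: List[str] = []                    # id -> value (stored once)
        self.triples: List[Tuple[int, int, int]] = []  # (subject, predicate, object) ids

    def _intern(self, value: str) -> int:
        """Return the id of an already stored value, or store it and assign a new id."""
        vid = self.value_to_id.get(value)
        if vid is None:
            vid = len(self.values)
            self.value_to_id[value] = vid
            self.values.append(value)
        return vid

    def add(self, s: str, p: str, o: str) -> None:
        """Store a triple as three references to interned values."""
        self.triples.append((self._intern(s), self._intern(p), self._intern(o)))

    def decode(self, triple: Tuple[int, int, int]) -> Tuple[str, ...]:
        """Turn a stored id triple back into its original values."""
        return tuple(self.values[i] for i in triple)


# Hypothetical example data: "ex:encodes" and "ex:proteinX" repeat across triples.
store = TripleStore()
store.add("ex:geneA", "ex:encodes", "ex:proteinX")
store.add("ex:geneB", "ex:encodes", "ex:proteinX")
store.add("ex:proteinX", "ex:locatedIn", "ex:nucleus")
```

The three triples mention nine values in total, but only the distinct ones end up in `store.values`; every repeat costs one integer reference. The extra lookup in `_intern` at insert time is also a small-scale picture of why loading can be slower than in a store that writes the text as-is.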
>As I said, I don't have any experience with the RDF stuff, but some
>thoughts based on my experience with relational databases:
>
>- Just because you've got your data in an Oracle (or any other) database
>doesn't mean you are going to be able to get at it in a performant
>manner. The devil is in the details.
>
>- Operations that initiate a full read of a Gigabyte database are
>extremely painful.
>
>- Big joins can also be extremely painful. Would traversing a big bunch
>of RDF look something like an incredibly complex hairball of complex
>joins? If so, is there a potential problem here?
>

Yes, certainly the devil is in the details. And big joins are indeed painful. However, the user does not have to do these big joins, nor worry about the details. The RDF query function provided by Oracle gives the user a simple SQL interface to query the internal tables. The internal operations are highly optimized, and where necessary internal Oracle features have been enhanced. Some of these techniques are described in the VLDB paper by Chong et al. at
http://www.oracle.com/technology/tech/semantic_technologies/pdf/vldb_2005.pdf
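To make the join question concrete, here is a small generic sketch of how a two-step graph pattern becomes a self-join over a single triple table, with the join hidden behind one function call. It uses Python's built-in `sqlite3` module rather than Oracle, and the `triples` table, the index names, the `gene_locations` function, and the `ex:` data are all hypothetical. It says nothing about how the Oracle RDF query function is implemented internally.

```python
import sqlite3

# In-memory database with a single generic triple table (an assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

# Indexes so each pattern can be matched without a full table scan.
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.execute("CREATE INDEX idx_sp ON triples (s, p)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:geneA", "ex:encodes", "ex:proteinX"),
        ("ex:geneB", "ex:encodes", "ex:proteinX"),
        ("ex:proteinX", "ex:locatedIn", "ex:nucleus"),
        ("ex:geneC", "ex:encodes", "ex:proteinY"),  # proteinY has no location
    ],
)


def gene_locations(connection: sqlite3.Connection):
    """Match the pattern: ?gene ex:encodes ?protein . ?protein ex:locatedIn ?place

    Each extra triple pattern adds one more self-join on the triple table;
    the caller only sees this function, not the join.
    """
    sql = """
        SELECT t1.s AS gene, t1.o AS protein, t2.o AS place
        FROM triples AS t1
        JOIN triples AS t2 ON t2.s = t1.o
        WHERE t1.p = 'ex:encodes' AND t2.p = 'ex:locatedIn'
        ORDER BY t1.s
    """
    return connection.execute(sql).fetchall()


rows = gene_locations(conn)
```

Longer graph patterns add one join per pattern, which is where the "hairball" concern comes from; wrapping the joins in a query function moves that complexity out of the user's SQL, but how well it performs still depends on the indexes and the optimizer underneath.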
Received on Thursday, 6 April 2006 22:29:03 UTC