Re: [BioRDF] Scalability from Susie Stephens on 2006-04-06 (public-semweb-lifesci@w3.org from April 2006)

From: Susie Stephens <susie.stephens@oracle.com>
Date: Thu, 06 Apr 2006 18:28:50 -0400
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
CC: "'public-semweb-lifesci'" <public-semweb-lifesci@w3.org>
Message-ID: <44359622.3060105@oracle.com>

I've embedded answers to your questions below.

Susie


Cutler, Roger (RogerCutler) wrote:

>No problem.  Getting back to the main subject of the thread, I'm a
>little curious whether you've got some Oracle perspective on this issue.
>I understand that new Oracle databases are putting RDF into some sort of
>triple-store, but I don't know much about the details.  Some questions
>that occur to me, but maybe not exactly the right questions:
>
>- Does the RDF just go in as-is or is it compressed in some way?  If
>there is a size factor of something like 15 from the data itself, are
>these RDF stores tending to be real bulky?
>  
>
RDF data is compressed: each distinct node and link value is stored only
once, and every repeat occurrence is stored as a reference to the
already stored value. Nothing in Oracle RDF adds to the size of the
data. RDF is stored in the Oracle Database in an object-relational
implementation, allowing users to manipulate RDF triples as objects.
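
As a rough illustration of the "store each value once, then store only
references" idea, here is a minimal Python sketch. It is not Oracle's
implementation; the class name TripleDictionary, its methods, and the
ex: sample values are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class TripleDictionary:
    """Toy triple store: each distinct value is kept once, triples hold integer IDs."""

    def __init__(self) -> None:
        self.value_to_id: Dict[str, int] = {}          # value text -> ID
        self.values: List[str] = []                    # ID -> value text
        self.links: List[Tuple[int, int, int]] = []    # triples as ID references

    def _intern(self, value: str) -> int:
        """Return the ID for a value, storing the text only on first sight."""
        vid = self.value_to_id.get(value)
        if vid is None:
            vid = len(self.values)
            self.value_to_id[value] = vid
            self.values.append(value)
        return vid

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Store one triple as three references into the value list."""
        self.links.append(
            (self._intern(subject), self._intern(predicate), self._intern(obj))
        )

    def decode(self) -> List[Triple]:
        """Rebuild the original triples from the stored references."""
        return [(self.values[s], self.values[p], self.values[o])
                for s, p, o in self.links]


# Hypothetical sample data: 9 value occurrences, only 6 distinct values.
SAMPLE: List[Triple] = [
    ("ex:gene1", "ex:encodes", "ex:proteinA"),
    ("ex:gene2", "ex:encodes", "ex:proteinA"),
    ("ex:proteinA", "ex:locatedIn", "ex:nucleus"),
]

store = TripleDictionary()
for triple in SAMPLE:
    store.add(*triple)
```

In this toy version the saving comes from long URIs and literals being
replaced by small integers wherever they repeat; how much that helps
depends on how often values recur in the data.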

The RDF Data Model can take advantage of the scalability and performance 
features in the database, e.g. indexing, parallelization, memory 
management, Real Application Clusters (RAC), etc. It can also work with 
our image and text management capability, and the security features.

Because some parsing is needed when the data is initially loaded,
loading may be slower than on some other systems. In return, query
performance is fast.

>- Is there some sort of indexing and related join-like function?  If so,
>what are the performance characteristics?
>  
>
There are several indexes built on the internal storage structures. We
do perform joins, but these are highly optimized, and our performance
figures show that this design gives very good query performance. We have
also extended SQL to enable SPARQL-like query capabilities, so the user
does not have to be aware that data is held in different tables
internally.
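
To make the relationship between graph patterns, joins, and indexes
concrete, here is a small sketch using Python's built-in sqlite3 module
as a stand-in database. It does not show Oracle's schema or its query
function; the table names rdf_value and rdf_link, the index names, the
helper functions, and the ex: data are all assumptions made for this
example. A two-pattern query becomes a self-join on the link table, and
a wrapper function hides that join from the caller.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_store(triples: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Load triples into an in-memory value table plus an ID-based link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rdf_value (id INTEGER PRIMARY KEY, val TEXT UNIQUE NOT NULL);
        CREATE TABLE rdf_link (s INTEGER NOT NULL, p INTEGER NOT NULL, o INTEGER NOT NULL);
        -- Indexes so lookups by predicate+subject or predicate+object avoid full scans.
        CREATE INDEX link_pso ON rdf_link (p, s, o);
        CREATE INDEX link_pos ON rdf_link (p, o, s);
    """)

    def value_id(val: str) -> int:
        # Insert the value only if it is new, then return its ID.
        conn.execute("INSERT OR IGNORE INTO rdf_value (val) VALUES (?)", (val,))
        return conn.execute(
            "SELECT id FROM rdf_value WHERE val = ?", (val,)
        ).fetchone()[0]

    for s, p, o in triples:
        conn.execute(
            "INSERT INTO rdf_link (s, p, o) VALUES (?, ?, ?)",
            (value_id(s), value_id(p), value_id(o)),
        )
    return conn


def genes_in_location(conn: sqlite3.Connection, location: str) -> List[str]:
    """Answer (?gene ex:encodes ?protein) (?protein ex:locatedIn <location>).

    The caller sees one function call; internally it is a self-join on rdf_link.
    """
    rows = conn.execute("""
        SELECT gene.val
        FROM rdf_link AS a
        JOIN rdf_link AS b ON b.s = a.o          -- ?protein links the two patterns
        JOIN rdf_value AS gene ON gene.id = a.s
        JOIN rdf_value AS pa ON pa.id = a.p
        JOIN rdf_value AS pb ON pb.id = b.p
        JOIN rdf_value AS loc ON loc.id = b.o
        WHERE pa.val = 'ex:encodes'
          AND pb.val = 'ex:locatedIn'
          AND loc.val = ?
        ORDER BY gene.val
    """, (location,)).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data.
conn = build_store([
    ("ex:gene1", "ex:encodes", "ex:proteinA"),
    ("ex:gene2", "ex:encodes", "ex:proteinA"),
    ("ex:gene3", "ex:encodes", "ex:proteinB"),
    ("ex:proteinA", "ex:locatedIn", "ex:nucleus"),
    ("ex:proteinB", "ex:locatedIn", "ex:cytoplasm"),
])
nucleus_genes = genes_in_location(conn, "ex:nucleus")
```

Each additional triple pattern in a query adds another join on the link
table in this sketch, which is why indexing and join optimization
matter as patterns grow.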

>As I said, I don't have any experience with the RDF stuff, but some
>thoughts based on my experience with relational databases:
>
>- Just because you've got your data in an Oracle (or any other) database
>doesn't mean you are going to be able to get at it in a performant
>manner.  The devil is in the details.
>
>- Operations that initiate a full read of a Gigabyte database are
>extremely painful.
>
>- Big joins can also be extremely painful.  Would traversing a big bunch
>of RDF look something like an incredibly complex hairball of complex
>joins?  If so, is there a potential problem here?
>  
>
Yes, the devil is certainly in the details, and big joins are indeed
painful. However, the user does not have to write these big joins or
worry about the details. The RDF query function provided by Oracle gives
the user a simple SQL interface to query the internal tables. The
internal operations are highly optimized, and where necessary internal
Oracle features have been enhanced. Some of these techniques are
described in the VLDB paper by Chong et al. at
http://www.oracle.com/technology/tech/semantic_technologies/pdf/vldb_2005.pdf



Received on Thursday, 6 April 2006 22:29:03 UTC