- From: Cutler, Roger (RogerCutler) <RogerCutler@chevron.com>
- Date: Thu, 6 Apr 2006 12:42:32 -0500
- To: "Susie Stephens" <susie.stephens@oracle.com>, public-semweb-lifesci@w3.org
No problem. Getting back to the main subject of the thread, I'm a little curious whether you've got some Oracle perspective on this issue. I understand that new Oracle databases are putting RDF into some sort of triple-store, but I don't know much about the details.

Some questions that occur to me, but maybe not exactly the right questions:

- Does the RDF just go in as-is, or is it compressed in some way? If there is a size factor of something like 15 from the data itself, are these RDF stores tending to be really bulky? (A sketch of the kind of compression I imagine is at the bottom of this message.)

- Is there some sort of indexing and related join-like function? If so, what are the performance characteristics?

As I said, I don't have any experience with the RDF stuff, but here are some thoughts based on my experience with relational databases:

- Just because you've got your data in an Oracle (or any other) database doesn't mean you are going to be able to get at it in a performant manner. The devil is in the details.

- Operations that initiate a full read of a gigabyte database are extremely painful.

- Big joins can also be extremely painful. Would traversing a big bunch of RDF look something like an incredibly complex hairball of joins? If so, is there a potential problem here? (A toy sketch of what I mean follows the quoted message below.)

-----Original Message-----
From: Susie Stephens [mailto:susie.stephens@oracle.com]
Sent: Wednesday, April 05, 2006 5:47 PM
To: Cutler, Roger (RogerCutler)
Subject: Re: [BioRDF] Scalability

Roger,

We didn't have a BioRDF call this week, as it clashed with Bio-IT World. This was posted on the Wiki (http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup).

Cheers,
Susie

Cutler, Roger (RogerCutler) wrote:

>Somewhere down near the bottom of the lengthy thread that started with
>a query about ontology editors, someone casually mentioned that 53 MB
>of data that was "imported" -- from which I infer it was not binary,
>compressed data but in some sort of text format -- turned into over
>800 MB of RDF. Frankly, a factor of 15 in size, possibly from a format
>that is fairly large to start out with, worries me. There have since
>been some comments that sound like people think they are going to deal
>with this by generating RDF only on-the-fly, as needed. It seems to
>me, given the networked nature of RDF, that this is likely to have its
>own problems. None of the solutions of which I am aware that are
>actually in operation work this way, but I will freely admit that my
>experience level here is pretty low.
>
>It seems to me that there are at least three ways one might try to
>cope with this issue:
>
>1 - Generate the RDF on-the-fly (as I said, I'm personally dubious
>about this one).
>
>2 - Make the RDF smaller somehow (maybe by making the URIs shorter, à
>la tinyurl???).
>
>3 - Limit the amount of information that is actually put into RDF to
>some sort of descriptive metadata, and keep pointers to the real data,
>which is in some other format.
>
>I think that the third approach is what I have seen done, but I get
>the impression that people may not be thinking in this way in this
>group.
>
>I've prefaced this [BioRDF] because there has already been some
>discussion of scalability in that context, and I believe that this
>issue has recently been upgraded in the deliverables of this subgroup.
>
>Incidentally, what happened to the BioRDF telecons on Monday? I was on
>vacation for a while, and when I came back they didn't seem to be
>there.
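To make the "hairball of joins" worry above concrete, here is a toy sketch. In a relational triple store, all statements sit in one (subject, predicate, object) table, so each hop of a graph traversal typically becomes another self-join on that table. The schema, data, and predicate names below are invented for illustration; this is not Oracle's actual RDF store:

    # Hypothetical triple-store sketch: one (subject, predicate, object)
    # table, with graph traversal expressed as self-joins on it.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("gene:BRCA1",      "encodes",       "protein:P38398"),
            ("protein:P38398",  "interactsWith", "protein:P04637"),
            ("protein:P04637",  "annotatedBy",   "pubmed:12345"),
        ],
    )

    # "Which papers annotate proteins that interact with the protein
    # encoded by BRCA1?" -- a three-hop RDF path, which relationally
    # becomes a three-way self-join. Every extra hop adds another join.
    rows = conn.execute("""
        SELECT t3.o
        FROM triples t1
        JOIN triples t2 ON t2.s = t1.o
        JOIN triples t3 ON t3.s = t2.o
        WHERE t1.s = 'gene:BRCA1'
          AND t1.p = 'encodes'
          AND t2.p = 'interactsWith'
          AND t3.p = 'annotatedBy'
    """).fetchall()
    print(rows)  # [('pubmed:12345',)]

A three-hop question already needs a three-way self-join, and a query that wanders across many RDF links compounds joins in the same way, which is exactly the performance question raised above.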
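On the question of whether the RDF goes in as-is or is compressed: one common trick in triple stores (assumed here for illustration; I don't know whether Oracle's implementation does this) is dictionary encoding, where each distinct URI is stored exactly once and the triples themselves are kept as small integer IDs. A minimal sketch, with made-up example URIs:

    # Dictionary encoding sketch: long, heavily repeated URIs are
    # interned into a lookup table; triples shrink to integer triples.
    uri_to_id = {}
    id_to_uri = []

    def encode(uri):
        """Return a stable small integer for a URI, assigning one if new."""
        if uri not in uri_to_id:
            uri_to_id[uri] = len(id_to_uri)
            id_to_uri.append(uri)
        return uri_to_id[uri]

    triples = [
        ("http://example.org/gene/BRCA1", "http://example.org/encodes",
         "http://example.org/protein/P38398"),
    ]
    encoded = [(encode(s), encode(p), encode(o)) for s, p, o in triples]
    print(encoded)                    # [(0, 1, 2)]
    print(id_to_uri[encoded[0][2]])   # round-trips back to the full URI

Since the same long URIs repeat across a great many triples, a store built this way would not have to pay the full factor-of-15 text blowup on disk, which bears directly on the size worry in the quoted message.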
Received on Thursday, 6 April 2006 17:43:49 UTC