Re: Berkeley DB is a non-relational high-performance system/paradigm - anyone looked at it? from William Bug on 2006-09-15 (public-semweb-lifesci@w3.org from September 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Fri, 15 Sep 2006 12:53:36 -0400
To: Stian Soiland <ssoiland@cs.man.ac.uk>
Cc: public-semweb-lifesci@w3.org
Message-Id: <5AFBC16D-E1B0-4F16-846C-4B602833AE6A@DrexelMed.edu>

BDB is quite powerful and venerable, absolutely, especially in a
single process/app/user environment.  We used it for a
high-throughput genomic parsing pipeline back at DoubleTwist.

I agree with Stian, however: given its instability issues, BDB is
very problematic in a highly interactive, multi-process environment -
and these are the sort of requirements RDBMSes handle well.

On a side note, the most recent Subversion release (1.4) actually
includes an automated hook for rebuilding corrupted BDB stores.  It
was becoming such a pervasive issue for Subversion users that the SVN
developers at Collabnet worked with SleepyCat directly to modify the
BDB core and add these new API calls.  In SVN v1.4, when your SVN BDB
repository gets corrupted, it will be "automagically" repaired
without you noticing.  Still, given this instability, I switched our
SVN store to FSFS a year ago.

On the reasoner performance front, is the wonderful work done by the
Cambridge IBM group on the BOCA enterprise-level RDF store relevant
(www.nesc.ac.uk/action/esi/download.cfm?index=3139)?  I don't know
how this would map into the OWL reasoning application space, but it
looks very powerful, and I understand BOCA and its offspring are
being used by the myGRID project.

I also believe Chimezie has done very useful work in this arena
(http://copia.ogbuji.net/blog/keyword/data), though, again, he'd be
able to tell us more specifically how relevant it is to OWL-based
applications.

Cheers,
Bill

On Sep 15, 2006, at 11:02 AM, Stian Soiland wrote:

>
> Bob Futrelle wrote:
>
>> With all the discussion of RDBMS+SQL for leveraging reasoning, I was
>> wondering if anyone has looked at alternatives such as the Berkeley
>> DB.  It is a reasonably mature technology; acquired by Oracle earlier
>> this year.
>
> ... and notorious for its instability and possible corruption whenever
> you have more than 1 thread/process/machine accessing the same
> database. (For instance, the BDB based backends for Subversion
> repositories quite often got corrupted)
>
> But hey, that's to expect when you go for a plain file-store instead
> of something with a server backend. With a
> single-thread-process-host-architecture it could work great.
>
> Remember also that this is not magic "paradigm", it is just a disk
> based hash table.
>
> -- 
> Stian Soiland
> School of Computer Science
> The University of Manchester
> http://www.cs.man.ac.uk/~ssoiland/
>
>

Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)

Please Note: I now have a new email - William.Bug@DrexelMed.edu

Received on Friday, 15 September 2006 16:54:05 UTC