- From: Keith Bostic <keith.bostic@oracle.com>
- Date: Sun, 29 Oct 2006 08:52:57 -0500
- To: public-semweb-lifesci@w3.org

> ... and notorious for its instability and possible corruption whenever
> you have more than 1 thread/process/machine accessing the same
> database.
> (For instance, the BDB based backends for Subversion repositories
> quite often got corrupted)

Folks, this is just wrong. There are two issues here:

First, when people talk of Berkeley DB, they are sometimes referring to the original Berkeley DB release, version 1.85, which was first distributed around 1992. BDB version 1.85 is still in wide use in many, many applications because it's ubiquitous, has a tiny footprint, is easy to find, and works well for what it is: a simple database engine, supporting Hash and Btree access methods. BDB 1.85 doesn't support locking or transactions in any form, and if the application or system fails, data corruption or loss is possible.

The current release of Berkeley DB was made a month ago, and is Berkeley DB version 4.5. BDB in 2006 is a fast, scalable, transactional database engine with industrial grade reliability and availability.

Second, when people talk of Berkeley DB stability, they are sometimes referring to the widely-known problems seen when BDB is used as the underlying engine for Subversion. To simplify the issues, the problem was in how Subversion used BDB: Subversion/BDB installations did not run transactional recovery after application or system failure. This was an architectural issue: both BDB and Subversion are libraries, and there was no way in either piece of software to know when transactional recovery was necessary. Since recovery wasn't being run after system or application failure, of course instability and data corruption resulted.

In February of this year, Collabnet and Berkeley DB engineers collaborated on a new set of APIs for the BDB library so that transactional recovery would be automatically run after application or system failure, which resolved this problem.
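
To make the recovery point concrete, here is a minimal sketch of why a write-ahead-logged store needs recovery run after a crash. This is a toy model in Python, not Berkeley DB code or its API: the ToyStore class, its file names, and the crash_before_apply switch are all hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical key/value store with a write-ahead log (NOT Berkeley DB)."""

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "data.json")
        self.log_path = os.path.join(directory, "wal.log")

    def _read_data(self):
        if not os.path.exists(self.data_path):
            return {}
        with open(self.data_path) as f:
            return json.load(f)

    def _write_data(self, data):
        # Write to a temporary file, then rename it over the data file.
        tmp = self.data_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self.data_path)

    def commit(self, updates, crash_before_apply=False):
        # Step 1: make the change durable in the log before touching the data.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(updates) + "\n")
            f.flush()
            os.fsync(f.fileno())
        if crash_before_apply:
            return  # simulated crash: the log has the change, the data file does not
        # Step 2: apply the change to the data file, then discard the log.
        data = self._read_data()
        data.update(updates)
        self._write_data(data)
        os.remove(self.log_path)

    def recover(self):
        # Replay every logged change that may not have reached the data file.
        if not os.path.exists(self.log_path):
            return
        data = self._read_data()
        with open(self.log_path) as f:
            for line in f:
                data.update(json.loads(line))
        self._write_data(data)
        os.remove(self.log_path)

    def get(self, key):
        return self._read_data().get(key)


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = ToyStore(d)
        store.commit({"rev": 1})
        store.commit({"rev": 2}, crash_before_apply=True)
        # A restarted application that skips recovery sees stale data.
        before = ToyStore(d).get("rev")
        # Running recovery on restart replays the log first.
        restarted = ToyStore(d)
        restarted.recover()
        after = restarted.get("rev")
        return before, after
```

In this toy, a process that restarts without calling recover() reads the old value even though the newer change was committed to the log. That is the same shape of problem as above: something has to know a failure happened and run recovery before the data is used again.
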
For details on the Subversion/BDB changes, you can see the Collabnet press release on the topic:

http://www.collab.net/news/press/2006/sleepycat.html

To be absolutely clear -- the problems with Subversion were NOT problems or bugs in Berkeley DB, they were the result of incompatible interfaces between two software components.

I don't want to turn this into a marketing presentation, but given how this conversation started, I think it's fair for me to give you a couple of examples: Berkeley DB is the database engine behind Sun Microsystems' LDAP directory server, Google's replicated Single Sign On service, Openwave's Email Mx product and the Amazon web site. Yes, that's right: when you log into Amazon, that customized page you see is built by roughly 1,000 accesses to Berkeley DB databases. And when you log into Google's gmail, your account information is stored in Berkeley DB.

And, I can promise you two things: first, that every one of those products has a lot more than 1 thread or process accessing data at a time, and second, that none of these companies would be using my technology if there were better or more reliable technology available!

> But hey, that's to expect when you go for a plain file-store
> instead of something with a server backend. With a
> single-thread-process-host-architecture it could work great.

Yes, Berkeley DB runs on top of the filesystem; it doesn't require a raw partition on which to run. That said, that's a feature, not a bug! For that reason, BDB doesn't require the server be brought down in order to increase the size of the raw partition, hot backups and archival can be done with the standard system tools, and there are no additional administration requirements.

On top of the filesystem, Berkeley DB provides a transactional engine that offers B+tree, Queue and Hash access methods.
The transactions are like everybody else's: write-ahead logging, cursors, multi-version concurrency control, fine-granularity locking, multiple degrees of isolation, high-availability and fault tolerance through replication, and so on.

> Remember also that this is not magic "paradigm", it is just a disk
> based hash table.

This is wrong. Berkeley DB isn't just a Hash table (in fact, it never was; even Berkeley DB 1.85 had a B+tree as well as a Hash table). Berkeley DB offers a B+tree implementation, which is pretty standard. But Berkeley DB also offers a Queue access method with atomic consume operations, as well as an Extended Linear Hash access method for data sets sufficiently large relative to the cache that Hash will out-perform a B+tree.

In summary, the Berkeley DB of 2006 isn't your parents' Berkeley DB. :-)

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic
+1-781-259-3139
keithbosticim (ymsgid)
keith.bostic@oracle.com

Received on Monday, 30 October 2006 02:35:08 UTC