
Re: Berkeley DB is a non-relational high-performance system/paradigm - anyone looked at it?

From: Keith Bostic <keith.bostic@oracle.com>
Date: Thu, 2 Nov 2006 09:17:08 -0500
Message-Id: <42A2F217-7112-447B-AE0D-CCFBC687FB77@oracle.com>
Cc: public-semweb-lifesci@w3.org
To: dturi@cs.manchester.ac.uk

On Nov 2, 2006, at 5:48 AM, Daniele Turi wrote:

>> To be absolutely clear -- the problems with Subversion were NOT  
>> problems or bugs in Berkeley DB, they were the result of  
>> incompatible interfaces between two software components.
>> I don't want to turn this into a marketing presentation, but given  
>> how this conversation started, I think it's fair for me to give  
>> you a couple of examples: Berkeley DB is the database engine  
>> behind Sun Microsystems' LDAP directory server, Google's  
>> replicated Single Sign On service, Openwave's Email Mx product and  
>> the Amazon web site.
>> Yes, that's right: when you log into Amazon, that customized page  
>> you see is built by roughly 1,000 accesses to Berkeley DB  
>> databases. And when you log into Google's gmail, your account  
>> information is stored in Berkeley DB.
>> And, I can promise you two things: first, that every one of those  
>> products has a lot more than 1 thread or process accessing data at  
>> a time, and second, that none of these companies would be  
>> using my technology if there were better or more reliable  
>> technology available!
> I am surprised by the fact that Google uses BDB. In the following  
> recent article
> http://lwn.net/Articles/194667/
> Google's Greg Stein says that they use their own system, called  
> Bigtable:

I guess I wasn't absolutely clear, after all! :-)

Google doesn't use Berkeley DB behind Subversion; they use it as the
transactional, highly available data server behind their Single Sign
On service.

There's a paper on Google's use in the upcoming WORLDS 2006
workshop (WORLDS is the USENIX Workshop on Real, Large Distributed
Systems). The paper is entitled:

	Data Management for Internet-Scale Single-Sign-On
	Sharon E. Perl, Google Inc.; Margo Seltzer, Harvard University and Oracle Corporation

and I'm sure it will be available online shortly.

I had never heard of Google using Subversion with their own back-end
engine before, so I can't say why they made that decision or what
scaling issues they found when using Berkeley DB behind Subversion.
The obvious guess would be that Subversion doesn't use Berkeley DB's
replication support, so a Subversion installation is limited to a
single machine. Perhaps they chose to write their own Subversion
repository back end that distributes their data, rather than changing
Subversion itself to use Berkeley DB's replication.


Keith Bostic
keithbosticim (ymsgid)
Received on Thursday, 2 November 2006 14:20:06 UTC
