Simon Willison's Weblog: berkeleydb

Keyspace

2009-07-16T10:30:14+00:00

Yet Another Key-Value Store, this one focused on high availability: one server in the cluster acts as master (and handles all writes), while the Paxos algorithm handles replication and ensures a new master can be elected should the existing master become unavailable. Clients can choose to make dirty reads against replicated servers or clean reads by talking directly to the master. Underlying storage is BerkeleyDB, and the authors claim 100,000 writes/second. Released under the AGPL.
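To make the dirty-versus-clean read distinction concrete, here is a toy in-memory model. It assumes a single master that appends every write to an ordered log, and replicas that replay that log only when `sync()` is called. The `Cluster` and `Replica` classes are hypothetical and are not Keyspace's client API; Paxos-based replication and master election are not modelled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    # Hypothetical replica: holds a copy of the data that may lag the master.
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # how many master log entries this replica has replayed


class Cluster:
    """Toy model: one master accepts every write; replicas catch up on sync().

    Master election (done with Paxos in Keyspace) is deliberately left out.
    """

    def __init__(self, n_replicas: int) -> None:
        self.master: Dict[str, str] = {}
        self.log: List[Tuple[str, str]] = []  # writes in the order the master accepted them
        self.replicas = [Replica() for _ in range(n_replicas)]

    def set(self, key: str, value: str) -> None:
        # Every write goes through the single master.
        self.master[key] = value
        self.log.append((key, value))

    def sync(self, i: int) -> None:
        # Replay the log entries replica i has not seen yet, in order.
        replica = self.replicas[i]
        for key, value in self.log[replica.applied:]:
            replica.data[key] = value
        replica.applied = len(self.log)

    def clean_get(self, key: str) -> Optional[str]:
        # Clean read: ask the master, which has seen every accepted write.
        return self.master.get(key)

    def dirty_get(self, key: str, i: int) -> Optional[str]:
        # Dirty read: ask replica i, which may be behind the master.
        return self.replicas[i].data.get(key)


cluster = Cluster(n_replicas=2)
cluster.set("user:1", "simon")
print(cluster.clean_get("user:1"))     # simon
print(cluster.dirty_get("user:1", 0))  # None: replica 0 has not synced yet
cluster.sync(0)
print(cluster.dirty_get("user:1", 0))  # simon
```

A dirty read from a lagging replica can return stale or missing data; that is the trade-off for spreading read load away from the master.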

Tags: agpl, berkeleydb, databases, keyspace, keyvaluepairs, paxos, replication, scaling

MemcacheDB

2009-01-05T12:37:15+00:00

A server that speaks the memcache protocol but uses Berkeley DB for reliable persistent storage. Speedy: 20,000 writes/second and 60,000+ reads/second. Includes a full replication mechanism (with custom memcache protocol commands) based on Berkeley DB's own replication.
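Because MemcacheDB speaks the memcache protocol, a client only has to produce standard memcache commands. The sketch below builds `set` and `get` commands in the memcache text protocol and parses the `VALUE ... END` reply. The helper names are hypothetical, no socket is opened, and MemcacheDB's custom replication commands are not covered.

```python
from typing import Dict


def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # Text protocol: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(*keys: str) -> bytes:
    # "get <key> [<key> ...]\r\n" fetches one or more keys in a single request.
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def parse_get_response(raw: bytes) -> Dict[str, bytes]:
    """Parse "VALUE <key> <flags> <bytes>" blocks terminated by "END"."""
    result: Dict[str, bytes] = {}
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        line = raw[pos:eol].decode("ascii")
        pos = eol + 2
        if line == "END":
            return result
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, nbytes = parts[1], int(parts[3])
        # The data block is length-prefixed, so it may itself contain \r\n.
        result[key] = raw[pos:pos + nbytes]
        pos += nbytes + 2  # skip the data block and its trailing \r\n


print(encode_set("greeting", b"hello"))  # b'set greeting 0 0 5\r\nhello\r\n'
print(parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"))  # {'greeting': b'hello'}
```

Reading the data block by its declared length, rather than splitting on line breaks, is what lets values contain arbitrary bytes.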

Tags: berkeleydb, keyvaluepairs, memcache, memcachedb, replication, scaling

skipdb

2007-02-04T13:09:59+00:00

Small, fast BerkeleyDB-style database using skip lists, by the creator of the Io programming language.
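For readers unfamiliar with skip lists, here is a minimal in-memory sketch of the classic probabilistic structure (William Pugh's design). It is not skipdb's code and has no persistence; `SkipList` and its methods are hypothetical names, and keys must be mutually comparable.

```python
import random
from typing import Iterator, List, Optional, Tuple


class Node:
    def __init__(self, key, value, level: int) -> None:
        self.key = key
        self.value = value
        self.forward: List[Optional["Node"]] = [None] * level  # next node at each level


class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one more level

    def __init__(self, seed: Optional[int] = None) -> None:
        self.head = Node(None, None, self.MAX_LEVEL)  # sentinel, holds no key
        self.level = 1  # number of levels currently in use
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.P:
            level += 1
        return level

    def _find_predecessors(self, key) -> List[Node]:
        # update[i] ends up as the last node at level i whose key is < key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def get(self, key):
        node = self._find_predecessors(key)[0].forward[0]
        if node is not None and node.key == key:
            return node.value
        return None

    def put(self, key, value) -> None:
        update = self._find_predecessors(key)
        node = update[0].forward[0]
        if node is not None and node.key == key:
            node.value = value  # overwrite an existing key
            return
        level = self._random_level()
        # Levels above self.level already default to head in update.
        self.level = max(self.level, level)
        new = Node(key, value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def items(self) -> Iterator[Tuple[object, object]]:
        # Level 0 links every node in sorted key order.
        node = self.head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]


sl = SkipList(seed=42)
for k in ["pear", "apple", "fig"]:
    sl.put(k, len(k))
print(list(sl.items()))  # [('apple', 5), ('fig', 3), ('pear', 4)]
```

The random promotion gives expected O(log n) lookups and inserts while keeping keys in sorted order, similar to what a B-tree offers but with simpler code.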

Via programming.reddit.com

Tags: berkeleydb, io, skipdb, skiplists, steve-dekorte

Discovering Berkeley DB

2003-11-26T02:36:47+00:00

I'm working on a project at the moment which involves exporting a whole bunch of data out of an existing system. The system is written in Perl and uses Berkeley DB files for most of its storage.

I'd never done anything with Berkeley DB before, but luckily Python has a module which seems to do all of the hard work for me:

>>> import bsddb
>>> db = bsddb.btopen('xpand.db')
>>> db.keys()[0:10]
[':archives:index.html', ':art:test.html', ... 
>>> db[':art:test.html']
'template;front.tp\x01\x01'
>>>

The Berkeley DB libraries are maintained by Sleepycat Software. Unfortunately, their site is completely saturated with marketing jargon: "Our customers rely on Berkeley DB for fast, scalable, reliable and cost-effective data management for their mission-critical applications." Great - now what does it do exactly?

Some digging around turned up the real information: the Berkeley DB Tutorial and Reference Guide, which contains pretty much everything you could possibly want to know about the technology. It turns out that at a basic level Berkeley DB is just a very high performance, reliable way of persisting dictionary-style data structures - anything where a piece of data can be stored and looked up using a unique key. The key and the value can each be up to 4 gigabytes in length and can consist of anything that can be crammed into a string of bytes, so what you do with it is completely up to you. The basic operations are "store this value under this key", "check if this key exists", "retrieve the value for this key" and "delete this key", so conceptually it's pretty simple - the complicated stuff all happens under the hood.
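Python 3 removed the bsddb module, so the sketch below uses the standard library's `dbm.dumb` as a stand-in to show the same dictionary-style model on disk. `dbm.dumb` is a simple pure-Python file format, not Berkeley DB, and the `store`/`fetch`/`delete` helper names are hypothetical. Each call reopens the database, which shows that values survive a close.

```python
import dbm.dumb
import os
import tempfile
from contextlib import closing
from typing import Optional


def store(path: str, key: bytes, value: bytes) -> None:
    # "Store this value under this key"; flag 'c' creates the files if needed.
    with closing(dbm.dumb.open(path, "c")) as db:
        db[key] = value


def fetch(path: str, key: bytes) -> Optional[bytes]:
    # "Check if this key exists", then "retrieve the value for this key".
    with closing(dbm.dumb.open(path, "r")) as db:
        return db[key] if key in db else None


def delete(path: str, key: bytes) -> None:
    # Remove a key; flag 'w' opens an existing database for writing.
    with closing(dbm.dumb.open(path, "w")) as db:
        del db[key]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example")
        store(path, b":art:test.html", b"template;front.tp\x01\x01")
        print(fetch(path, b":art:test.html"))  # b'template;front.tp\x01\x01'
        delete(path, b":art:test.html")
        print(fetch(path, b":art:test.html"))  # None
```

Keys and values are plain byte strings here too, so any structure on top of them (like the `;`-separated value above) is up to the application.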

It seems like a great alternative to a full-on relational database for simple applications, although I'm slightly confused by the license, which allows free use for open source products but requires a license for commercial applications. Does that mean that if I use the bsddb Python module in a commercial app I need to get a license from Sleepycat?

Tags: berkeleydb, perl, python