I like the Unix DBM file format (a.k.a. BerkeleyDB). I use it for static data (like the zip code-to-latitude/longitude database for the Hebcal Interactive Jewish Calendar) and for dynamic data (such as the subscriber database for the Mountain View High School Alumni Internet Directory).
BerkeleyDB is also great because it has many language interfaces. I can access the same DB files in both Perl and PHP.
My high school alumni directory subscriber database has experienced corruption a few times recently. It’s a good thing I also keep a daily text backup of the database in RCS because it makes it easy to rebuild the DB.
But it’s obvious to me that the underlying cause of the problem is concurrent access that isn’t protected by mutual exclusion. Heck, I wrote the code back in 1995 when I didn’t know better.
So I’ve gotta go add some locking code to the 25 scripts that manage the site.
However, older versions of BerkeleyDB (such as the one installed on my ISP) don’t natively support locking, so I’ve gotta use flock for concurrency. No problem; it’s relatively easy to turn every occurrence of this:
    use DB_File;

    my(%DB);
    tie(%DB, 'DB_File', $file, O_RDWR|O_CREAT, 0644, $DB_HASH);
    $DB{'foo'} = 'bar';
    untie(%DB);
into something that looks like this:
    use DB_File;
    use Fcntl qw(:DEFAULT :flock);

    my(%DB);
    my($db) = tie(%DB, 'DB_File', $file, O_RDWR|O_CREAT, 0644, $DB_HASH);
    defined($db) || die "Can't tie $file: $!\n";
    my($fd) = $db->fd;
    open(DB_FH, "+<&=$fd") || die "dup $!";
    unless (flock (DB_FH, LOCK_EX)) { die "flock: $!" }
    $DB{'foo'} = 'bar';
    flock(DB_FH, LOCK_UN);
    undef $db;
    undef $fd;
    untie(%DB);
    close(DB_FH);
Bingo. Problem seems to be fixed. No more DB corruption.
But then, a few weeks later, I get DB corruption again. Ugh. Turns out that I managed to fix 24 of the scripts, but there’s one that I occasionally run by hand (the one that removes someone from the directory) that I forgot to add locking code to. With flock, it only takes one script to screw it up.
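The reason one script is enough: flock locks are advisory, so they only constrain code that actually asks for the lock. Here's a minimal sketch of that, using a scratch file as a stand-in for the DB. The "script A/B/C" handles are made up for illustration, and the behavior of two handles in one process assumes a platform where Perl's flock is backed by the real flock(2) call (Linux, for example).

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Scratch file standing in for the DB (hypothetical, for illustration only).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

# "Script A" follows the protocol and takes an exclusive lock.
open(my $a_fh, '+<', $path) or die "open: $!";
flock($a_fh, LOCK_EX)       or die "flock: $!";

# "Script B" also follows the protocol; it asks without blocking,
# so it can see that somebody else already holds the lock.
open(my $b_fh, '+<', $path) or die "open: $!";
my $b_got_lock = flock($b_fh, LOCK_EX | LOCK_NB) ? 1 : 0;

# "Script C" never calls flock, so the lock never gets a chance to stop it.
open(my $c_fh, '>>', $path) or die "open: $!";
my $c_wrote = print {$c_fh} "unprotected write\n";
close($c_fh) or die "close: $!";

print "B got the lock: ", ($b_got_lock ? "yes" : "no"), "\n";
print "C wrote anyway: ", ($c_wrote    ? "yes" : "no"), "\n";
```

Script C is my 25th script: it doesn't matter how carefully A and B behave if C just writes.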
So last night I was about to go through the scripts and update them, but the DB_File manpage points out a possible problem with the classic “tie the db, dup the fd, then flock” approach: tie opens the file, and may read and cache part of it, before the lock is acquired, so another process can still change the DB underneath you. So fixing the 25th script to use the same locking scheme won’t necessarily solve the problem either. Doh!
Reading a little further down the manpage, I see a reference to a simple CPAN module called DB_File::Lock that transparently does flocking when you tie and untie the DB. It’s perfect for what I need.
Now I can simply do a search-and-replace throughout the entire codebase and change all DB_File references to DB_File::Lock, and get rid of whatever dup/flock stuff I used to use.
    use DB_File::Lock;

    my(%DB);
    tie(%DB, 'DB_File::Lock', $file, O_RDWR|O_CREAT, 0644, $DB_HASH, 'write');
    $DB{'foo'} = 'bar';
    untie(%DB);
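For the curious, here's a rough sketch of the lock-before-tie idea using only DB_File and Fcntl. This is my own simplified illustration, not DB_File::Lock's actual source: the helper name with_locked_db, the "$file.lock" naming, and the temp directory are all assumptions made for the example. The point is the ordering: take the lock on a separate file first, tie second, and untie before letting go of the lock.

```perl
use strict;
use warnings;
use DB_File;                       # also exports the O_* flags and $DB_HASH
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);

# Hypothetical helper: hold a lock on "$file.lock" for the whole tie/untie cycle.
# $mode is LOCK_SH for readers or LOCK_EX for writers.
sub with_locked_db {
    my ($file, $mode, $code) = @_;

    # Lock a separate file *before* opening the database, so nothing is
    # read from the DB (and cached) until we hold the lock.
    sysopen(my $lock_fh, "$file.lock", O_RDWR|O_CREAT, 0644)
        or die "Can't open $file.lock: $!\n";
    flock($lock_fh, $mode) or die "Can't flock $file.lock: $!\n";

    my %db;
    my $flags = ($mode == LOCK_EX) ? (O_RDWR|O_CREAT) : O_RDONLY;
    tie(%db, 'DB_File', $file, $flags, 0644, $DB_HASH)
        or die "Can't tie $file: $!\n";

    # Run the caller's code, but remember any error so we still clean up.
    my $result = eval { $code->(\%db) };
    my $err    = $@;

    untie(%db);          # flush to disk while the lock is still held
    close($lock_fh);     # closing the handle releases the lock
    die $err if $err;
    return $result;
}

# Example usage against a throwaway DB in a temp directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.db";

with_locked_db($file, LOCK_EX, sub { $_[0]{foo} = 'bar' });
print with_locked_db($file, LOCK_SH, sub { $_[0]{foo} }), "\n";   # prints "bar"
```

In practice I'd rather let the CPAN module handle this than maintain my own wrapper, but it helps to see why the order of operations matters.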
I’ve also considered moving the code from DBM files into MySQL. My ISP started offering limited MySQL access for an additional buck a month, and relational DBs tend to solve the concurrent access problem in a much more elegant (and consistent) way.
Unfortunately, it would be too much work. I don’t want to rewrite all of my 8-year-old Perl code that serializes an alumni record (just a bunch of key=value pairs) into a delimited string. And the DB access parts of the code aren’t very well abstracted, so switching from a simple hash DB format to a more structured multi-column format is going to be trickier than it seems.
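To give a flavor of what that serialization looks like, here's a stripped-down sketch. It is not the real directory code: the tab delimiter, the field names, and the pack_record/unpack_record helpers are all assumptions made up for illustration.

```perl
use strict;
use warnings;

my $SEP = "\t";    # assumed delimiter; the real scripts may use something else

# Flatten a record (hash ref of field => value) into one string,
# which is what gets stored as the value in the DBM file.
sub pack_record {
    my ($rec) = @_;
    return join($SEP, map { join('=', $_, $rec->{$_}) } sort keys %$rec);
}

# Rebuild the hash from the stored string. Each pair is split on the
# first '=' only, so values may themselves contain '='.
sub unpack_record {
    my ($str) = @_;
    my %rec = map { split(/=/, $_, 2) } split(/\Q$SEP\E/, $str);
    return \%rec;
}

my $packed = pack_record({
    name  => 'Pat Example',
    year  => 1990,
    email => 'pat@example.com',
});
my $rec = unpack_record($packed);
print "$rec->{name} ($rec->{year})\n";   # prints "Pat Example (1990)"
```

As far as the DB is concerned, each record is one opaque string, which is exactly why mapping it onto real columns isn't a quick job.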
Someday when I find the time to do a complete rewrite I’ll use MySQL as the backing store. And I’ll use that opportunity to get rid of all of my perl4-isms and replace them with appropriate perl5 constructs. Heck, if I delay long enough, perhaps I can go straight from perl4 to perl6! 🙂
For now, DB_File::Lock is good enough.
You could always use something like Tie::DBI, if you desperately wanted to get it to use a SQL database…
http://search.cpan.org/author/LDS/Tie-DBI-0.86/lib/Tie/DBI.pm
-Dom