Friday, February 16, 2007

D8a

I happen to work in the same department at IBM as Joe Gregorio. On the whole that little factoid is not likely to get you very far on Jeopardy, but it does happen to be the starting point for this post. A few weeks ago, I was talking to our common manager and asked him if Joe had made some special deal with the IBM lawyers since he seemed to be allowed to write open source software with impunity while I distinctly remember signing an "IBM owns everything you ever think of" contract on my first day of work 17 years ago (holy crap, have I been here that long?).

Mr. Manager did a little research on the official guidelines and confirmed that, in fact, IBMers are allowed to write/contribute open source software on their own time as long as said software does not compete with anything IBM sells or advantage an IBM competitor (and before anyone whines, I looked it up and you can verbify "advantage"). Cool!

I had been working on a simple little backup utility program at home and decided it was time to start building my open source portfolio. Hence D8a.

The original impetus was pretty simple. We've had a digital camera for a few years now and have managed to amass about 8GB of pictures. While many are just unflattering dross, there are a significant number that my wife and I would like to keep around. Being a battle-hardened survivor of at least my fair share of hard drive crashes, I was wise enough not to trust our precious bits to a single set of spinning magnetic platters and to keep the data replicated across at least two physical drives at home. Then my wife pointed out that we'd still end up pictureless if our house was ever burglarized. Or flooded. Or burned. Or hit by an asteroid. Or squashed by a derelict satellite. Or picked as a landing site by an alien invasion fleet. She had a point (except for the alien invasion part -- that's just silly).

Her solution was to burn the data to CDs or DVDs and store them at strategic offsite locations (a.k.a. "relatives' houses"). But I have a hosting provider. And I pay them for 100GB of space. And I am using approximately 0.5% of that space. And I decided I could justify the $80 a year I spend on hosting and fix my picture backup problem in one glorious endeavor. And burning CDs is so tedious.

My initial thought was just to use curlftpfs (FTP is the only way to access my hosted content) and the same rsync that I use to sync the pictures across filesystems at home. Didn't work -- curlftpfs (and, from what I can gather, FTP in general) has no provision for setting the timestamps on files, which means that rsync would have to transfer my entire picture library to check for diffs every time it synced. Bummer. But what if I could store some metadata on the FTP site? Maybe store the files' checksums and logical timestamps so they could just be read rather than computed? Sounds like I need a tool! I like writing tools!
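Here's a minimal sketch of that metadata idea, assuming the metadata lives in a small JSON manifest stored next to the pictures on the FTP site. The helpers `build_manifest` and `changed_files` are hypothetical illustrations, not part of D8a. The point is that only the manifest has to come back over FTP; the remote copies of the pictures are never read.

```ruby
require "digest/md5"
require "find"
require "json"

# Hypothetical helper (not part of D8a): walk a local directory tree and
# record, for every regular file, its size, mtime, and MD5 checksum,
# keyed by its "/"-separated path relative to root.
def build_manifest(root)
  manifest = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    name = path.delete_prefix(root).delete_prefix("/")
    stat = File.stat(path)
    manifest[name] = {
      "size"  => stat.size,
      "mtime" => stat.mtime.to_i,
      "md5"   => Digest::MD5.file(path).hexdigest
    }
  end
  manifest
end

# Compare a fresh local manifest against the one previously stored remotely.
# Returns the names of new or modified files, sorted.
def changed_files(local, stored)
  local.select { |name, meta| stored.dig(name, "md5") != meta["md5"] }.keys.sort
end
```

After uploading the changed files, the manifest could be written back with `JSON.generate(manifest)` and read on the next run with `JSON.parse`; string keys survive that round trip unchanged.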

Writing the logic to mirror a set of data is pretty trivial. I've done it at least half a dozen times in the past. The tedious part is creating a uniform way to access data that is stored both in local files and on an FTP site. Enter D8a.
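Once both sides can be listed the same way, the mirroring step really does reduce to a set comparison. Here's a minimal sketch, assuming each side is already available as a hash of name => checksum. `plan_mirror` and `MirrorPlan` are illustrative names, not D8a's API.

```ruby
# Hypothetical sketch of the mirroring logic: given listings of the source
# and the destination (name => checksum), decide what to copy and what to
# delete so that the destination ends up matching the source.
MirrorPlan = Struct.new(:copy, :delete)

def plan_mirror(source, dest)
  # Copy anything missing from dest or whose checksum differs.
  copy   = source.reject { |name, sum| dest[name] == sum }.keys.sort
  # Delete anything in dest that no longer exists in source.
  delete = (dest.keys - source.keys).sort
  MirrorPlan.new(copy, delete)
end

src = { "a.jpg" => "111", "b.jpg" => "222", "c.jpg" => "333" }
dst = { "a.jpg" => "111", "b.jpg" => "old", "z.jpg" => "999" }
plan = plan_mirror(src, dst)
p plan.copy    # => ["b.jpg", "c.jpg"]
p plan.delete  # => ["z.jpg"]
```

A backup tool might choose to skip the delete list entirely, so that pictures removed locally are still kept offsite.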

D8a is a Ruby library that provides a simple, uniform abstraction of data that lives (initially) in the filesystem or on an FTP site. A "D8a" is a collection of named pieces of data, each of which consists of a sequence of bytes plus some metadata. The naming is hierarchical, and the mapping onto both the filesystem and FTP space is obvious. The metadata consists of things like size, modification time, checksum, etc. (the exact set depends on where the data is stored). The initial code is already committed, and version 0.1 should (hopefully) be available in the next couple of weeks. It won't do much, but my current schedule has D8a doing nightly backups of our pictures by the end of March.
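To make the abstraction concrete, here's a rough sketch of what such a uniform interface could look like. It is not D8a's actual API: the class and method names (`FileStore`, `MemoryStore`, `names`, `read`, `write`, `metadata`, `copy_all`) are assumptions for illustration. An in-memory store stands in for the FTP backend so the example runs without a server, and it deliberately reports a different metadata set, echoing the point that the set depends on where the data lives.

```ruby
require "digest/md5"
require "fileutils"

# Hypothetical interface sketch (not D8a's actual API): a store maps
# hierarchical names such as "2006/xmas/img_001.jpg" to bytes plus metadata.
class FileStore
  def initialize(root)
    @root = root
  end

  # Each "/"-separated name component becomes a directory level under root.
  def path_for(name)
    File.join(@root, *name.split("/"))
  end

  def names
    Dir.glob("**/*", base: @root).select { |n| File.file?(path_for(n)) }.sort
  end

  def read(name)
    File.binread(path_for(name))
  end

  def write(name, bytes)
    path = path_for(name)
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, bytes)
  end

  # The filesystem can report size and mtime; the checksum is computed.
  def metadata(name)
    path = path_for(name)
    stat = File.stat(path)
    { size: stat.size, mtime: stat.mtime, md5: Digest::MD5.file(path).hexdigest }
  end
end

# In-memory stand-in for a remote backend; its metadata set omits :mtime.
class MemoryStore
  def initialize
    @data = {}
  end

  def names
    @data.keys.sort
  end

  def read(name)
    @data.fetch(name)
  end

  def write(name, bytes)
    @data[name] = bytes.dup
  end

  def metadata(name)
    bytes = read(name)
    { size: bytes.bytesize, md5: Digest::MD5.hexdigest(bytes) }
  end
end

# Works with any pair of stores because it only uses the shared methods.
def copy_all(from, to)
  from.names.each { |name| to.write(name, from.read(name)) }
end
```

The design choice here is duck typing: anything that answers `names`, `read`, `write`, and `metadata` can participate. That is roughly what would let a mirroring routine stay ignorant of whether it's talking to a disk or an FTP site.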

Come check out the D8a project on RubyForge if it sounds interesting to you. I've already got several ideas for future expansion: mapping onto HTTP, zip/tar files, Amazon S3, databases, etc.; better integration with Ruby's IO class; and the addition of orthogonal behaviors (I've already started on this one). But I'm always looking for new ideas and collaborators!
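The post doesn't spell out what "orthogonal behaviors" means inside D8a. One plausible reading is behavior that can be layered onto any store without the store knowing about it. Assuming that reading, here's a sketch using Ruby's standard `SimpleDelegator`. `ChecksumCache` (which memoizes checksums for any object with `read`/`write`) and the toy `HashStore` backend are hypothetical names for illustration only.

```ruby
require "delegate"
require "digest/md5"

# Hypothetical "orthogonal behavior": wraps any store that responds to
# read(name) and write(name, bytes) and memoizes per-name MD5 checksums.
# All other method calls are forwarded to the wrapped store unchanged.
class ChecksumCache < SimpleDelegator
  def initialize(store)
    super
    @sums = {}
  end

  # Computes the checksum on first request only; later calls hit the cache.
  def checksum(name)
    @sums[name] ||= ::Digest::MD5.hexdigest(read(name))
  end

  # Writing new bytes invalidates the cached checksum for that name.
  def write(name, bytes)
    @sums.delete(name)
    __getobj__.write(name, bytes)
  end
end

# Toy backend that counts reads so the caching effect is visible.
class HashStore
  attr_reader :reads

  def initialize
    @data = {}
    @reads = 0
  end

  def read(name)
    @reads += 1
    @data.fetch(name)
  end

  def write(name, bytes)
    @data[name] = bytes
  end
end
```

Because the wrapper only assumes `read` and `write`, the same caching could sit in front of a filesystem store, an FTP store, or anything else with that shape.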