Friday, February 16, 2007


I happen to work in the same department at IBM as Joe Gregorio. On the whole that little factoid is not likely to get you very far on Jeopardy, but it does happen to be the starting point for this post. A few weeks ago, I was talking to our common manager and asked him if Joe had made some special deal with the IBM lawyers since he seemed to be allowed to write open source software with impunity while I distinctly remember signing an "IBM owns everything you ever think of" contract on my first day of work 17 years ago (holy crap, have I been here that long?).

Mr. Manager did a little research on the official guidelines and confirmed that, in fact, IBMers are allowed to write/contribute open source software on their own time as long as said software does not compete with anything IBM sells or advantage an IBM competitor (and before anyone whines, I looked it up and you can verbify "advantage"). Cool!

I had been working on a simple little backup utility program at home and decided it was time to start building my open source portfolio. Hence D8a.

The original impetus was pretty simple. We've had a digital camera for a few years now and have managed to amass about 8GB of pictures. While many are just unflattering dross, there are a significant number that my wife and I would like to keep around. Being a battle-hardened survivor of at least my fair share of hard drive crashes, I was wise enough not to trust our precious bits to a single set of spinning magnetic platters and kept the data replicated across at least 2 physical drives at home. Then my wife pointed out that we'd still end up pictureless if our house was ever burglarized. Or flooded. Or burned. Or hit by an asteroid. Or squashed by a derelict satellite. Or picked as a landing site by an alien invasion fleet. She had a point (except for the alien invasion part -- that's just silly).

Her solution was to burn the data to CDs or DVDs and store them at strategic offsite locations (a.k.a. "relatives' houses"). But I had a hosting provider. And I pay them for 100GB of space. And I am using approximately 0.5% of that space. And I decided I could justify the $80 a year I spend on hosting and fix my picture backup problem in one glorious endeavor. And burning CDs is so tedious.

My initial thoughts were just to use curlftpfs (FTP is the only way to access my hosted content) and the same rsync that I use to sync the pictures across filesystems at home. Didn't work -- curlftpfs (and from what I can gather, FTP in general) has no provisions for setting the timestamps on files which means that rsync would have to transfer my entire picture library to check for diffs every time it synced. Bummer. But what if I could store some metadata on the FTP site? Maybe store the files' checksums and logical timestamps so they could just be read rather than computed? Sounds like I need a tool! I like writing tools!
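A minimal sketch of that metadata idea, in Ruby. This is not D8a's code: the method names (`build_manifest`, `dump_manifest`, `load_manifest`), the JSON format, and the choice of SHA-256 are all assumptions made for illustration. The point is only that a small manifest file stored next to the pictures lets a sync tool read checksums and timestamps instead of downloading every file to compute them.

```ruby
require "digest"
require "find"
require "json"
require "pathname"

# Walk +root+ and record, for every regular file, its size, modification
# time (seconds since the epoch), and SHA-256 checksum, keyed by the
# file's path relative to +root+.
def build_manifest(root)
  base = Pathname.new(root)
  manifest = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    name = Pathname.new(path).relative_path_from(base).to_s
    stat = File.stat(path)
    manifest[name] = {
      "size"   => stat.size,
      "mtime"  => stat.mtime.to_i,
      "sha256" => Digest::SHA256.file(path).hexdigest
    }
  end
  manifest
end

# Serialize the manifest to a string that could be uploaded as one small file.
def dump_manifest(manifest)
  JSON.pretty_generate(manifest)
end

# Parse a manifest string that was previously downloaded.
def load_manifest(text)
  JSON.parse(text)
end
```

With something like this, each sync would fetch one small file from the FTP site rather than the whole picture library.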

Writing the logic to mirror a set of data is pretty trivial. I've done it at least half a dozen times in the past. The tedious part is creating a uniform way to access data that is stored in local files and an FTP site. Enter D8a.
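To show how small the mirroring logic can be, here is a rough sketch of a one-way mirror plan. It assumes each side has already been reduced to a hash of name => checksum; the `mirror_plan` name and the hash layout are hypothetical and not taken from D8a.

```ruby
# Compare two snapshots (name => checksum) and work out what a one-way
# mirror from source to target would have to do.
def mirror_plan(source, target)
  {
    # Copy anything that is missing from the target or whose checksum differs.
    copy:   source.keys.select { |name| target[name] != source[name] }.sort,
    # Delete anything that exists only on the target.
    delete: (target.keys - source.keys).sort
  }
end
```

The hard part is everything around this: listing, reading, and writing the data the same way whether it lives on disk or behind FTP.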

D8a is a Ruby library that provides a simplistic, uniform abstraction of data that lives (initially) in the filesystem or on an FTP site. A "D8a" is a collection of named pieces of data, each of which consists of a sequence of bytes plus some metadata. The naming is hierarchical and the mapping onto both the filesystem and FTP space is obvious. The metadata consists of things like size, modification time, checksum, etc. (the exact set depends on where the data is stored). The initial code is already committed & version 0.1 should (hopefully) be available in the next couple of weeks. It won't do much, but my current schedule has D8a doing nightly backups of our pictures by the end of March.
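For a feel of what "named bytes plus metadata" could look like, here is a hypothetical sketch. It is not D8a's actual API: the class names (`DirStore`, `MemoryStore`), the four methods (`names`, `read`, `write`, `metadata`), and the `copy_all` helper are invented for illustration, and the in-memory store is only a stand-in so the example runs without a server. An FTP-backed store would presumably offer the same four methods, for example on top of Ruby's Net::FTP.

```ruby
require "digest"
require "fileutils"

# Filesystem-backed store: names are paths relative to a root directory.
class DirStore
  def initialize(root)
    @root = root
  end

  # All regular files under the root, as sorted relative paths.
  def names
    Dir.glob("**/*", base: @root).select { |n| File.file?(path_for(n)) }.sort
  end

  def read(name)
    File.binread(path_for(name))
  end

  def write(name, bytes)
    FileUtils.mkdir_p(File.dirname(path_for(name)))
    File.binwrite(path_for(name), bytes)
  end

  # The exact metadata set depends on the backing store; here the
  # filesystem can report a modification time as well.
  def metadata(name)
    stat = File.stat(path_for(name))
    { size: stat.size, mtime: stat.mtime,
      sha256: Digest::SHA256.file(path_for(name)).hexdigest }
  end

  private

  def path_for(name)
    File.join(@root, name)
  end
end

# In-memory store with the same interface (a stand-in for a remote site).
class MemoryStore
  def initialize
    @data = {}
  end

  def names
    @data.keys.sort
  end

  def read(name)
    @data.fetch(name)
  end

  def write(name, bytes)
    @data[name] = bytes.b # keep a binary copy of the bytes
  end

  def metadata(name)
    bytes = @data.fetch(name)
    { size: bytes.bytesize, sha256: Digest::SHA256.hexdigest(bytes) }
  end
end

# Works for any pair of stores because it only uses the shared methods.
def copy_all(from, to)
  from.names.each { |name| to.write(name, from.read(name)) }
end
```

Because `copy_all` only talks to the shared methods, it does not care which kind of store sits on either end.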

Come check out the D8a project on RubyForge if it sounds interesting to you. I've already got several ideas for future expansion (mapping onto HTTP, zip/tar files, Amazon S3, databases, etc.; better integration with Ruby's IO class; addition of orthogonal behaviors (already started this one)), but I'm always looking for new ideas and collaborators!


At 9:05 PM , Blogger Peaches said...

Any progress on this front? I want to essentially do the exact same thing.

I'm paying for a GoDaddy hosting service that offers tons of storage space for next to nothing. Right now all I use it for is a place to host ASP.NET apps since my primary VPS host is a Linux machine. Of course the only way to get at the GoDaddy space is FTP.

What I was hoping to do was mount the FTP share on my local machine and my remote Virtual Private Server to back up both my local media and my remote media to that GoDaddy space. Otherwise it's just going to sit there virtually empty.

I have hardly any Ruby experience, but I have been wanting a project to get me jump-started on Ruby. Need any help?

At 10:55 PM , Blogger Mike Burr said...

Sorry for the delay in answering, it's been a busy day.

GoDaddy is actually the ISP I mentioned in the original post. I'm a bit surprised anybody found either this blog entry or the project on RubyForge. If you don't mind me asking, what led you here?

I'd appreciate any help you'd care to offer! The file logic is essentially done and much of the FTP logic is working, but there are still a few more pieces left to implement before the synchronization will be fully functional. Are there any areas you are more or less comfortable with?

At 5:44 PM , Blogger Peaches said...

Hi Mike,

I should have checked back here sooner. This would be a LOT easier over e-mail, lol.

Can you kick me an e-mail to:

(naschbac) (at) (gonzaga) (dot) (edu)

without the parens and super-sophisticated cryptography? :)

I have some ideas, like potentially merging support for this with the official CurlFTPFS so that it's just transparent based on a mode switch argument for the FUSE plug-in.


