Mike West — Web Application Developer

Ben Ward has a great post up on YDN discussing the massive addition of microformats to the latest refresh of Kelkoo. ~27 million hListings can’t be wrong! It’s a great read, and a real validation of the concept.

Moreover, it’s an excellent introduction to microformats as an API. There’s no official Kelkoo API to grab and process listings, reviews, or companies, but the judicious use of microformats makes it easy for you to grab and parse the information out of the page on your own. The article’s very much worth paging through; I suggest you take a look.

Mike West is a web application developer living in Munich, Germany. Professionally programming for the web since 2000, he's now a web developer at Yahoo! Germany.

One of the supreme pleasures of my job is the simple chance to work with real experts in the field on a daily basis. Mike Davies, for example, knows an incredible amount about building accessible websites, and never hesitates to share his opinions with the rest of us. He’s an innovative developer in many other ways, but this is a particular area of expertise.

Just this week, he’s started a new site, Accessibility Tips, that is absolutely worth sticking in your RSS reader. It looks like it’s going to be a brilliant resource, codifying best practices for building clean and accessible websites in an easy-to-understand, well-justified manner.

He’s got 4 articles up so far, and I’m already finding myself filing bugs against the sites I work on to put in some quick fixes. “Providing Link Text” might well have been custom-written as a quick reminder that my baby needs work. :)

Nicely done, Mike.

So, the disk on which I keep the main copy of my Aperture library started making strange clicking noises when plugged into my PowerBook. It makes these noises instead of the expected whirring and humming and actual reading of data. This, as you may suspect, is a Bad Thing.

Thankfully, it works perfectly when plugged into my work laptop, so I’ve spent the majority of the day descending into full-blown backup paranoia. I’ve consolidated all my important files and cloned them onto three separate hard drives. Now I’m beginning the process of burning a million DVDs. Of course, the aforementioned paranoia forces me to recognize that DVDs degrade; I can’t trust them, you see. What to do?

This is thankfully a solved problem.

Usenet posters have used something called Parchive for years now to post binary files with some guarantee of completeness in the intrinsically lossy world of globally mirrored newsgroups. Along with the actual data that’s written to a newsgroup, the poster will upload a number of PAR files containing parity information that allows you to regenerate any lost data. Without going into the details of Reed-Solomon error correction, this means that if a few pieces of the data you’re downloading are missing, you can generate them yourself, ensuring that the original signal gets through.
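Parchive itself uses Reed-Solomon codes, which can rebuild several missing blocks at once. The core idea is easier to see in its simplest relative, a single XOR parity block, which can rebuild exactly one missing block. The bash sketch below is that toy version, not Parchive's actual algorithm; the block values are made up for illustration.

```bash
#!/usr/bin/env bash
# Toy illustration of parity recovery using XOR. Parchive uses Reed-Solomon
# codes, which can rebuild several lost blocks; plain XOR can rebuild only one.

# Three made-up "data blocks", each shrunk to a single byte for readability.
blocks=(202 17 99)

# Build the parity block: the XOR of every data block.
parity=0
for b in "${blocks[@]}"; do
  parity=$(( parity ^ b ))
done

# Simulate losing the middle block (index 1).
lost=1
unset 'blocks[lost]'

# XORing the parity with every surviving block leaves exactly the missing one.
recovered=$parity
for b in "${blocks[@]}"; do
  recovered=$(( recovered ^ b ))
done

echo "parity block: $parity"              # prints: parity block: 184
echo "recovered block $lost: $recovered"  # prints: recovered block 1: 17
```

The same trick works byte by byte on real blocks; Reed-Solomon generalizes it so that N parity blocks can stand in for any N missing data blocks.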

The same theory applies to DVDs. I don’t particularly trust the medium to guarantee successful backups over time, but I do trust that they’ll probably retain 99% of the bits I care about. Parchive, therefore, looks like a great solution.

Installing Parchive

Installing Parchive is trivial. You can grab binaries off SourceForge, or check out the CVS tree and compile it yourself, like so:

cvs -d:pserver:anonymous@parchive.cvs.sourceforge.net:/cvsroot/parchive login
cvs -z3 -d:pserver:anonymous@parchive.cvs.sourceforge.net:/cvsroot/parchive co -P par2cmdline
cd par2cmdline/
./configure --prefix=/usr/local
make
sudo make install

Using Parchive

The main thing I want to do with Parchive is create PAR files that contain vital parity information about my data. When doing so, there are two options that are important to consider: the redundancy percentage and the block size.

The former option controls how much parity information is generated, that is, how much data you can lose while still being able to regenerate the whole. I’ve settled on 10% as being stupendously beyond the number of bad sectors I’d expect to see on a DVD within a reasonable amount of time, and therefore “safe”.

The latter option only makes sense if you know a little about how Parchive works. In a nutshell, it breaks your files up into smaller blocks and generates parity information for each block separately. If you lose a single bit of a block, the whole block is invalid and has to be regenerated. In an ideal world, then, you’d set the block size equal to the sector size of whatever medium you’re using for backup: that gives you the best protection against a single sector dying, and you don’t waste any space.

This efficiency, however, is impractical for two reasons. First, the smaller the block size, the longer it takes to process the data. Second, Parchive’s algorithm is limited to a maximum of 32,768 blocks (coincidentally, that’s 2^15). If you set a 2 KB block size to maximize efficiency, you’d only be able to process ~64 MB before Parchive fell over and died. I need to write parity information for up to a whole DVD’s worth of data: ~4 GB (~4.7 GB total capacity, minus 10% for redundancy). 4 millionish kilobytes / 32,768 blocks = about 122 KB per block. I’ll double that (and round to the nearest power of two, because I suspect that makes things easier internally), and end up with a block size of 262,144 bytes.
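The block-size arithmetic above can be reproduced in a few lines of bash. This sketch assumes “4 millionish kilobytes” means 4×10^9 bytes, and it rounds *up* to the next power of two rather than to the nearest one. Here that gives the same 262,144 bytes, and rounding up guarantees the result stays under the block limit. The multiple-of-4 remark reflects my understanding of par2cmdline’s requirements, not something stated above.

```bash
#!/usr/bin/env bash
# Reproduce the block-size calculation. All sizes are in bytes.
max_blocks=32768                       # PAR2's source-block limit (2^15)
data_bytes=$(( 4000 * 1000 * 1000 ))   # assumed: ~4 GB of data per DVD

# Smallest block size that fits the data into max_blocks (integer ceiling).
min_block=$(( (data_bytes + max_blocks - 1) / max_blocks ))

# Double it for headroom, then round up to the next power of two.
target=$(( min_block * 2 ))
block_size=1
while (( block_size < target )); do
  block_size=$(( block_size * 2 ))
done

# Any power of two >= 4 is also a multiple of 4 (assumed par2 requirement).
source_blocks=$(( (data_bytes + block_size - 1) / block_size ))

echo "minimum block size: $min_block"
echo "chosen block size:  $block_size"
echo "source blocks used: $source_blocks"
```

With 262,144-byte blocks, ~4 GB of data comes to about 15,259 source blocks, comfortably under the 32,768 limit. At 10% redundancy that works out to something on the order of 1,500 recovery blocks, each able to stand in for one damaged block.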

So, creating PAR files for a directory is just a matter of plugging these values into the par2create command:

par2create -s262144 -r10 [NameOfParFile.par2] [FilesToRead]

Easy, eh?
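If you’re doing this for a whole stack of discs, a small wrapper keeps the settings consistent. The sketch below is a hypothetical `create_parity` helper (my own name, not part of par2cmdline). It uses only the two `par2create` options shown above, and it collects just the regular files at the top level of a directory, skipping subdirectories and any `.par2` files left over from an earlier run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create_parity DIR
# Writes DIR/<basename of DIR>.par2 (plus its recovery volumes) using the
# 256 KB block size and 10% redundancy settled on above.
BLOCK_SIZE=262144
REDUNDANCY=10

create_parity() {
  local dir=$1
  local name
  name=$(basename "$dir")
  (
    # Run in a subshell so the cd doesn't leak into the caller's shell.
    cd "$dir" || exit 1
    files=()
    for f in *; do
      # Only regular files; skip subdirectories and existing PAR files.
      if [[ -f $f && $f != *.par2 ]]; then
        files+=("$f")
      fi
    done
    if (( ${#files[@]} == 0 )); then
      echo "create_parity: no files to protect in $dir" >&2
      exit 1
    fi
    par2create -s"$BLOCK_SIZE" -r"$REDUNDANCY" "$name.par2" "${files[@]}"
  )
}
```

Calling `create_parity ~/Backups/disc01` (a made-up path) would leave `disc01.par2` and its recovery volumes next to the files, ready to be burned onto the same DVD.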

Verifying the files in a directory is equally trivial, using par2verify:

par2verify [NameOfParFile.par2]

If par2verify tells you that you’ve got corruption in your data, you can repair it with par2repair:

par2repair [NameOfParFile.par2]

When that’s finished, parchive will have magically regenerated your data from the parity files you have at hand. Brilliant.
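The verify-then-repair step can be wrapped up as well. This hypothetical `check_and_repair` helper assumes par2cmdline’s exit codes: `par2verify` returns 0 when everything checks out, 1 when it finds damage that can be repaired, and 2 or higher when repair isn’t possible. Check your version’s documentation before relying on that.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_and_repair PARFILE
# Assumes par2cmdline's exit codes: 0 = all files intact,
# 1 = damage found but repairable, anything else = not repairable.
check_and_repair() {
  local par=$1
  local status=0
  par2verify "$par" || status=$?
  case $status in
    0)
      echo "ok: $par"
      ;;
    1)
      echo "repairing: $par"
      par2repair "$par"
      ;;
    *)
      echo "cannot repair: $par (par2verify exited with $status)" >&2
      return "$status"
      ;;
  esac
}
```

Since a burned DVD is read-only, in practice you’d copy the disc’s contents (PAR files included) to a hard drive before running this, so `par2repair` has somewhere to write the regenerated files.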

Carlo just launched his latest project, Escaloop. It’s a nicely structured way of creating a lifestream that pulls in your content from external sites (Flickr, Last.fm, etc.), binding them together into a nicely presented, embeddable “badge.”

He’s done a good job with it, and it couldn’t be easier to try out for yourself… You don’t even have to create a user account to get going (in fact, you can’t create a user account: they don’t exist!). It’s lightweight, and easy to play around with. Go try it out; you’ll like it!
