it's easy to retrieve google caches with a ruby script. here's one i used in the...

pronoiac · on Dec 11, 2009

Warrick works better for that, at least: http://warrick.cs.odu.edu/warrick.html

It sleeps in between queries, so you don't get temporarily banned from Google.
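
For anyone scripting this by hand, here is a rough Python sketch of the sleep-between-queries idea. It is not Warrick's code; the function name `fetch_politely`, the `fetch` callback, and the delay values are all assumptions of mine, and I don't know what delay Google actually tolerates.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Tuple


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 5.0,   # assumed lower bound in seconds, not a known safe value
    max_delay: float = 15.0,  # assumed upper bound in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, body) pairs, pausing a random interval between requests."""
    first = True
    for url in urls:
        if not first:
            # Randomized pause so requests do not arrive in a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        first = False
        yield url, fetch(url)
```

You would pass in whatever function actually downloads a page as `fetch`. There is no pause before the first request, only between consecutive ones.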

I think it's not currently working for Yahoo or MSN/Bing. Fixing that might be easier than doing everything else manually.

Edit: I've gotten a response from Frank McCown, creator of Warrick, that he's looking into it.

Edit 2: He'll try to update it next week.

tectonic · on Dec 12, 2009

Warrick looks like exactly what he needs.

pvg · on Dec 11, 2009

His biggest problem appears to be the images (and possibly other resources included in the pages). It's pretty much a given he'll be able to recover the text itself.

rayvega · on Dec 12, 2009

The permanent loss of the images makes it a greater tragedy since half the content in any given post of his consists of images.

pvg · on Dec 12, 2009

There are many, many images in the pinboard archive, a couple of hundred posts' worth. I don't know if he also has other sources from which to retrieve them; he doesn't seem to have grabbed them from pinboard yet. But a good chunk of his stuff will be recovered, images and all.

bmm6o · on Dec 12, 2009

He wrote a blog post (maybe more than one) about how he was hosting his images on Amazon S3. Did he not follow through, or did he switch away from that?