Archive.org: Only 12 days to reach the goal of $150,000

_kkcr · on Dec 19, 2012

The Wayback Machine has served me well over the years. I sent half a bitcoin. The title should include something about the 3:1 matching, this usually makes me much more likely to donate.

I hope they reach their goal and show pictures of what 4PB of storage looks like.

asmosoinio · on Dec 20, 2012

Agree on the title change. "3:1" pushed me over the edge to seek out my PayPal password and make a donation.

carleverett · on Dec 20, 2012

In the donation options, the 3:1 effect of the donation towards the goal is included, but I think it is misleading. Correct me if I'm wrong, but a $50 donation will bring them $200 closer to the $600,000 goal, but will only bring them $50 closer to the $150,000 goal that they are pushing for. That might cause some confusion.

Wingman4l7 · on Dec 20, 2012

Tomato, tomahto. They're really pushing for the $600k goal because that's how much the 4 PB of storage that they want to buy costs. It doesn't really matter because in the end, the percentage is the same.

nnnnnn · on Dec 20, 2012

Someone went wild with that bar graph bevel.

benesch · on Dec 20, 2012

Looks like a default Excel 2007 style to me :/

atesti · on Dec 20, 2012

I'm from Germany. Can I tax deduct this? How would I do it? Just show my bank statement to the German IRS?

cstrat · on Dec 20, 2012

Done - donated $50 to the cause !! =)

nextstep · on Dec 20, 2012

Only 12 more days until the arbitrary date by when Archive.org wanted to raise $150,000.

Wingman4l7 · on Dec 20, 2012

Not an arbitrary date -- the supporter will stop matching the donations after December 31st. The goal could be argued to be arbitrary -- $600k for 4 more petabytes of storage.

bananashake · on Dec 20, 2012

Every person that has ever been harassed by or lost a job over a web page can thank the archive for making that permanent.

It's irresponsible and unlawful to make unauthorized archives of web pages.

cowsaysoink · on Dec 20, 2012

It is ridiculous to me that people view public web pages as something that shouldn't be archived, if anything it provides illuminating snapshots to the state of the web at certain dates.

The archive.org team does follow robots.txt and I believe they remove content retroactively meaning if you update your site with a robots.txt it will delete the old content (which I think sucks).

JoshTriplett · on Dec 20, 2012

> The archive.org team does follow robots.txt and I believe they remove content retroactively meaning if you update your site with a robots.txt it will delete the old content (which I think sucks).

Indeed, especially since most domain parking garbage sites seem to have robots.txt files for some crazy reason.

alexkus · on Dec 20, 2012

> Indeed, especially since most domain parking garbage sites seem to have robots.txt files for some crazy reason.

Presumably to avoid being plagued (in terms of load and bandwidth costs) by the numerous crawling bots looking to update their caches of pages that no longer exist on those domains.

aw3c2 · on Dec 20, 2012

serving 404s is super cheap actually.

alexkus · on Dec 20, 2012

It depends on the setup.

I've seen a CMS brought almost to its knees because the previous owner of that IP had a site that had lots of distinct pages on it. Since every page in the CMS was stored in a DB it took a DB lookup to find out whether the incoming URL existed or not. Caching/varnish wouldn't help as there were hundreds of thousands of different incoming URLs and none will be in the cache because they don't exist.

About 20% of the hits to one site I look after are 404 because they're from the previous site hosted on that IP address. Luckily the vast majority of URLs have a specific prefix so it's a simple rule in the apache config to 404 them without having to got to disk to check for the existence of any files. It still counts against my bandwidth utilisation too (both incoming request and outgoing 404).

drcube · on Dec 20, 2012

>The archive.org team does follow robots.txt and I believe they remove content retroactively meaning if you update your site with a robots.txt it will delete the old content (which I think sucks).

Every time the "Change Facebook back to the way it was!" brigade came out, I would link to the wayback machine's copy of facebook.com from 2005 and say "Is this what you want??". Now I can't do that anymore because of stupid robots.txt.

webreac · on Dec 21, 2012

I hope they have backup of this old content. This robot.txt policy is crap. robot.txt should not be taken into account retroactively when the site owner has changed.

alxndr · on Dec 20, 2012

archive.org does a lot more than make archives of web pages. I really like their music collection.

greatquux · on Dec 20, 2012

Me too. I must use it at least 2 or 3 times a month to stream or download live shows.

icebraining · on Dec 20, 2012

Yeah. And what about all those libraries saving old newspapers? It's disgusting, I tell you!

greatquux · on Dec 20, 2012

You've got to be trolling. Either that or you intended this comment to make people like me actually donate some money! :)