PSA: Internet Archive "glitch" deletes years of user data and accounts (gingerbeardman.com)
18 points by msephton on Aug 1, 2024 | 13 comments


Not so irrelevant question: Is Internet Archive "sustainable", particularly in the long term? Is it possible to keep a history of the Internet forever? If yes, how?


Feels like it's likely to disappear completely within the next 10 years due to either legislative action or just incompetence. Individually archived items can be downloaded, but the Wayback Machine will be lost forever.


This is terribly unfortunate. If you have an Internet Archive account, I highly recommend keeping your own copy of your metadata (uploads, favorites, etc.), as well as of any items that fit within your storage means (each item's entire bundle can be downloaded as a zip or via a torrent file).


Are you sure that you can download metadata? Favorites, lists, posts, reviews, collections, and web archives? I don't think it's possible. Even saving the HTML of the page is tricky because the site renders the content using JavaScript.

Of course you can download your own uploads, but it's likely you have them in your local backups already if you uploaded them?


> Of course you can download your own uploads, but it's likely you have them in your local backups already as you uploaded them?

Sadly, many do not keep copies, so it is worth the PSA. IA is not a substitute for backups; at best it is one more backup location. I try to treat it as a way of publishing what would otherwise be stuck in some free upload service waiting to auto-expire.


Right, but the issue in the OP is that it's the data other than the uploads that has been deleted and is not being restored.


https://github.com/Ghosty-Tongue/IA-Account-Backup appears to support bulk downloading items associated with a user account. More to come in this regard.


I took a look at this script and you can do the same thing it does with the ia command line tool (`pip install internetarchive` is the easiest way to get it)

ia download --search uploader:youremail@example.com
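If you'd rather script it, the same package can be used as a Python library. This is only a rough sketch, assuming `internetarchive.search_items` and `internetarchive.download` behave as I remember them; `backup_uploads`, `uploader_query`, and the `ia-backup` directory are names I made up for illustration, and the `search`/`fetch` parameters exist only so the loop can be tried without touching the network.

```python
from typing import Callable, List, Optional


def uploader_query(email: str) -> str:
    """Build the same search query the CLI example uses."""
    return f"uploader:{email}"


def backup_uploads(
    email: str,
    destdir: str = "ia-backup",  # hypothetical local directory name
    search: Optional[Callable] = None,
    fetch: Optional[Callable] = None,
) -> List[str]:
    """Download every item whose uploader matches `email`.

    Returns the identifiers that were requested, in search order.
    """
    if search is None or fetch is None:
        # Imported lazily so the function can be exercised with stand-ins.
        import internetarchive  # pip install internetarchive

        search = search or internetarchive.search_items
        fetch = fetch or internetarchive.download

    requested = []
    for result in search(uploader_query(email)):
        identifier = result["identifier"]
        # Each item is written to its own subdirectory under destdir.
        fetch(identifier, destdir=destdir)
        requested.append(identifier)
    return requested
```

Calling `backup_uploads("youremail@example.com")` should be roughly equivalent to the one-liner above. Like the CLI command, it only covers uploads, not favorites, lists, or reviews.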


Interestingly, this was created as a reaction to the data loss. A small silver lining, I suppose.


I found that it's possible to download an account's Web Archives list as a json file, for example: https://archive.org/download/@brewster/@brewster_web-archive...

User accounts have other json files such as _mylists.json, but they are access-restricted and cannot be downloaded, unfortunately.


That seems to be an index, but there's no content? How does it map to content? For example, how can I see the tweet or website from those points in time?
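One way it could map, as a sketch only: Wayback Machine snapshots are addressed as `https://web.archive.org/web/<timestamp>/<original URL>`, so if each index entry carries a capture timestamp and the original URL, the snapshot link can be rebuilt from those two values. I haven't checked the JSON's actual layout, so the `timestamp` and `url` keys below are assumptions, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def wayback_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine snapshot URL from a capture timestamp
    (YYYYMMDDhhmmss) and the originally archived URL."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def snapshot_links(entries: Iterable[Dict[str, str]]) -> List[str]:
    """Turn index entries into snapshot links.

    The "timestamp" and "url" keys are assumed, not taken from the real file.
    """
    return [wayback_url(e["timestamp"], e["url"]) for e in entries]
```

If the real file uses different field names, only `snapshot_links` would need adjusting.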



That's good! I can confirm that this JSON does not exist for the deleted account, so it would need restoring from their backup.



