
I get your point, and different people are comfortable with different things. Some of that also depends on your environment: in my case, this script has been running nightly backups across hundreds of machines for a dozen years. Sending hundreds of e-mails a day into my inbox isn't workable. If you have one or two machines, maybe it is. I still don't think so, but again, different people.

Things I have done to ensure reliability (again, this core script has been running for a dozen or more years):

- Nagios monitoring of backups: An active check from a monitoring server that alerts if there are no recent successful backups.

- "paper path" monitoring of e-mail: Send an e-mail to an external mailbox and have an active check in Nagios that reports if it has not seen an e-mail recently.

- With hundreds of machines, we were in the management interface enough (not daily, but at least monthly) that we would tend to notice before TOO long if something was out of whack.

- Regular backup audits: We would perform quarterly backup audits of the important machines, following a whole workflow we had for those. The audits also gave us confidence that the backups were running as expected, and that if something got out of whack, it didn't go unnoticed for too long. Many of these depend on your definition of "too long".
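For anyone wanting to replicate the first check, here is a minimal sketch of a Nagios-style freshness plugin. It assumes (hypothetically) that the backup script touches a marker file after each successful run. The path `/var/lib/backup/last-success` and the 26h/50h thresholds are illustrative assumptions, not part of my actual setup. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Hypothetical Nagios-style freshness check for nightly backups.

Assumes the backup script touches a marker file after each successful
run; the default path and thresholds below are illustrative only.
"""
import os
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_age(age_seconds, warn_seconds, crit_seconds):
    """Map the age of the last successful backup to a Nagios state and message."""
    hours = age_seconds / 3600
    if age_seconds >= crit_seconds:
        return CRITICAL, f"CRITICAL - last success {hours:.1f}h ago"
    if age_seconds >= warn_seconds:
        return WARNING, f"WARNING - last success {hours:.1f}h ago"
    return OK, f"OK - last success {hours:.1f}h ago"


def check_marker(path, warn_seconds=26 * 3600, crit_seconds=50 * 3600, now=None):
    """Check the mtime of the marker file; UNKNOWN if it cannot be read."""
    try:
        mtime = os.path.getmtime(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
    if now is None:
        now = time.time()
    return classify_age(now - mtime, warn_seconds, crit_seconds)


def main(argv):
    # Hypothetical default path; pass the real marker location as argv[1].
    path = argv[1] if len(argv) > 1 else "/var/lib/backup/last-success"
    state, message = check_marker(path)
    print(message)  # Nagios shows the first line of output as the status text
    return state

# A thin wrapper script would call: sys.exit(main(sys.argv))
```

With nightly backups, warning a little after one day and going critical after two missed runs leaves room for a slow night. The "paper path" check is the same idea applied to the timestamp of the newest message in the external mailbox instead of a file's mtime.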

As far as "zfs send" goes, I totally agree. However, even today very few of my machines other than the backup machines run ZFS, so that's not really an option for these backups.


