
And thus was ruined hundreds or thousands of pleasant Sunday afternoons.

I don’t miss being on pager duty one bit. I see it looming in my headlights, sadly.



Spare a thought for the pleasant Australian early Monday mornings too! Always a rude awakening...


It's the Queen's birthday, a Monday off here in New Zealand...

... but not for everybody now.


So what happens when the crown changes? They change the holiday? Immediately? For the next year? Sounds like a bit of a nightmare.


The holiday is on the official birthday. The sovereign's actual birthday has been separate from the official birthday for centuries, so the holiday does not need to change.


Nah, it's not even her actual birthday. Different countries with the same queen even celebrate it on different days. Presumably it'll be renamed to "king's birthday" but the day kept the same when the monarch changes. Or done away with/re-purposed - there's a general feeling in Australia at least that once the queen dies there will be less support for the monarchy.


If you think that's a hassle, in Japan the calendar changes with the emperor:

https://www.theguardian.com/technology/2018/jul/25/big-tech-...
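To make the software angle concrete: official Japanese dates are written as an era name plus a year within that era, and the era resets to 1 when a new emperor accedes, so date-handling code has to carry a table of era start dates. A minimal sketch in Python (the three recent eras below are real; the function and examples are purely illustrative):

```python
from datetime import date

# Start dates of recent Japanese eras; era year 1 begins on these dates.
ERAS = [
    (date(2019, 5, 1), "Reiwa"),
    (date(1989, 1, 8), "Heisei"),
    (date(1926, 12, 25), "Showa"),
]

def to_era(d: date) -> str:
    """Render a Gregorian date's year in 'EraName N' notation."""
    for start, name in ERAS:
        if d >= start:
            return f"{name} {d.year - start.year + 1}"
    raise ValueError("date precedes the eras in this table")

print(to_era(date(2018, 7, 25)))  # Heisei 30
print(to_era(date(2019, 6, 10)))  # Reiwa 1
```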


Australia celebrates the Queen's Birthday public holiday on different dates in different states already.


It’s not actually the queen's birthday.

In Australia, many states have different dates for the queen's birthday.

So not a nightmare at all.


The only response is to wait for Google to fix it.

Nothing you or I or the pager can do will speed that up.

I am aware some bosses won't believe that and I am not trying to make light of it. But there really isn't much else to do except wait.


Either you wait for Google, or you're frantically trying to move everything you've got to AWS.


If you wait, you get back to 100% with no effort or stress on your part.

If you try to be heroic, you get back to 100% with a bunch of wasted effort and stress on your part.

Because it will be fixed by Google, regardless of what you do or don't do.

After the incident is over would be the time to consider alternatives.


So, for some companies, failing over between providers is actually viable and planned for in advance. But it is understood that doing so is time-consuming and requires human effort.

The other case is soft failures for multi-region companies. We degrade gracefully, but once that happens, the question becomes what else you can bring back online. For example, this outage did not impact our infrastructure in GCP Frankfurt, but it prevented internal traffic in GCP from reaching AWS in Virginia, because we peer with GCP there. We also couldn't reach the Google Cloud API to fall back to a VPN over the public internet. In other cases, you might realize that your failover works but your timeouts are tuned poorly for the specific circumstances, or that disabling some feature brings the remainder of the product back online.

Additionally, you have people on standby to get everything back in order as soon as possible once the provider recovers. You may also need to bring more of your support team online to deal with the increased volume of support calls during the outage.
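For what it's worth, a minimal sketch of the "tight timeout plus kill switch" pattern described above, in Python. The flag name, the 250 ms budget, and the two placeholder calls are all made up for illustration, not anyone's real service:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical kill switch an operator can flip during a provider outage.
FLAGS = {"recommendations_enabled": True}

# Shared pool so a slow cross-cloud call can be abandoned without
# blocking the request that issued it.
_pool = ThreadPoolExecutor(max_workers=4)

def fetch_recommendations(user_id: str) -> list:
    """Placeholder for a call that crosses the degraded peering link."""
    raise NotImplementedError

def load_core_items(user_id: str) -> list:
    """Placeholder for data served entirely from the healthy region."""
    return ["item-1", "item-2"]

def homepage(user_id: str) -> dict:
    """Serve the core page; drop the optional panel rather than hang on it."""
    page = {"user": user_id, "items": load_core_items(user_id)}
    if not FLAGS["recommendations_enabled"]:
        page["recommendations"] = []
        return page
    future = _pool.submit(fetch_recommendations, user_id)
    try:
        # A timeout tuned for normal latency is often far too generous
        # during an outage; a tight bound keeps the rest of the product up.
        page["recommendations"] = future.result(timeout=0.25)
    except Exception:
        page["recommendations"] = []  # degrade instead of failing the request
    return page
```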


Multi-cloud for those times when you really need that level of availability and can afford it.


It's not even about being able to afford it. Some things just don't lend themselves to hot failover. If your data throughput is high, it may simply not be feasible to stream a redundant copy to a data center outside the network.
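A rough back-of-the-envelope illustrates the scale. The throughput figure and the egress price below are assumptions chosen for the sake of arithmetic, not any provider's actual numbers:

```python
# Mirroring a high-throughput write stream to a second cloud.
# Every input here is assumed, purely to show the arithmetic.
sustained_gbit_per_s = 10          # assumed sustained ingest to replicate
egress_usd_per_gb = 0.08           # assumed inter-cloud egress price

gb_per_day = sustained_gbit_per_s / 8 * 86_400   # 1.25 GB/s * seconds/day
usd_per_day = gb_per_day * egress_usd_per_gb

print(f"~{gb_per_day / 1_000:.0f} TB/day replicated, "
      f"~${usd_per_day:,.0f}/day in egress")
# ~108 TB/day replicated, ~$8,640/day in egress
```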


All parts of the system should be replicated (if you decide to build a multi-cloud system), not just some of them.


Do you work at G?


Nope. I was more thinking of everyone else.



