OK guys, running on a single instance is REALLY a BAD IDEA for non-pet projects. Really bad! Change it as fast as you can.
I love Hetzner for what they offer, but you will run into huge outages pretty soon. At a minimum you need two different network zones on Hetzner and three servers.
I think you're being overly dramatic. In practice I've seen complexity (which HA setups often introduce) cause downtime far more often than a service being hosted on only a single instance.
You'll have planned downtime just to upgrade the MongoDB version or reboot the instance. I don't think that's something you'd want to have. Running MongoDB in a replica set is really easy, and much easier than running Postgres or MySQL in an HA setup.
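To illustrate (a minimal sketch, assuming three mongod processes were already started with --replSet rs0 and can reach each other; hostnames are placeholders), the whole thing is basically one command plus a connection string:

```python
from pymongo import MongoClient

# Connect directly to one node before the replica set exists.
client = MongoClient("mongodb://db1.internal:27017", directConnection=True)

# Initiate the replica set with all three members.
client.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.internal:27017"},
        {"_id": 1, "host": "db2.internal:27017"},
        {"_id": 2, "host": "db3.internal:27017"},
    ],
})

# Applications connect with the replica set name and fail over automatically.
app = MongoClient("mongodb://db1.internal,db2.internal,db3.internal/?replicaSet=rs0")
```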
No need for SREs. Just add 2 more Hetzner servers.
The sad part of that is that 3 Hetzner servers are still less than 20% of the price of equivalent AWS resources. This was already pretty bad when AWS started, but now it's reaching truly ridiculous proportions.
from the "Serverborse": i7-7700 with 64GB ram and 500G disk.
37.5 euros/month
The rough AWS equivalent: ~8 vCPUs + 64 GB RAM + 512 GB disk.
585 USD/month
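A quick back-of-the-envelope check on that "<20%" claim (ignoring EUR/USD conversion and traffic):

```python
hetzner_eur_per_month = 3 * 37.5      # three Serverbörse boxes
aws_usd_per_month = 585               # one comparable AWS instance, as quoted above

print(hetzner_eur_per_month / aws_usd_per_month)        # ~0.19: three Hetzner boxes vs. one AWS instance
print(hetzner_eur_per_month / (3 * aws_usd_per_month))  # ~0.06: like-for-like, three vs. three
```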
It gets a lot worse if you include any non-negligible internet traffic. How many machines does your company need before a team of SREs is worth it? I think that number has actually dropped to 100.
Sure, I am not against Hetzner, it's great. I just find that running something in HA mode is important for any service that is vital to customers. I am not saying that you need HA for a website. Also, I run many applications NOT in HA mode, but those are single-customer applications where it's totally fine to do maintenance at night or on the weekend. For SaaS, though, this is probably not a very good idea.
Yes, any time someone says "I'm going to make a thing more reliable by adding more things to it" I either want to buy them a copy of Normal Accidents or hit them over the head with mine.
How bad are the effects of an interruption for you? Google has servers failing every day, but with just one server you can afford to gamble on it, since it probably won't fail for years. No matter the hardware, though, keep a backup, because data loss is permanent. Would you lose millions of dollars a minute, or would you just have to send an email to customers saying "oops"?
Risk management is a normal part of business - every business does it. Typically the risk is not brought down all the way to zero, but to an acceptable level. The milk truck may crash and the grocery store will be out of milk that day - they don't send three trucks and use a quorum.
If you want to guarantee above-normal uptime, feel free, but it costs you. Google has servers failing every day just because they have so many, but you are not Google and you most likely won't experience a hardware failure for years. You should have a backup because data loss is permanent, but you might not need redundancy for your online systems, depending on what your business does.
HA can be hard to get right, sure, but you have to at least have a (TESTED) plan for what happens when the single instance dies.
"Run a script to deploy new node and load last backup" can be enough, but then you have to plan on what to tell customers when last few hours of their data is gone
I have a website with hundreds of thousands of monthly visitors that has been running on a single Hetzner machine for >10 years (I've switched machines inside Hetzner a few times, though).
My outages average around 20 minutes per year, which works out to an uptime of around 99.996%.
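(Quick sanity check on that figure:)

```python
downtime_minutes_per_year = 20
minutes_per_year = 365 * 24 * 60
print(f"{1 - downtime_minutes_per_year / minutes_per_year:.4%}")  # ~99.9962%
```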
I have no idea where you see those "huge outages" coming from.
We have used Hetzner for 15+ years. There were some outages, with the nastiest being the network ones. But they're usually not "dramatically bad" if you build with at least basic failover. With that in place we have seen less than one serious outage per three years. Most of the downtime is because of our own stupidity.
If you know what you're doing, Hetzner is a godsend: they give you hardware and several DCs, and it's up to you what you do with them. The money difference is massive.
There are so many applications the world runs on that only have one instance that is maybe backed up.
Not everything has to be solved by 3 reliability engineers.
Agree on single instance, but as for Hetzner: I run 100+ large bare-metal servers there, and have for at least 5 years, and there's only been one significant outage on their side. We spread across all their datacenter zones and replicate, so it's all been manageable. It's worth it for us, very worth it.
Tell me about a service that needs this reliability, please. I cannot think of anything aside from perhaps some financial transaction systems, which all have some fallback message queue.
Also, all large providers have had outages of this kind as well. Hell, some of them are at times so slow that you could call that an outage as well.
One easy config misstep and your load balancer goes haywire, because you introduced unnecessary complexity.
I did that because I needed a static outgoing IP on AWS. Not fun at all.
Unpopular opinion here probably but:
Tinkering is also a great way to end up disappointed and unhappy.
I love software and programming, but the apologetic requirements that can come from users mean adding a lot of complexity to software, that leads to many bugs and very slow programs. Everything has a cost attached.
Never host your test environments as subdomains of your actual production domain.
You'll also run into email-reputation issues as well as cookie hell: the test environment can receive a lot of cookies from the production env if they're not managed well.
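To make the cookie part concrete, here's a simplified version of the browser's domain-matching rule (domains below are made up):

```python
# Simplified RFC 6265 domain match: a cookie set by production with
# Domain=.example.com is also sent to every subdomain, including a test
# environment hosted under the same domain.
def cookie_sent_to(request_host: str, cookie_domain: str) -> bool:
    cookie_domain = cookie_domain.lstrip(".")
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

print(cookie_sent_to("test.example.com", ".example.com"))  # True: prod cookies reach the test env
print(cookie_sent_to("example.com", ".example.com"))       # True
print(cookie_sent_to("example.org", ".example.com"))       # False
```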
This. I cannot believe the rest of the comments on this are seemingly completely missing the problem here & kneejerk-blaming Google for being an evil corp. This is a real issue & I don't feel like the article from the Immich team acknowledges it. Far too much passing the buck, not enough taking ownership.
It's true that putting locks on your front door will reduce the chance of your house getting robbed, but if you do get robbed, the fact that your front door wasn't locked does not in any way absolve the thief of his conduct.
Similarly, if an organization deploys a public system that engages in libel and tortious interference, the fact that jumping through technical hoops might make it less likely to be affected by that system does not in any way absolve the organization for operating it carelessly in the first place.
Just because there are steps you can take to lessen the impact of bad behavior does not mean that the behavior itself isn't bad. You shouldn't have to restrict how you use your own domains to avoid someone else publishing false information about your site. Google should be responsible for mitigating false positives, not the website owners affected by them.
First & foremost I really need to emphasise that, despite the misleading article title, this was not a false positive. Google flagged this domain for legitimate reasons.
I think there's likely a conversation to be had about messaging - Chrome's warning page seems a little scarier than it should be, Firefox's is more measured in its messaging. But in terms of the API service Google are providing here this is absolutely not a false positive.
The rest of your comment seems to be an analogy about people not being responsible for protecting their home or something, I'm not quite sure. If you leave your apartment unlocked when you go out & a thief steals your housemate's laptop, is your housemate required to exclusively focus on the thief, or should they be permitted to ask you to be more diligent about locking doors?
> First & foremost I really need to emphasise that, despite the misleading article title, this was not a false positive. Google flagged this domain for legitimate reasons.
Where are you getting that from? I don't see any evidence that there actually was any malicious activity going on on the Immich domain.
> But in terms of the API service Google are providing here this is absolutely not a false positive.
Google is applying heuristics derived from statistical correlations to classify sites. When a statistical indicator is present, but its target variable is not present, that is the very definition of a false positive.
The fact that their verbiage uses uncertainty qualifiers like "may" or "might" doesn't change the fact that they are materially interfering with a third party's activities based on presumptive inferences that have not been validated -- and in fact seem to be invalid -- in this particular case.
> If you leave your apartment unlocked when you go out & a thief steals your housemate's laptop, is your housemate required to exclusively focus on the thief or should they be permitted to request you to be more diligent about locking doors?
One has nothing to do with the other. The fact that you didn't lock your door does not legitimize the thief's behavior. Google's behavior is still improper here, even if website operators have the option of investing additional time, effort, or money to reduce the likelihood of being misclassified by Google.
> its target variable is not present, that is the very definition of a false positive
The target variable is user-hosted content on subdomains of a domain not listed in Mozilla's public suffix list. Firefox & Chrome apply a much stricter set of security settings for domains on that list, due to the inherent dangers of multiuser domains. That variable is present, Immich have acknowledged it & are migrating to a new domain (which they will hopefully add to Mozilla's list).
> The fact that you didn't lock your door does not legitimize the thief's behavior. Google's behavior is still improper here
I made no claims about legitimising the thief's behaviour - only that leaving your door unlocked was negligent from the perspective of your housemate. That doesn't absolve the thief. Just as any malicious actor trying to compromise Immich users would still be the primary offender here, but that doesn't absolve Immich of a responsibility to take application security seriously.
And I don't really understand where Google fits in your analogy? Is Google the thief? It seems like a confusing analogy.
> First & foremost I really need to emphasise that, despite the misleading article title, this was not a false positive. Google flagged this domain for legitimate reasons.
Judging by what a person from the Immich team said, that does not seem to be true?
So unless one of the developers in the team published something malicious through that system, it seems Google did not have a legitimate reason for flagging it.
Anyone can open a PR. Deploys are triggered by an Immich collaborator labelling the PR, but it doesn't require them to review or approve the code being deployed.
As I've mentioned in several other comments in this thread by now: The whole preview functionality only works for internal PRs, untrusted ones would never even make it to deployment.
The legitimate reason is that the domain is correctly classified as having user generated active content, because the Immich GitHub repo allows anyone to submit arbitrary code via PR, and PRs can be autodeployed to this domain without passing review or approval.
Domains with user-generated active content should typically be listed on Mozilla's Public Suffix List, which Firefox & Chrome both check & automatically apply stricter security settings to, in order to protect users.
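Roughly what those stricter settings buy you, in very simplified terms (real browsers consult the full PSL; a hard-coded set stands in for it here, and example.app is a made-up domain):

```python
# Browsers refuse to set cookies scoped to a public suffix, so one user's
# subdomain cannot plant cookies that sibling subdomains will receive.
PUBLIC_SUFFIXES = {"github.io"}  # github.io really is in the PSL's private section

def browser_accepts_domain_cookie(setting_host: str, requested_domain: str) -> bool:
    requested_domain = requested_domain.lstrip(".")
    if requested_domain in PUBLIC_SUFFIXES:
        return False  # cookie scoped to a public suffix is rejected
    return setting_host == requested_domain or setting_host.endswith("." + requested_domain)

print(browser_accepts_domain_cookie("user1.github.io", ".github.io"))               # False: users stay isolated
print(browser_accepts_domain_cookie("pr-123.preview.example.app", ".example.app"))  # True: not on the PSL
```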
A safe browsing service is not a terrible idea (which is why both Safari & Firefox use Google for this) & while I hate that Google has a monopoly here, I do think a safe browsing service should absolutely block your preview environments if those environments have potential dangers for visitors to them & are accessible to the public.
To be clear, the issue here is that some subdomains pose a risk to the overall domain - visiting any of them increases your risk from the others. It's also related to a GitHub workflow that auto-generates new subdomains on demand, so there's no way to maintain a fixed list of known subdomains since new ones are constantly being created.
It is a terrible idea when what is "safe" is determined arbitrarily by a private corporation that is perhaps the biggest source of malicious behavior on the web.
I think my comment came across a bit harsh - the Immich team are brilliant. I've hosted it for a long time & couldn't be happier & I think my criticisms of the tone of the article are likely a case of ignorance rather than any kind of laziness or dismissiveness.
It's also in general a thankless job maintaining any open-source project, especially one of this scale, so a certain level of kneejerk cynical dismissiveness around stuff like this is expected & very forgivable.
Just really hope the ignorance / knowledge-gap can be closed off though, & perhaps some corrections to certain statements published eventually.
.cloud is used to host the map embedded in their webapp.
In fairness, in my local testing so far, it appears to be an entirely unauthenticated/credential-less service, so there's no risk to sessions right now for this particular use-case. That leaves the only risk factors being phishing & deploy environment credentials.
Our strategy has always been to use as little higher abstractions from cloud providers as possible. Glad we went this way, saved us quite a bunch of SLA breaches today!
I am confident in saying that it's the "best of both worlds". We get great availability-zone redundancy from AWS without having to rely on and pay for all that PaaS stuff the cloud giants offer. Also, we can "fairly easily" migrate to any other cloud provider because we only need Debian instances running.
Yes, they use apps to break the law.
But, still, regulation - when in doubt - should be avoided. Did you know that in Germany, you need to send your employees to a specialised training if they use a ladder in their day to day work? You don't need to regulate what's common sense.
What is ridiculous is that you think this isn't a good idea. Safely using ladders isn't common sense and ladder injuries probably cost the state and the places where they occur a lot of money.
I think you are mistaking your point of view, which is probably that of an individual business owner, for the point of view of someone looking at the actuarial statistics or whatever and seeing tens of thousands of preventable ladder injuries a year. Just because an event is rare from your point of view doesn't mean that the event costs nothing or that it should be ignored.
I can't believe how common this attitude of "if it's too small for me to notice, it doesn't matter" is.
The analysis isn't done yet though:
- How much do you trust the statistics about which ladder deaths were preventable?
- Do you have the numbers on the counter-factual: once ladder training is introduced, these sub-populations see X reduction in ladder deaths, offsetting for reduction in ladder use due to people not having their ladder license?
- What is the productivity cost of assigning every single ladder user a training class, in perpetuity? This analysis should include the cost of creating a cottage ladder training industry that provides the trainings, the hourly productivity loss of sending people to trainings, the administrative cost of ensuring the trainings have been conformed to, etc.
In your heart of hearts, when you are assigned mandatory trainings, how much do you learn? I'm not asking how much _could_ you learn, I'm asking how much DO you learn? My experience, and the obvious unspoken consensus of all my colleagues, is that you click through mandatory virtual trainings as fast as possible, with the sound down, on fast-forward. If it's a live training with an actual practical skill (like ladder training), then I'd definitely concede it's much more engaging and you probably learn something. But MANY trainings are clearly, obviously, a net friction on society.
"I see a problem - how about we make a law that everybody must learn about that thing?" is the crappiest, laziest way to address the problem that you could possibly think of. If 'mandate a training' was analogized to a pull request on a codebase, it would be like responding to a bug report by adding a pop-up dialog that always pops up whenever you open the program and warns you about the bug. In other words, the shittiest possible non-solution that lets somebody close the issue as resolved. A real solution takes more work and more thinking.
I love this comment. I am so sick and tired of the term "common sense" being used as a panacea for those at the bottom of a Dunning-Kruger curve to justify wanting their ignorance to be taken as seriously as other people's knowledge. I can think of dozens of ways someone could misuse a ladder that would definitely result in property damage and quite possibly injuries and even fatalities.
I wonder how many people are killed yearly because they buy various tools and don't read the damn instructions because they're definitely smart enough to use this and be safe already, it's common sense after all!
You do need to regulate what is common sense to protect employees. There's a lot of pressure from employers to do things that go against common sense, and accidents happen. The employee gets hurt, and the employer doesn't care. A large role of regulation is to protect employees from greedy employers. That's why some employers like illegal workers: they don't complain when regulations aren't followed.
That's such a gross simplification that it's highly misleading. Yes, you need training if your work involves ladders, just like with any other tool that is potentially dangerous. The type of training depends heavily on the danger: someone using a stepladder to get something from a shelf will not need more than "use the ladder from the closet, put it back after use", while someone climbing up to the antennas on a skyscraper will receive very different training (i.e. always clip in your safety harness, how to use double carabiners ...). I don't know how this is controversial?
We had someone come to our house to work on a range hood. They didn’t have ladder training, so the insurance company wouldn’t cover it if they fell off the ladder.
The range hood repairman left without doing any work. I do wonder what a normal day at work looks like for this person. We weren’t billed for the house call.
Par for the course for a vanload of meth-heads who've never attended an hour of formal training in their life to be walking around a 45 degree roof without a harness, or one clipped into an ornamental non-structural member.
What idiot would ever use a ladder in a dangerous way?
It's me, I'm the idiot.
Standing on the top step? Yes. Putting a hammer on the top step and forgetting about it? Uh huh. Putting the ladder on plywood on a mattress so I could change a lightbulb without moving the bed or buying a taller ladder? You better believe it. Using a paint sprayer with a 25' ladder and no spotter? Absolutely not. Wait, yes.
Who among you has ever been on the hiring side? I will tell you that it is frustrating in the very same way as it is for applicants.
How can you tell as a recruiter if a resume is good? People can put anything on it. Did I work with SAP 20 years ago? Yes, for two weeks! And I can simply put that on my CV. Candidates do that with every piece of technology.
OK, how do you test, then, whether they actually master the technology?
A real interview of 2 hours, maybe with a coding challenge. "This does not respect my time, and anyway I cannot code under stress," some people will complain.
OK, then maybe some automated offline/online task? "Why do I need to solve some algorithmic nonsense without ever speaking to a person? They don't respect me as a person"
Hm, OK, then maybe a real in-house interview. But with how many candidates, when I get 100+ applications for a position? I CANNOT talk to all of them...
So in the end it's statistics again... Filter for those where the probability is high that they are fast learners and dedicated. What is a good indicator of this? Well, high school and uni grades....
I’ve been on the hiring side. It is hard but everywhere I’ve worked I’ve felt like it’s just been a fear reaction on the company’s side about possibly spending a dollar on a bad hire.
I had one position I was hiring for, for over a year where I just straight up told my manager that I didn’t care to interview anyone anymore until he was ok with them.
The process at every single place I’ve worked at was built to find a reason _not_ to hire someone because we might find the perfect candidate next week
(Depending on company culture) There is a lot of risk in making a bad hire, because in many companies it's very hard to fire someone for poor performance or being difficult to work with. In those situations, it's safer to leave a position vacant instead of making a bad hire.
One workaround is to hire contractors, or contract-to-hire. It does put risk on candidates who are leaving other jobs, though. (I won't contract-to-hire when I'm leaving a job, but I will if I'm unemployed.)
How come a company that does X and already has employees doing X can't find new hires?
I'm sincerely wondering. So if you have 50 people on the payroll to do "SAP", where did they come from? A school? A course? Didn't they have coursemates to reach out to for more workers? Don't people and companies have networks? How can things deteriorate to the level that you have to put out ads for total strangers to apply?