First of all, you can use the "average of request processing times", which is not possible with other methods. Second, this is a port of the sysguard module from Tengine, Alibaba's modified nginx, and I think it is easier to use than other methods. Also, when the load is too high, you can control what happens and which error page is shown. From a system-engineering standpoint it is easier, more flexible, and more appropriate to use.
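For illustration, here's roughly what that looks like, based on the Tengine-style directives the module ports (treat the exact names and thresholds as assumptions to verify against the README):

    http {
        sysguard on;
        # trip when the average request processing time over a 5s window
        # exceeds 10ms, then internally redirect to /rt_limit
        sysguard_rt rt=0.01 period=5s action=/rt_limit;

        server {
            listen 80;
            location / {
                proxy_pass http://127.0.0.1:8080;  # hypothetical app backend
            }
            location /rt_limit {
                # full control over what overloaded users see
                return 503 "Service busy, please retry shortly.\n";
            }
        }
    }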
I'm just trying to figure out the tradeoffs here. The way I understand sysguard now, it only finds out that the load is too high after the load is already too high. You can set the threshold lower to start limiting traffic earlier, but usage spikes will still get you.
I'm trying to compare it to an alternative solution that prevents the high load in the first place (puts a limit on the app's CPU/memory shares). By limiting the app rather than nginx itself, you can still set an error page for the case where the backend is too busy (handle the 502 Bad Gateway).
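As a sketch of that approach (the backend address and page paths are hypothetical), nginx itself stays uncapped and serves a static page whenever the resource-capped app can't answer:

    server {
        listen 80;
        location / {
            proxy_pass http://127.0.0.1:8080;  # app, resource-capped at the OS level
            error_page 502 504 /busy.html;     # shown when the backend is too busy
        }
        location = /busy.html {
            root /usr/share/nginx/html;  # served directly by nginx
            internal;
        }
    }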
The nice part of doing that at the system level is that you can prioritise other processes over nginx, but otherwise ensure the app can take 100% of the CPU if nothing else is interested in using it.
Have you personally tried other methods when dealing with an overload of server traffic?
Do you have a link showing a real setup of a better method?
Of note, the module is a port from Tengine, the nginx fork tuned by Alibaba, which is quite a credible source.
I have served over a billion pageviews/month, and cutting off requests at a certain load threshold with a custom error page shown to users is a good idea. Otherwise users will just get a huge slowdown or a backlog of requests, which causes prolonged server recovery.
Also, as the author vozlt mentioned, you can use the "average of request processing times" and custom HTTP errors, and it is much easier to use than the methods you suggest.
Think about how you can limit the load while affecting as few users as possible. And I mean users who make requests and are quick to leave if latency feels high to them, not system users. The system doesn't make limiting decisions on a per-request basis, so it isn't useful for this.
It's the "throttle" step here which is important. When overloaded, drop requests; otherwise your server will overload and crash, which can be difficult to recover from.
You can limit both the CPU shares and the memory available to the app (and its network rate, if that matters for the app), e.g. with CPUQuota= and MemoryMax= in a systemd unit. The result for the app server would be the same: a throttle on the maximum number of requests it can serve. Your server will not overload and crash if your cgroup config prevents applications from taking all available resources.
There's an avalanche problem, though. Serving errors immediately when you know you're unhealthy shortens the recovery time.
I don't know that this module is perfect for every situation, but I do know that timeouts don't work great when there's a general overload. Timeouts leave a higher number of concurrent connections open.
I'd prefer a sort of priority-queue setup where HTTP clients that already "got in" stay in, and newer connections are pushed away. Rate limiting per client first might be better as well.
I used "timeout" as a general term. You can have a timeout of 0 and N app workers (see fail_timeout). If they're all busy, you would serve a 502 immediately. Nginx can likely do the static file serving orders of magnitude faster than your app.
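A minimal sketch of that fail-fast setup, assuming two app workers on made-up ports and an nginx new enough to have max_conns in the open-source version (1.11.5+):

    upstream app {
        # N app workers; when every worker is at its connection cap,
        # nginx fails over and returns an error immediately instead of queueing
        server 127.0.0.1:9001 max_conns=1;
        server 127.0.0.1:9002 max_conns=1;
    }

    server {
        location / {
            proxy_pass http://app;
            error_page 502 /busy.html;
        }
        location = /busy.html {
            root /var/www/static;  # static file, served by nginx itself
            internal;
        }
    }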
You could do that if "N app workers busy" reliably corresponded to the edge of overload. Many apps, though, don't work that way... some app requests are heavyweight, others aren't.
I take your point that this module isn't a panacea. For several things I've seen, neither your approach nor this approach would work well. An API gateway that throttled per client would make more sense. This sysguard module, though, came from Alibaba... I imagine they are pretty sharp and not prone to making things they don't need.
That's a specific architecture I'm describing, sure. We could go much deeper with workers per route, greenthreaded apps, and many other things. But I still don't think that affects my main question: given a choice between reacting to crossing some threshold and enforcing a threshold you cannot cross - what's the benefit of the first one?
I don't think they are mutually exclusive. This module can be applied per path, too. So I could set it up to turn off something heavyweight but optional, like a search function, when CPU is high.
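Roughly like this, assuming Tengine-style syntax (the path and threshold are made up):

    # shed only the optional, heavyweight feature when the load average is high
    location /search {
        sysguard on;
        sysguard_load load=8 action=/search_disabled;
        proxy_pass http://app;
    }
    location /search_disabled {
        return 503 "Search is temporarily disabled.\n";
    }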
It also appears to be able to kick off any action, not just 503. So you could have it conditionally enact limit_req on heavyweight requests. That seems to match your desired approach.
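For instance (a sketch, assuming the action URI behaves as an internal redirect the way it does in Tengine; the zone name, rate, and paths are hypothetical):

    limit_req_zone $binary_remote_addr zone=heavy:10m rate=1r/s;  # http context

    location /report {
        sysguard on;
        sysguard_load load=10 action=/report_throttled;
        proxy_pass http://app;
    }
    location /report_throttled {
        # same backend, but rate-limited per client while overloaded;
        # $request_uri preserves the originally requested path
        limit_req zone=heavy burst=3;
        proxy_pass http://app$request_uri;
    }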
Actually, you can still put nginx inside a cgroup and have nginx read the load average inside the cgroup instead of the global host load average by calling getloadavg();