Optimizing an application beats adding servers, because it makes each request faster for every individual user instead of only making the system scale better.
> optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster.
Not if the request executes on one server. Because then it's like saying that optimizing baby gestation to take place in one month is like adding 8 uteri.
It would be like adding 8 uteri, if your requirement is “give me x babies per second”.
In both cases you can get 9 babies in 9 months, and can support the same amount of baby throughput. But in the 1 month gestation case, you only need to feed one mother.
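The arithmetic behind that comparison can be sketched roughly as below. This is only an illustration: the `throughput` helper is a made-up name, and it assumes every worker handles exactly one item at a time with no queueing or overhead.

```python
from fractions import Fraction


def throughput(workers: int, latency_months: int) -> Fraction:
    """Items completed per month, assuming one item per worker at a time."""
    return Fraction(workers, latency_months)


# Nine workers at nine months each vs. one worker at one month each.
nine_slow = throughput(workers=9, latency_months=9)
one_fast = throughput(workers=1, latency_months=1)

# Same steady-state throughput, but the latency of any single item
# is still 9 months in the first case and 1 month in the second.
print(nine_slow == one_fast)
```

Throughput comes out equal, which is the point of the parent comment, while the per-item wait is what differs.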
I think what kazinator is trying to say is that request latency is also important, in addition to throughput, because users will notice when requests are faster.
The mean number of in-flight requests is the arrival rate multiplied by the time per request (Little's Law). If you can retire requests faster, you're available for the next one sooner.
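As a rough sketch of that relationship (Little's Law, L = λW), assuming a stable system where requests arrive at a steady average rate. The function name and the sample numbers are invented for illustration, not measurements:

```python
def mean_in_flight(arrival_rate_per_s: float, seconds_per_request: float) -> float:
    """Little's Law: average concurrency = arrival rate * time in system."""
    return arrival_rate_per_s * seconds_per_request


# Hypothetical load: 100 requests/second arriving.
before = mean_in_flight(arrival_rate_per_s=100.0, seconds_per_request=0.5)
after = mean_in_flight(arrival_rate_per_s=100.0, seconds_per_request=0.05)

# Cutting time per request to 1/10 cuts average concurrency to 1/10,
# so the same arrival rate occupies far fewer worker slots.
print(before, after)
```

At the same arrival rate, the faster version keeps roughly a tenth as many requests in flight, which is where the "like adding servers" intuition comes from.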
I would argue that optimizing the server, in the uteri comparison, would translate to having twins or triplets (and so on). With one uterus you'd get more babies with no additional uteri.
In real life you would also be in for some real "fun" after nine months ;-)