Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Good points.

To my point though, you mention that you noticed a "consistent" max of 2hr. Meaning the consistency was the interesting part. You probably consider this issue to be more important than some other hypothetical issue that caused, say, literally a single request to have a 2hr latency on Tuesday.

In other words, if you monitored over a period that was long enough to establish a statistically significant p99.9 or even p99.99 and then measured those percentiles, you still would have not only caught this issue, but also distinguished it from the other hypothetical issue. Monitoring your maximum and worrying about consistently high maximums might be a more effective mechanism, but in principle, you still care more about top percentiles.



In sum: top percentiles are more important than the max; but monitoring the max can still be useful.

Just like monitoring the arithmetic average ain't completely useless either. It's just far from the most important metric.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: