I tried it for a while, and it certainly has potential as one of those modern monitoring systems that are replacing Nagios right now.
However, in production I would not want to run it in Docker. I would want to set up my own server, with the option to scale it out to remote pollers.
In my org we ended up choosing another Nagios replacement, but not because of any flaw in Bosun.
I love iterating over the main points that we look for in a monitoring solution.
* Self-hosted.
* Scalable: remote pollers that can plug in to the central servers.
* Locations: remote pollers can add locations to monitor from.
* A collector agent that runs periodically on the monitored servers, instead of the NRPE model that listens for connections.
* The collector OS agent is Windows compatible and backwards compatible with Nagios scripts.
* Monitoring focuses on sending metrics first and foremost, so you can set thresholds on metrics, just like Bosun does.
* And of course, with those metrics the web GUI draws fancy graphs for everything.

And last but not least, all of these pieces (the monitoring agent, the pollers) use a standard API like REST or XML-RPC.
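To make the "agent pushes metrics over a standard API" point concrete, here is a minimal sketch of a push-style collector. It is not any particular product's agent. The payload shape is borrowed from OpenTSDB's HTTP `/api/put` JSON format; the metric names (`example.*`) and whatever `url` you pass to `send` are hypothetical placeholders.

```python
import json
import os
import socket
import time
import urllib.request


def build_datapoint(metric, value, tags=None, timestamp=None):
    """Return one datapoint in the OpenTSDB /api/put JSON shape."""
    return {
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        # Default to tagging with the local hostname.
        "tags": dict(tags or {"host": socket.gethostname()}),
    }


def collect_once(timestamp=None):
    """Gather one batch of datapoints; metric names here are hypothetical."""
    points = []
    if hasattr(os, "getloadavg"):  # not available on Windows
        try:
            load1, _, _ = os.getloadavg()
            points.append(build_datapoint("example.os.load1", load1, timestamp=timestamp))
        except OSError:
            pass  # load average could not be read; skip it
    points.append(build_datapoint("example.agent.heartbeat", 1, timestamp=timestamp))
    return points


def encode_batch(points):
    """Serialize a batch as the JSON body of a single HTTP POST."""
    return json.dumps(points).encode("utf-8")


def send(points, url):
    """POST one batch; `url` is whatever put endpoint your server exposes."""
    request = urllib.request.Request(
        url,
        data=encode_batch(points),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A real agent would call `collect_once()` and `send()` in a loop with a sleep between rounds. Because the agent opens the connection itself, nothing has to listen on the monitored host, which is the difference from the NRPE model.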
> At Stack Exchange we do not use Docker in production. For those that do not wish to use docker, we provide binaries for bosun at bosun.org, but you will also need to install OpenTSDB and HBase yourself.
Correct, our Docker instance is really designed to let people play with it quickly, without all the trouble of a production setup.
Stack Exchange's production setup isn't documented, but we use Cloudera for HBase, relay all data through the tsdbrelay cmd (found in the bosun repo), and have HAProxy in front of it all.
* Scalable: half a √. The web interface doesn't have pagination, so series with large tagsets can cause some GUI problems. We have also been having some trouble with OpenTSDB lately. Other people run much larger OpenTSDB installations than we do, though; HBase just isn't one of our best skills. But it is scaling okay for Stack Exchange.
* Remote Poller: √. Our agent, scollector, can run in a polling mode for things that need polling, like SNMP, vSphere, etc.
* Windows Support: √. This is one of the main reasons we built scollector. It is a single binary with no dependencies. We spent a lot of time digging into the WMI raw performance counters to get the best data we could.
* Backward Compatible with Nagios Scripts: Negative. scollector can use external scripts, but they are not in the Nagios format.
* Thresholds: √. This is the most basic form of alerting Bosun does. You can also construct forecast, anomaly, and multiple-condition alerts. The power in what you can do with alerting is really where Bosun shines.
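For readers wondering what threshold versus forecast alerting means in practice, here is a rough sketch. It is not Bosun's implementation or expression language; it is plain least-squares over `(timestamp, value)` samples, and the function names are made up for illustration. It assumes a metric that rises towards a limit, such as disk usage.

```python
def breaches_threshold(values, threshold):
    """Threshold alert: true when the newest value exceeds the limit."""
    return bool(values) and values[-1] > threshold


def linear_fit(points):
    """Least-squares fit of y = slope * t + intercept over (t, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("all timestamps are identical")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def seconds_until(points, limit):
    """Forecast seconds from the last sample until the fitted line hits limit.

    Returns 0.0 if the fitted line is already at or above the limit, and
    None if the trend is flat or falling and so never reaches it.
    """
    slope, intercept = linear_fit(points)
    last_t = points[-1][0]
    if slope * last_t + intercept >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - intercept) / slope - last_t


def should_alert(points, threshold, limit, horizon_seconds):
    """Multiple-condition alert: over threshold now, or due to hit limit soon."""
    eta = seconds_until(points, limit)
    over_now = breaches_threshold([y for _, y in points], threshold)
    return over_now or (eta is not None and eta <= horizon_seconds)
```

With samples of 50, 60 and 70 percent taken an hour apart, `seconds_until(points, 100)` comes out at about three hours, so an alert with a four-hour horizon fires well before a plain 90 percent threshold would.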
Not the OP, but we went with Icinga2. Aside from some crappy pre- and post-upgrade scripts, it works remarkably well, and is compatible with the Nagios monitoring plugin ecosystem.
Add in some Skyline (based off the Etsy project), graphite, and collectd, and it makes for a flexible and extensible monitoring solution.
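Since "Nagios format" keeps coming up in this thread, here is a small sketch of how the two worlds can be glued together: it pulls the performance data out of the first line of a Nagios plugin's output (the part after `|`, in `label=value[UOM];warn;crit;min;max` form) and renders it as Graphite plaintext lines (`path value timestamp`). The function names and the metric prefix are hypothetical, and it ignores multi-line plugin output.

```python
import re
import shlex

# A number with an optional unit of measurement, e.g. "2643MB" or "44%".
_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_perfdata(plugin_output):
    """Return [(label, value, unit)] from the first line of plugin output."""
    lines = plugin_output.splitlines()
    first = lines[0] if lines else ""
    _, sep, perf = first.partition("|")
    if not sep:
        return []
    try:
        tokens = shlex.split(perf)  # handles 'quoted labels' with spaces
    except ValueError:
        return []  # unbalanced quotes; treat as no perfdata
    results = []
    for token in tokens:
        label, eq, rest = token.rpartition("=")
        if not eq:
            continue
        match = _VALUE.match(rest.split(";")[0])
        if match:
            results.append((label, float(match.group(1)), match.group(2)))
    return results


def to_graphite_lines(prefix, perfdata, timestamp):
    """Render parsed perfdata as Graphite plaintext lines."""
    lines = []
    for label, value, _unit in perfdata:
        # Graphite paths are dot-separated, so replace anything unusual.
        safe = re.sub(r"[^A-Za-z0-9_-]", "_", label)
        lines.append(f"{prefix}.{safe} {value} {int(timestamp)}")
    return lines
```

The thresholds in the perfdata (`warn`, `crit`) are dropped here on purpose, on the assumption that alerting happens on the metrics side rather than inside the check script.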
I want to avoid a shameless plug, because the only reason we chose this solution is that we bought the company that makes it, so we essentially own it now.
For anyone else it breaks the self-hosted requirement, since it's a monitoring-as-a-service product called monitorscout.com.