What I'm not getting, from my superficial knowledge, is why Prometheus is getting so much traction over Elasticsearch. Elasticsearch claims to be just as good for metrics and events. The ES database itself is more advanced, with eventual consistency and search capability. It can do log analytics, and it can be the backend for a tracing tool like Jaeger. Why so much investment in Prometheus? Disclaimer: I haven't used Prometheus much myself.
I think you'd be very hard pressed to scale an Elasticsearch cluster to 10s of millions of writes/s without breaking the bank (and even if you had a pile of money to light on fire I don't think an Elasticsearch cluster with the number of nodes you'd require to support that would work very well).
Elasticsearch is a great piece of technology and it's very versatile, which makes it a great fit for a lot of problems (Uber, where M3 was developed, is a heavy consumer of Elasticsearch for logging purposes, for example), but for the types of metrics workloads and scale that M3/Prometheus were designed for, Elasticsearch simply wouldn't work.
At their core these systems are basically specialized column stores; they have completely different read/write patterns from something like ES. The basic query unit, for example, is always going to be the scan: I'm not aware of any monitoring system with any kind of secondary index capability. ES supports a bunch of nice result aggregation stuff on top of Lucene, whereas these systems are primarily /built for/ this use case.
What's interesting about some of the more modern monitoring systems like M3 and Prometheus is that they have a reverse index on top of the column store entries to very quickly find the relevant metrics for a multi-dimensional query.
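The idea in the comment above can be sketched in a few lines. This is a hypothetical toy structure, not M3's or Prometheus's actual implementation: each `label=value` pair maps to a posting set of series IDs, a multi-dimensional query intersects those sets, and only the matching series get scanned (the column-store part).

```python
# Toy inverted index over time-series labels (hypothetical sketch,
# not M3DB's or Prometheus's real code).
from collections import defaultdict

class SeriesIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # "label=value" -> set of series IDs
        self.series = {}                  # series ID -> list of (timestamp, value)

    def add_series(self, series_id, labels, points):
        self.series[series_id] = points
        for k, v in labels.items():
            self.postings[f"{k}={v}"].add(series_id)

    def query(self, **labels):
        """Intersect posting sets to find series matching ALL label pairs,
        then return only those series for scanning."""
        sets = [self.postings[f"{k}={v}"] for k, v in labels.items()]
        if not sets:
            return {}
        matched = set.intersection(*sets)
        return {sid: self.series[sid] for sid in matched}

idx = SeriesIndex()
idx.add_series("s1", {"service": "api", "region": "us-east"}, [(0, 1.0), (10, 2.0)])
idx.add_series("s2", {"service": "api", "region": "eu-west"}, [(0, 3.0)])
idx.add_series("s3", {"service": "db", "region": "us-east"}, [(0, 4.0)])

print(sorted(idx.query(service="api", region="us-east")))  # ['s1']
```

The point of the index is that a query like `service=api AND region=us-east` never touches series that don't match; the expensive scan only runs over the intersected posting set.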
elasticsearch is the one thing i've worked with that i've had to learn to pretend i know nothing about. you do not want to get labeled the expert on that thing. it's nice and finicky.
Another viable player that took a path similar to M3DB is VictoriaMetrics [1]. This allowed implementing various features [2] and optimizations [3] without the need to negotiate their integration into upstream Prometheus. Such negotiations can get stuck forever. [4]
> Chronosphere, a startup from two ex-Uber engineers, who helped create the open source M3 monitoring project to handle Uber-level scale, officially launched today with the goal of building a commercial company on top of the open source project.
I recall a thread here from 2-3 weeks ago about how “Uber-scale” wasn’t really Uber scale, and that most of these publicized “Uber-scale” projects ended up getting canned internally. Any insider insight to this M3 project?
Rob, co-founder and M3DB creator here. Uber collected billions of metric samples and we had tens of billions of metrics in M3 at Uber. Netflix, for reference, has not published any numbers higher than single-digit billions of time series. The system has run in production at Uber for several years now. That's my thoughts on the matter, hah.
> Released in 2015, M3 now houses over 6.6 billion time series. M3 aggregates 500 million metrics per second and persists 20 million resulting metrics-per-second to storage globally (with M3DB), using a quorum write to persist each metric to three replicas in a region.
So, if that's accurate, they're collecting one trillion data points every two seconds.
So we collected and aggregated more than 1 billion samples of metrics per second, which resulted in writing more than 30-40 million unique metric datapoints per second to storage. This resulted in more than 10 billion unique time series being stored (each with a very large number of distinct datapoints).
This was 3.6 trillion metric samples per hour or 2.5 trillion metric datapoints stored a day (after aggregating samples).
No, they're collecting one BILLION (with a b) data points every two seconds. Gotta go to 2000 seconds (a little over half an hour) for the TRILLION.
With a 25:1 reduction/summarization before writing. If they're smart, they do that summarization on the way in, rather than at the back-end layer. That's 1.2 billion data points written per minute, or about 1.7 trillion written per day!
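Sanity-checking the arithmetic in this sub-thread against the numbers quoted from the M3 post (500 million metrics/s aggregated, 20 million datapoints/s persisted):

```python
# Rates quoted from the M3 blog post excerpt above.
ingested_per_sec = 500_000_000   # aggregated metrics per second
persisted_per_sec = 20_000_000   # datapoints persisted to storage per second

# Reduction ratio from aggregation before writing.
print(ingested_per_sec // persisted_per_sec)       # 25  (the 25:1 summarization)

# Ingest side: a billion samples every two seconds,
# and a trillion samples takes ~2000 seconds (about 33 minutes).
print(ingested_per_sec * 2)                        # 1000000000
print(1_000_000_000_000 // ingested_per_sec)       # 2000

# Write side: per minute and per day.
print(persisted_per_sec * 60)                      # 1200000000  (~1.2 billion/min)
print(persisted_per_sec * 86_400)                  # 1728000000000  (~1.7 trillion/day)
```

So the "billion every two seconds" correction above checks out on the ingest side, and the per-day write volume lands at roughly 1.7 trillion datapoints.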
Congrats Martin and Rob on the launch. M3 is one of the best tools I used at Uber. Something which just works. I'm sure you guys will be successful as I first hand witnessed the value it brings to an organization.
Hey, Rob here, co-founder and M3DB creator; more than happy to answer any queries anyone might have. We're committed to keeping M3 100% Apache 2 licensed, clustering and all other M3 features included. We're focused on providing reliable metrics hosting at scale.
Haha, TY for the kind words. I had to stop playing years ago now, unfortunately, with family commitments. Also, I mainly enjoyed playing on a team more than actually developing real soccer skills (I relied on others on the team to pull me upwards, heh).
That’s my personal reference to the word, but searching around a bit, it seems that it was registered as a trademark by a medical company already in 1991, 5 years before Red Alert.
Metrics monitoring is hugely useful for figuring out what's going wrong (or right...) and where, especially when you can slice and dice by dimensions/tags. Microsoft (where I work) uses lots of metrics internally, for every service. It's nice to see M3/Chronosphere making this kind of thing more affordable and widely accessible.
One thing that I often miss when reading about this stuff is benchmarks. So it's faster than Prometheus? Prove it. So it's faster than Postgres, or TimescaleDB? Prove it.
It should be trivial, and the fact that it's not there, and that what you find instead are terms like "Uber-scale", is slightly worrying.
I'm not trying to take anything away from the achievements made here by the guys at Uber, but anyone seriously considering using this in production would probably need a proper comparison against the alternatives.
It's not raw speed or raw performance on a single node that M3DB is optimized for; it's a reliable scale-out story for when you need a considerable number of instances to collect the raw data you operate on. Organizations of a certain size/complexity run into this, not just a handful of organizations/companies.
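Part of that reliability story is the quorum write mentioned in the quoted post (persist each metric to three replicas in a region, succeed on a majority of acks). A toy sketch of that behavior, with hypothetical names rather than M3DB's actual replication code:

```python
# Toy quorum write: succeed once a majority of replicas ack
# (hypothetical sketch, not M3DB's real client or replication logic).

def quorum_write(replicas, datapoint, quorum=2):
    """Send the write to every replica; return True if at least
    `quorum` of them acknowledge it."""
    acks = 0
    for replica in replicas:
        try:
            replica(datapoint)   # replica is a callable that may raise on failure
            acks += 1
        except IOError:
            continue             # one failed replica doesn't fail the write
    return acks >= quorum

store = []
def healthy(dp): store.append(dp)
def down(dp): raise IOError("replica unreachable")

# 2 of 3 replicas healthy: the write still succeeds.
print(quorum_write([healthy, down, healthy], ("cpu.usage", 1600000000, 0.73)))  # True
# Only 1 of 3 healthy: the write fails.
print(quorum_write([down, down, healthy], ("cpu.usage", 1600000010, 0.75)))     # False
```

With three replicas and a quorum of two, any single node can be lost without dropping writes, which is the property that matters more at this scale than single-node throughput.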
Benchmarks tend to favor the authors and are frequently game-ified, look at GPU benchmarks like 3DMark that frequently had manufacturers release optimizations that were really only utilized in specific benchmarks.
> There weren’t any tools available on the market that could handle Uber’s scaling requirements
This isn't a problem that you can build a business around.
Edit: Ah, I get it. This is like a Mesosphere play--they're shepherding the M3 technology in the open source ecosystem and offering a commercial version. That makes more sense.
Splunk kool-aid drinker here; pardon my ignorant question, but why not just use Splunk?
Actually, I think my real question is: why is there such a proliferation of these monitoring/logging/visualization-aaS startups? Who are the target customers, in terms of spend?
Congrats on the launch Rob and Martin. M3 is an amazing product which I had the privilege to use at Uber. Wish you two the best for your journey ahead!