
Any idea why Druid performed so poorly though? 100x slower seems odd. I thought Druid was reasonably good at single-table analytics like in this benchmark. Is it the small data size?


It does seem odd, especially since in real world cases I'm more accustomed to seeing Druid and ClickHouse be in the same ballpark of performance. Sometimes one is somewhat faster than the other. But in my experience that's more like 2–3x difference in one direction or the other depending on workload, not 100x.

Hard to say why more of a difference shows up here, since I haven't analyzed the benchmark. It's possible the Druid configuration is suboptimal in some way. It's also possible it has something to do with the setup. It appears that the ClickHouse tests were done using a local table, which there isn't an equivalent of in Druid. Druid treats every table like what ClickHouse would call a "distributed" table. My understanding is using a distributed table in ClickHouse adds overhead since the system can no longer assume all data is on a single server. It may be that using distributed queries in both systems would yield a different result. And of course it may be that some of the test queries exercise functionality where ClickHouse is legitimately better optimized. But, again, hard to say anything for certain without detailed analysis.


In this benchmark, Druid was killed and restarted after every query because Druid seems to get into a degraded state otherwise. It is very likely that something is wrong with the Druid setup here. It would have been useful to know basic details such as the Druid version.


You can see that the setup used is the one provided in the package: single-server/medium. It makes sense to improve the setup, but I recommend providing a better configuration by default. I think it is common courtesy that a system should just work in most cases without heavy tuning.


Many of the queries that did not run had aggregations over strings, like MIN/MAX. I don't know the specific reason why many Java-based DBMSs lack these aggregation functions.



