Dashboard tools like Chartio and Looker make self-service analytics "easy", so you don't need to write 10 different flavors of the same query. We see this happening every day - our biggest customer has 1,200 analysts on Looker.
The key is how you set up your data warehouse, e.g. Amazon Redshift. What we see happening is that data engineering teams provide "data services" to their company via a set of common schemas / tables.
At a very high level:
- Set up two different schemas: (1) a raw schema into which you dump all your event-level data, which only data engineers are allowed to access, and (2) an ad-hoc schema that analysts can use to run their queries (see the first sketch after this list).
- Move data from the raw to the ad-hoc schema with scheduled transformations / aggregations; Airflow, Luigi, Pinball, and dbt are popular tools for that purpose (second sketch below). The tables in the ad-hoc schema need to be well documented so analysts can understand what data is available.
- Give every analyst a dashboard seat and access to the ad-hoc schema. Give them "SQL playbooks" that they can re-use. If you're adventurous, allow them to create their own tables.
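For concreteness, here's a minimal Redshift sketch of the two-schema setup. All the names ("raw", "adhoc", the "analysts" group) are placeholders, not anything prescribed:

    -- Hypothetical schema and group names throughout
    CREATE SCHEMA raw;    -- event-level data, data engineers only
    CREATE SCHEMA adhoc;  -- curated, documented tables for analysts

    CREATE GROUP analysts;
    GRANT USAGE ON SCHEMA adhoc TO GROUP analysts;
    GRANT SELECT ON ALL TABLES IN SCHEMA adhoc TO GROUP analysts;

    -- The "adventurous" option: let analysts create their own tables
    GRANT CREATE ON SCHEMA adhoc TO GROUP analysts;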
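And a sketch of the kind of scheduled aggregation that moves data from raw to ad-hoc. Table and column names are made up for illustration; in practice this would live in a dbt model or an Airflow / Luigi / Pinball task run on a cadence:

    -- Hypothetical daily rollup: raw page-view events -> small aggregate
    -- table that analysts can query cheaply
    BEGIN;
    DELETE FROM adhoc.daily_pageviews
     WHERE day >= DATEADD(day, -1, CURRENT_DATE);

    INSERT INTO adhoc.daily_pageviews (day, page, views, unique_users)
    SELECT DATE_TRUNC('day', event_time) AS day,
           page,
           COUNT(*)                AS views,
           COUNT(DISTINCT user_id) AS unique_users
    FROM raw.pageview_events
    WHERE event_time >= DATEADD(day, -1, CURRENT_DATE)
    GROUP BY 1, 2;
    COMMIT;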
This approach scales from a few GBs to TBs and more.
Yes, as scapecast mentioned, the key to writing and executing SQL well is to organize and transform your data first. You don't want to be running queries across huge raw event tables.
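To make that concrete, with the same hypothetical tables as above: analysts hit the small pre-aggregated table instead of scanning billions of raw events.

    -- Scans a few thousand daily rows, not the raw event table
    SELECT day, SUM(views) AS views
    FROM adhoc.daily_pageviews
    WHERE day >= DATEADD(day, -30, CURRENT_DATE)
    GROUP BY day
    ORDER BY day;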
We published a blog post last week with some tips for analysts starting their first data warehouse project.