You can query a Sheet with ClickHouse out of the box. Obviously this doesn't work for private sheets, since url() just fetches the public CSV export:
    with sheet as (
        select *
        from url('https://docs.google.com/spreadsheets/d/1XGCy0tYU5YcEouO09_ErZIyqjA-VJ4pidLZmMmJkEdk/gviz/tq?tqx=out:csv&sheet=Sheet1&range=A:C', CSVWithNames)
    )
    select * from sheet
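If you hit the same sheet repeatedly, you can wrap the url() call in a view so the export URL lives in one place. A minimal sketch using standard ClickHouse syntax - note that each query against the view still re-fetches the CSV (the view name is my own):

    -- The view stores only the query; every select fetches the CSV live.
    create view sheet_live as
    select *
    from url('https://docs.google.com/spreadsheets/d/1XGCy0tYU5YcEouO09_ErZIyqjA-VJ4pidLZmMmJkEdk/gviz/tq?tqx=out:csv&sheet=Sheet1&range=A:C', CSVWithNames);

    select count() from sheet_live;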
I spent years using Excel but I mostly use Google Sheets now. The cloud integration just feels much more effortless compared to Microsoft's Office products ported to the cloud. I've seen departments using GS as a system of record for quickly setting up master data like budgets and forecasts. I wish it had database connectivity the way Excel has ODBC.
That article describes how to use GS as a front end for BigQuery data. I’m describing the opposite, which is ingesting spreadsheet data into databases.
Steampipe [1] is an open source Postgres Foreign Data Wrapper with 73 plugins [2], including Google Sheets support [3]. A lot of our users use it to connect custom metadata with other systems - e.g. joining ownership info in Google Sheets with tags on AWS resources. Note: I'm a lead on the project; it's focused on read-only access.
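That join is plain SQL once both plugins are configured. A hypothetical sketch - it assumes a sheet named "Owners" with bucket_name and owner columns exposed by the googlesheets plugin, so table and column names will differ for your own sheet:

    -- Hypothetical: enrich S3 buckets with ownership info kept in a sheet.
    select
      b.name,
      b.tags ->> 'team' as team_tag,
      o.owner
    from
      aws_s3_bucket as b
      left join googlesheets."Owners" as o on o.bucket_name = b.name;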
Unrelated: I used Steampipe to easily automate some compliance audits on AWS; it was great to be able to look at all S3 buckets and make sure they're encrypted.
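A check along those lines is a short query against the aws_s3_bucket table. A sketch, assuming buckets with no default encryption report a null server_side_encryption_configuration:

    -- Flag S3 buckets with no default server-side encryption configured.
    select
      name,
      region
    from
      aws_s3_bucket
    where
      server_side_encryption_configuration is null;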
We cache all the data in memory in Arrow format, so queries don't need to go through the Google API; it only hits the Google API when a data refresh is needed.
It's manual for now; ROAPI is designed for slowly moving datasets. You just hit the data update API to force a refresh.
I plan to add automated streaming data updates in the background, starting with Delta Lake tables. It should all be very straightforward to implement.
I have thought about using Google Sheets as the database for projects many times.
The great thing about it is that you get a very versatile admin UI for free. The downsides, however, are rate limits and the fact that I can't think of any way to build in some sort of safety that would prevent people from accidentally deleting a bunch of data with a mouse click. I would love to hear if anyone has real-life experience with the approach.
> The great thing about it is that you get a very versatile admin UI for free. The downsides, however, are rate limits and the fact that I can't think of any way to build in some sort of safety that would prevent people from accidentally deleting a bunch of data with a mouse click.
My experience is that this is always the issue with spreadsheets when you try to grow them into behaving like more powerful tools. The same functionality (flexibility etc.) that makes them great for 1-5 people to work on means it's hard to constrain the system as much as you'd like once it grows.
Having said that, there are things you can do, like using protected ranges in gsheets that only certain users can edit, or input validation that makes it harder to add bad data. The built-in version history is also a handy fallback in case someone does bork the sheet and you catch it in time.
Check out the following -- I've found them to be very useful:
- Baserow[0]
- NocoDB[1]
Also, modern CMSes that are DB-first (or at the very least try not to mangle the DB and provide programmatic access) basically overlap completely with this use case and can be a good fit as well.
I use it in one project to store public metadata of podcasts. The CI/CD pipeline downloads the Google Sheet via the public URL, does some custom parsing, and uses the data to build a React app. It worked like a charm for this use case, though some fiddling was required to get the parsing right. And there is no API limit if your data is public. For using it as a live read-only database, I'm not sure it's a good idea. Generally, Google Sheets keeps a decent version history, which makes recovery to an older version possible.