Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: Query Google Sheet data using PostgreSQL clients (roapi.github.io)
122 points by houqp on May 31, 2022 | hide | past | favorite | 22 comments


You can query a Sheet with Clickhouse out of the box. Obviously doesn't

    with sheet as (
    select \* 
    from 
    url('https://docs.google.com/spreadsheets/d/1XGCy0tYU5YcEouO09_ErZIyqjA-VJ4pidLZmMmJkEdk/gviz/tq?tqx=out:csv&sheet=Sheet1&range=A:C', CSVWithNames)
    )

    select \* from sheet


https://luabase.notion.site/Query-a-Google-Sheet-905da4e981b...


I use roapi to power my “book list” on my static site. I have it running on an always-on Replit repl. Try it here:

https://charlieharrington.com/library/

Click the “I’m feeling lazy” button if you just want to see some example queries.

I also wrote a tutorial, as well, if you’d like to try out roapi yourself:

https://charlieharrington.com/sql-powered-reading-list/


I spent years using Excel but I mostly use Google Sheets now. The cloud integration just feels much more effortless compared to Microsoft’s Office products ported to cloud. I’ve seen departments using GS as a system of record for quickly setting up master data like budgets and forecasts. I wish it had database connectivity the way Excel has ODBC.


I founded https://seekwell.io/ because I had that same wish a few years backs.

Also worth checking out https://equals.app/


Seekwell.io is giving me a 502 Error as of now.


weird, GCP might have been down a moment? Is it working now?


Have you looked at connected sheets? https://cloud.google.com/bigquery/docs/connected-sheets


That article describes how to use GS as a front end for BigQuery data. I’m describing the opposite, which is ingesting spreadsheet data into databases.


Steampipe [1] is an open source Postgres Foreign Data Wrapper with 73 plugins [2] including Google Sheets support [3]. A lot of our users use it to connect custom metadata with other systems - e.g. join ownership info in Google Sheets with tags on AWS resources. Notes: I'm a lead on the project, it's focused on read only.

1 - https://steampipe.io 2 - https://hub.steampipe.io/plugins 3 - https://hub.steampipe.io/plugins/turbot/googlesheets


unrelated: I used steampipe to easily automate some compliance audits on AWS, was great to do be able to look at all s3 buckets and make sure they're encrypted.


Last time I tried to do the moral equivalent of this (not using SQL, just some custom python), I ran afoul of the rate limits really quickly.

Does this client do something smart to mitigate that, have the limits been relaxed, or is it still a problem?


We cache all the data in memory in Arrow format so queries don't need to go through google api, it will only hit google api when a data refresh is needed.


How does cache invalidation work?


It's manual for now, ROAPI is designed for slowly moving datasets. You just hit the data update API to force a refresh.

I have plan to add automated streaming data update in the background, starting with delta lake tables. It should all be very straight forward to implement.


I have thought about using google sheets as the database for projects many times.

The great thing about it is that you get a very versatile admin UI for free. The downsides however are rate limits and the fact that I can't think of any way to build in some sort of safety that would prevent people from accidentally deleting a bunch of data with a mouse click. I would love to hear if anyone has real life experience with the approach.


> The great thing about it is that you get a very versatile admin UI for free. The downsides however are rate limits and the fact that I can't think of any way to build in some sort of safety that would prevent people from accidentally deleting a bunch of data with a mouse click.

My experience is that this is always the issue with spreadsheets when you try and grow them out to try and behave like more powerful tools. The same functionality (flexibility etc.) that makes them great for 1-5 people to work on means that it's hard to constrain the system as much as you'd like to when you grow it.

Having said that there are things you can do, like using protected ranges in gsheets that only certain users can edit, or input validation that makes it harder to add bad data. The built in version history is also a handy fall back in case someone does bork the sheet and you catch it in time.


Check out the following -- I've found them to be very useful:

- Baserow[0]

- NocoDB[1]

Also, modern CMSes that are DB-first (or at the very least try their best not to mangle the DB and provide programmatic access) are basically completely overlapped with this use case and can be good as well.

[0]: https://baserow.io

[1]: https://nocodb.com


Awesome, thank you!


I use it in one project to store public metadata of podcasts. The CI/CD pipeline downloads the google sheet via the public url, does some custom parsing and uses the data to build a react app. Worked like a charm for this use case, some fiddling required to get the parsing done right, though. And there is no api limit if your data is public. For using it as a live read-only database I am not sure if it's a good idea. Generally google sheets has a decent history of versions which makes recovery to an older version possible.


This is cool; from reading the descriptions I presume this doesn't support writing - is there any intention to do so?


Nevermind, I just worked out how to get to the overview. ROAPI seems to stand for read-only API :)


Using this with materialized views could be very useful!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: