> Prior to this, getting up and running from a cold-start might’ve required installing or even compiling several OSS packages, carefully noting path locations, standing up a specialized database… Enough work that a data generalist might not have bothered, or their IT department might not have supported it.
I've been able to "CREATE EXTENSION postgis;" for more than a decade. There have been spatial extensions for PG, MySQL, Oracle, MS SQL Server, and SQLite for a long time. DuckDB doesn't make any material difference in how easy it is to install.
That requires data to already be in Postgres, otherwise you have to ETL data into it first.
DuckDB on the other hand works with data as-is (Parquet, TSV, SQLite, Postgres... whether on disk, S3, etc.) without requiring an ETL step (though if the data isn't already in a columnar format, things are gonna be slow... but it will still work).
I work with Parquet data directly with no ETL step. I can literally drop into Jupyter or a Python REPL and duckdb.query("from '*.parquet'")
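Spelled out a bit (paths made up), that's the whole workflow:

    import duckdb

    # Query the Parquet files in place -- no load/COPY step, no server to stand up.
    duckdb.query("SELECT count(*) FROM 'data/*.parquet'").show()

    # Or pull the result straight into pandas for whatever comes next:
    df = duckdb.query("FROM 'data/*.parquet' LIMIT 1000").df()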
Correct me if I'm wrong, but I don't think that's possible with PostGIS (even pg_parquet requires copying? [1]).
Yeah, if you want to work with GeoParquet, and you want to keep your data in that format. I can see how that's easier, to use your example. That's not what a lot of geospatial data is in, though. You might have shapefiles, geopackages, geojsons, who knows? There is a lot of software, from QGIS to ESRI, for working with different formats to solve different problems. I don't think GeoParquet, even though it might be the fastest geospatial vector data format right now, is that common, and the article did not claim that either. So for an average user trying to answer some GIS question, some ETL is pretty much a given. And given that, installing PostGIS and installing DuckDB both require some ETL, and learning some query and analytics language. DuckDB might be an improvement, but it's certainly not as much of a leap as the quote is making it out to be.
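For concreteness, the load step on the DuckDB side is something like this (spatial extension plus a made-up parcels.shp; still ETL, just a short one):

    import duckdb

    con = duckdb.connect("gis.duckdb")
    con.sql("INSTALL spatial")
    con.sql("LOAD spatial")

    # ST_Read goes through GDAL, so shapefiles, GeoPackages, GeoJSON, etc.
    # all load the same way.
    con.sql("CREATE TABLE parcels AS SELECT * FROM ST_Read('parcels.shp')")
    con.sql("SELECT count(*) FROM parcels").show()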
I'll give DuckDB a shot and compare it to PostGIS. Eventually, I think the issue for me would be that I use all the vector outputs via apps connecting to the same database.
Yeah, just an example of a QoL issue with DuckDB: even though it can glob files in other cases, the way it passes parameters to GDAL means that globs are taken literally instead of expanded. So I can't query a directory with thirty million GeoJSON files. This is not a problem in geopandas, because IPython, being a full interactive development environment, lets me produce the glob any way I choose.
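i.e. something along these lines, with made-up paths -- I expand the glob myself and feed the list in:

    import glob
    import pandas as pd
    import geopandas as gpd

    # Build the file list in Python instead of hoping the engine expands the glob.
    paths = glob.glob("tiles/**/*.geojson", recursive=True)

    # Read and concatenate (for the real thirty million files you'd batch this,
    # but the point is that I control the listing).
    gdf = pd.concat((gpd.read_file(p) for p in paths), ignore_index=True)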
I think this is a fundamental problem with the SQL pattern. You can try to make things just work, but when they fail, then what?
I think this is just because it hasn't been implemented in spatial yet. DuckDB is currently going through a pretty big refactor of the way we glob/scan/union multiple files with all the recent focus on data lake formats, but my plan is to get to it in spatial after the next release, when that part of the code has stabilized a bit.
SQL is a DSL and yes, all Domain Specific Languages will only enable what the engine parsing the DSL supports.
But all SQL databases I'm aware of let you write custom extensions, which are exactly that: they extend the base functionality of the database with new paradigms, e.g. PostGIS enabling geospatial in Postgres, or the extensions that enable fuzzy matching/searching.
And as SQL is pretty much a Turing-complete DSL, there is very little you can't do with it, even if the syntax might not agree with everyone.
The original article feels a tremendous amount like another piece of DuckDB marketing, from the breathless admiration to baseless claims like the title.