Agreed, I find this to be a super productive environment, because you get all of vscode's IDE features plus the niceties of Jupyter and IPython.
I wrote a small vscode extension that builds upon this to automatically infer code blocks via indentation, so that you don't have to select them manually: [0]
I develop Lonboard [0], a Python library for plotting large geospatial data. If you have small data (~max 30,000 coordinates), leaflet-based Python libraries like folium and ipyleaflet can be fine, but because Lonboard uses deck.gl for GPU-accelerated rendering, it's 30-50x faster than leaflet for large datasets [1].
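Basic usage looks roughly like this (the path is just a placeholder; `viz` picks a deck.gl layer type based on the geometry):
> import geopandas as gpd
> from lonboard import viz
> # any GeoDataFrame works; this path is only a placeholder
> gdf = gpd.read_file("data/my_points.fgb")
> # renders an interactive GPU-accelerated map in the notebook
> viz(gdf)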
It can read from HTTP URLs, but you'd need to manage signing the URLs yourself. On the writing side, it currently writes to an ArrayBuffer, which you could then upload to a server or save on the user's machine.
Arrow JS is just ArrayBuffers underneath. You do want to amortize some operations to avoid unnecessary conversions. E.g. Arrow JS stores strings as UTF-8, but native JS strings are UTF-16, I believe.
Arrow is especially powerful across the WASM <--> JS boundary! In fact, I wrote a library to interpret Arrow from Wasm memory into JS without any copies [0]. (Motivating blog post [1])
Yeah, we built it to essentially stream columnar record batches from server GPUs to browser GPUs with minimal touching of any of the array buffers. It was very happy-path for that kind of fast bulk columnar processing, and we donated it to the community to grow to use cases beyond that. So it sounds like the client code may have been doing more than that.
For high-performance code, I'd have expected overhead in the percents, not in multiples. And I'm not surprised to hear slowdowns for anything straying beyond that happy path -- cool to see folks have expanded further! More recently, we've been having good experiences here with Perspective <-arrow-> Loaders, enough so that we haven't had to dig deeper. Our current code targets < 24 FPS, as genAI data analytics is more about bigger volumes than velocity, so I'm unsure beyond that. It's hard to imagine going much faster, though, given it's bulk typed arrays without copying, especially in real code.
Sorry, this is not true _at all_ for geospatial data.
A quick benchmark [0] shows that saving to GeoPackage, FlatGeobuf, or GeoParquet is roughly 10x faster than saving to CSV. Additionally, the CSV is much larger than any of the other formats.
And here's my quick benchmark, dataset from my full-time job:
> import geopandas as gpd
> import pandas as pd
> from shapely.geometry import Point
> d = pd.read_csv('data/tracks/2024_01_01.csv')
> d.shape
(3690166, 4)
> list(d)
['user_id', 'timestamp', 'lat', 'lon']
> %%timeit -n 1
> d.to_csv('/tmp/test.csv')
14.9 s ± 1.18 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
> d2 = gpd.GeoDataFrame(d.drop(['lon', 'lat'], axis=1), geometry=gpd.GeoSeries([Point(*i) for i in d[['lon', 'lat']].values]), crs=4326)
> d2.shape, list(d2)
((3690166, 3), ['user_id', 'timestamp', 'geometry'])
> %%timeit -n 1
> d2.to_file('/tmp/test.gpkg')
4min 32s ± 7.5 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
> %%timeit -n 1
> d.to_csv('/tmp/test.csv.gz')
37.4 s ± 291 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
> ls -lah /tmp/test*
-rw-rw-r-- 1 culebron culebron 228M Mar 26 21:10 /tmp/test.csv
-rw-rw-r-- 1 culebron culebron 63M Mar 26 22:03 /tmp/test.csv.gz
-rw-r--r-- 1 culebron culebron 423M Mar 26 21:58 /tmp/test.gpkg
CSV saved in 15s, GPKG in 272s. 18x slowdown.
I guess your dataset is country borders, isn't it? Something that 1) has few records and makes a small r-tree, and 2) contains linestrings/polygons that can be densified, similar to the Google Polyline algorithm.
But a lot of geospatial data is just sets of points. For instance: housing for an entire country (a couple of million points). An address database (IIRC 20M+ points). Or GPS logs of multiple users, received from a logging database, ordered by time, not assembled into tracks -- several million points per day.
For such datasets, use CSV; don't abuse indexed formats. (Unless you store the data for a long time and actually use the index for spatial searches, multiple times.)
Your issue is that you're using the default (old) binding to GDAL, based on Fiona [0].
You need to use pyogrio [1], its vectorized counterpart, instead. Make sure you use `engine="pyogrio"` when calling `to_file` [2]. Fiona does a loop in Python, while pyogrio is fully compiled, so pyogrio is usually about 10-15x faster than Fiona. Soon, in pyogrio version 0.8, it will be another ~2-4x faster than it is now [3].
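With your `d2` from above, that's roughly this (recent geopandas versions also expose a global switch, if I remember correctly):
> import geopandas as gpd
> # per call:
> d2.to_file('/tmp/test.gpkg', engine='pyogrio')
> # or set it once for all read_file/to_file calls, on recent geopandas versions:
> gpd.options.io_engine = 'pyogrio'
> d2.to_file('/tmp/test.fgb', driver='FlatGeobuf')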
CSV is still faster than the geo formats even with pyogrio. From what I saw, it writes most of the file quickly, then spends a long time, I think, building the spatial index.
> %%timeit -n 1
> d.to_csv('/tmp/test.csv')
10.8 s ± 1.05 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
> %%timeit -n 1
> d2.to_file('/tmp/test.gpkg', engine='pyogrio')
1min 15s ± 5.96 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
> %%timeit -n 1
> d.to_csv('/tmp/test.csv.gz')
35.3 s ± 1.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
> %%timeit -n 1
> d2.to_file('/tmp/test.fgb', driver='FlatGeobuf', engine='pyogrio')
19.9 s ± 512 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
> ls -lah /tmp/test*
-rw-rw-r-- 1 culebron culebron 228M Mar 27 11:02 /tmp/test.csv
-rw-rw-r-- 1 culebron culebron 63M Mar 27 11:27 /tmp/test.csv.gz
-rw-rw-r-- 1 culebron culebron 545M Mar 27 11:52 /tmp/test.fgb
-rw-r--r-- 1 culebron culebron 423M Mar 27 11:14 /tmp/test.gpkg
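If it's really the index, writing with the index disabled should show it. Something like this (assuming GDAL's SPATIAL_INDEX layer creation option for GPKG/FlatGeobuf, and that pyogrio forwards extra kwargs as creation options -- I haven't verified):
> # hypothesis check: skip spatial index creation and compare timings
> d2.to_file('/tmp/test_noindex.gpkg', engine='pyogrio', SPATIAL_INDEX='NO')
> d2.to_file('/tmp/test_noindex.fgb', driver='FlatGeobuf', engine='pyogrio', SPATIAL_INDEX='NO')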
That's why I'm working on the GeoParquet spec [0]! It gives you both compression-by-default and super fast reads and writes! So it's usually as small as gzipped CSV, if not smaller, while being faster to read and write than GeoPackage.
Try using `GeoDataFrame.to_parquet` and `geopandas.read_parquet`.
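With the `d2` from your benchmark, that's just:
> import geopandas as gpd
> # GeoParquet: columnar and compressed by default (snappy), no spatial index to build
> d2.to_parquet('/tmp/test.parquet')
> # zstd usually gives a smaller file for a bit more CPU
> d2.to_parquet('/tmp/test_zstd.parquet', compression='zstd')
> back = gpd.read_parquet('/tmp/test.parquet')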
Another, seemingly very similar, project released in the last few days: https://github.com/raulcd/datanomy