
Hey, longtime Mapbox employee here. I appreciate all the work you're doing to help people wrap their heads around vector tiles! This is an old technology at this point, and as you've explained, there are robust tools for moving from GeoJSON to tilesets. It's cool to pull apart the nuts and bolts of a thing (and the Mapbox Vector Tile Spec is open), but there are easier ways to accomplish this objective.

A question for you:

> Obviously they'd never use MVT in that use case, so they didn't bother supporting it.

What does this mean? Mapbox GL (JS and native) both support MVT, of course--that's why we created it! Perhaps you were referring to something else? Higher in this post I see a reference to "GeoJSON vector tiles" and I'm curious what that means. GeoJSON is very convenient (Mapbox helped support its enshrinement as IETF RFC 7946), but one of the hard parts of tiling is slicing features and knowing when to simplify them. Once you've done that, you might as well cram it into a protobuf or other highly efficient container. When you pass Mapbox GL a GeoJSON source, it actually cuts the geometry into tiles in memory and uses those for rendering.
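To illustrate that last point, this is all it takes on the API side -- the source name and data URL below are just placeholders:

    import mapboxgl from 'mapbox-gl';

    mapboxgl.accessToken = 'YOUR_TOKEN'; // placeholder
    const map = new mapboxgl.Map({
      container: 'map',
      style: 'mapbox://styles/mapbox/streets-v12',
    });

    map.on('load', () => {
      // GL cuts this GeoJSON into tiles in memory and renders from those tiles.
      map.addSource('addresses', {
        type: 'geojson',
        data: '/data/addresses.geojson', // placeholder URL
      });
      map.addLayer({ id: 'addresses', type: 'circle', source: 'addresses' });
    });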

Some other general notes:

- The process of tiling is lossy (or should be). If you zoom out to see all of North America, your GeoJSON of neighborhood address points is going to overlap; you should drop most of them from the low-zoom tiles. Tippecanoe does this in brilliant and highly tuneable ways, and the same applies to geometry simplification. Folks should keep this in mind when doing size comparisons. (There's a sketch of what I mean just after this list.)

- Map tiling is fundamentally about moving compute into preprocessing and assembling geometry from highly parallelized fetches. MVT is a technology built on and for S3-like services. It's been exciting to see new approaches to this problem that offer lovely ergonomics around deployment etc., but if you have cloud infra, a hot cache, and are optimizing for performance, MVT remains hard to beat.

- We continue to research additional optimizations for VT, but the technology has stood the test of time and has proven useful in many different contexts beyond map rendering, including routing and geocoding.
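To make the first point concrete, a typical Tippecanoe invocation for a dense point layer looks something like this (filenames are placeholders; there are many more knobs than shown here):

    # Let tippecanoe pick a max zoom and thin dense point data as it
    # builds the lower zoom levels, instead of keeping every feature.
    tippecanoe -o addresses.mbtiles -zg \
      --drop-densest-as-needed --extend-zooms-if-still-dropping \
      addresses.geojson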


>What does this mean?

Ugh, dumb typo - it was late. I meant "Obviously they'd never use GeoJSON in that use case".

> Higher in this post I see a reference to "GeoJSON vector tiles" and I'm curious what that means.

It's what it sounds like: vector tiles, but instead of protobuf, the data is simply passed directly as GeoJSON. It's really convenient for a use case like a server that generates data on demand: easy to generate (i.e., it avoids all the difficulty of OP's post) and easy to inspect in the browser for debugging. The only downside is that it's less space-efficient than protobuf. So it's useful as a first step for a proof of concept (to be replaced by MVT), or in cases where size doesn't matter.
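To be concrete, here's roughly the shape of the thing I mean: a toy endpoint that answers z/x/y requests with plain GeoJSON. The tile math is the standard slippy-map formula; featuresInBbox is a made-up stand-in for whatever generates your data on demand:

    import express from 'express';

    // Standard slippy-map math: the lon/lat bounding box of tile z/x/y.
    function tileBounds(z: number, x: number, y: number) {
      const lon = (tx: number) => (tx / 2 ** z) * 360 - 180;
      const lat = (ty: number) =>
        (Math.atan(Math.sinh(Math.PI * (1 - (2 * ty) / 2 ** z))) * 180) / Math.PI;
      return { west: lon(x), south: lat(y + 1), east: lon(x + 1), north: lat(y) };
    }

    // Made-up stand-in for whatever queries or generates your features.
    async function featuresInBbox(bbox: { west: number; south: number; east: number; north: number }) {
      return []; // array of GeoJSON Features
    }

    const app = express();

    // Same z/x/y addressing as MVT, but the payload is plain GeoJSON.
    app.get('/tiles/:z/:x/:y.json', async (req, res) => {
      const { z, x, y } = req.params;
      const features = await featuresInBbox(tileBounds(+z, +x, +y));
      res.json({ type: 'FeatureCollection', features });
    });

    app.listen(3000);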

>Once you've done that, you might as well cram it into a protobuf or other highly efficient container.

I'm disputing the "you might as well" bit for many use cases. :) (Again, I think Mapbox is very geared towards large-scale uses, but a lot of the internet is small and bespoke.)

It was actually Tangram, not OpenLayers, that I was thinking of as supporting this: https://github.com/tangrams/tangram?tab=readme-ov-file#vecto...

>MVT is a technology built on and for S3-like services.

It's interesting that you say that. My experience, having been down this road a few times, is that serving MVT from S3 is generally a pain, and I don't recommend it for new clients. It takes some pretty specific AWS configuration, and the process of uploading thousands of individual files is slow and error-prone. (I wrote a guide on it once but can't find it now.)
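To give a flavor of the configuration I mean: every object needs the right content type, content encoding (if you gzip the tiles) and cache headers, roughly like this (bucket and paths are made up, and you still need a CORS policy on the bucket):

    aws s3 cp tiles/ s3://example-tiles-bucket/tiles/ --recursive \
      --content-type application/vnd.mapbox-vector-tile \
      --content-encoding gzip \
      --cache-control "public, max-age=86400"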

Yeah, it's a good solution for large-scale uses (again...) but not a good one for the average user.

PMTiles seems like a pretty compelling alternative for those scenarios: ship one file instead of thousands, and rely on HTTP range requests. The downside I ran into is that not all "S3-like services" support range requests.
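For anyone curious, the client wiring looks roughly like this with the pmtiles library (I'm showing MapLibre here because its protocol hook is the simplest path; the archive URL is a placeholder):

    import maplibregl from 'maplibre-gl';
    import { Protocol } from 'pmtiles';

    // One archive on object storage; tiles are fetched via HTTP range requests.
    const protocol = new Protocol();
    maplibregl.addProtocol('pmtiles', protocol.tile);

    const map = new maplibregl.Map({
      container: 'map',
      style: {
        version: 8,
        sources: {
          parcels: {
            type: 'vector',
            url: 'pmtiles://https://example.com/parcels.pmtiles', // placeholder
          },
        },
        layers: [],
      },
    });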

In practice, I recommend either hosting data on Mapbox/MapTiler/whoever is cheapest this month if the volumes are low, or setting up a tiny tile server. Even a tiny server is sufficient for serving tiles, and costs a fraction of what Mapbox charges (especially since Mapbox's change to charging per square kilometre, which is absolutely cost-prohibitive for sparse data).
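For completeness, the client side of the self-hosted option is just a tile URL template pointed at your own server -- the hostname, source and layer names below are placeholders:

    map.addSource('parcels', {
      type: 'vector',
      tiles: ['https://tiles.example.com/parcels/{z}/{x}/{y}.pbf'], // placeholder
      minzoom: 0,
      maxzoom: 14,
    });

    map.addLayer({
      id: 'parcels-fill',
      type: 'fill',
      source: 'parcels',
      'source-layer': 'parcels', // layer name baked into the tiles
    });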

>we continue to research additional optimizations for VT,

Can you elaborate? The spec (https://github.com/mapbox/vector-tile-spec) has not had an update in 4 years, and since MVT v2 did not include any substantive changes, the spec is essentially unchanged since 2014. In 2018, a working group for a version 3 was announced (https://github.com/mapbox/vector-tile-spec/issues/103), but it was apparently quietly abandoned only a couple of months later.


Didn't mean to imply that tiling is trivial--our initial business model was focused on taking care of that difficulty for our customers, after all, and it wouldn't have made sense if we didn't think we were delivering value.

I will defer to your experience re the utility of tiled-but-still-GeoJSON as a sensible middle ground. I think you're right that we haven't seen this as an area that merits significant attention--it's sort of "not worth optimizing yet [GeoJSON]" vs. "worth optimizing [MVT]". But I can see how there could be room in between in some scenarios.

PMTiles is what I had in mind when I mentioned ergonomics. Brandon's delivered a very compelling approach, as I hope I conveyed to him at CNG earlier this year. The lack of fully specified behavior re range requests is a lingering concern, as you acknowledge, and there are some other areas, like incremental updates, where having a huge mess of tiles still looks pretty good. But I think it's fair to say that it's overperformed as a solution, and I understand why people are excited about it and the other cloud-native geo formats that have emerged in recent years. Decades ago, Mapbox's founders were at DevSeed writing PHP--there will always be some sympathy around here for "just upload it" as a deploy story!

I can't talk about the optimizations we are investigating, but I can at least acknowledge some of what makes the problem so hard (and the update schedule so slow): MVT is quite good, and backward compatibility is a pain, especially when you're optimizing for bundle/binary size (a particularly hard problem when your competitors get to preinstall their libraries on every mobile phone in the world) and working with a customer base that's preintegrated, in every way and to every extent imaginable, with an existing ecosystem. There is a reason people still use shapefiles! Though I hope MVT's reputation remains a bit better than that...


>There is a reason people still use shapefiles!

It's weird: I've done an absolute ton of work with random geospatial data from all kinds of sources (see opentrees.org), but when someone asks what format to supply data in, I often suggest Shapefile. There's a kind of rugged simplicity to it, like an old Nokia. It rarely goes wrong in strange ways, everything supports it, and its limitations (file size, field name length/casing, geometry mixing, etc.) tend not to be show-stoppers.

GeoPackage has turned into such a complex beast that you never know what's going to be inside one; I tend to avoid it at all costs.


original demo & internal engineering blog post for the popular vector mapping library


In general the OSM community errs on the side of vetoing new initiatives -- doing an import properly means raising it on the mailing lists, which invariably attracts vastly more criticism than assistance. Even coordinated (non-automated) remote mapping attracts considerable criticism these days -- most ludicrously, by people suggesting that it's better to leave an area unmapped so that it might one day attract local mappers (rather than remote contributors who will work on it immediately).

There's also the question of the license. Without opening a can of worms about the usefulness of share-alike provisions in general, I think it's safe to say that having a geocoding result trigger share-alike implications for a database is clearly problematic (consider geocoding a database of customer addresses, then being obliged to share the rest of the table!). Unfortunately, OSM hasn't yet reached agreement on geocoding guidance. Consequently, a couple dozen of us working on OpenAddresses have gotten the project to over 200 million addresses in less than 2 years. OSM is now a decade old, has millions of registered mappers, and contains fewer than 60 million addresses.

I don't mean to be all doom and gloom, though. I would love to improve OSM as a home for address data. And I urge those of you who care about this incredibly important resource to join me -- hop on the talk and legal-talk lists and help make the case for geocoding guidance that makes sense.


> In general the OSM community errs on the side of vetoing new initiatives

"New initiatives" are fine. "new bulk data imports" are a different thing. There are many social and technical problems with importing data. De-duplicating data is hard.

OSM, unlike OpenAddresses, wants to have one licence for all the data, rather than lots of little licences for each different region. OSM also tries to have one hierarchical address data format for the whole world, rather than a collection of different formats for each region.

> OSM is now a decade old, has millions of registered mappers, and contains less than 60 million addresses.

OSM is more than just addresses.


I'm the guy who imported the Queensland and Victoria addresses, and I've been in touch with members of the government in Canberra who are leading this effort. The new release should supersede the existing Australian data sources in OpenAddresses.io; my expectation is that we'll deprecate them in favor of this one.


PSMA publishes GNAF every three months, so you'll need to think about a production process and metadata :-). There are about 100k "new" addresses each time, and a lot of existing addresses get improved locations. Might be worth asking for a change file.


Nice work with OpenAddresses.io, BTW.


>Repeat until you have some useful output.

Do you really think that's how useful work is accomplished?

The point of TFA is that there aren't as many problems in need of brute-forcing as we'd like to believe. In truth, most important problems require intelligent and skillful analysis. Where computing power is truly the bottleneck, it tends to be applied pretty quickly.


Sure, but that's much less fun. (And this router is currently only $60 shipped from Newegg -- that, plus maybe $10 worth of RadioShack components.)

