
Indeed! It would be so much better if this were a Postgres extension instead.

There are some efforts, but they're still quite immature: https://github.com/sraoss/pg_ivm

But at least Materialize does have Postgres wire compatibility, and Noria has the same for MySQL. It's a plug-and-play switch; it's not as complex as adopting other pub-sub, job-queue, or KV-caching solutions.



> It would be so much better if this were a Postgres extension instead.

I've thought about this counterfactual a lot. (I'm a big part of the reason that Materialize was not built as a PostgreSQL extension.) There are two major technical reasons that we decided to build Materialize as a standalone product:

1. Determinism. For IVM to be correct, computations must be strictly deterministic. PostgreSQL is full of nondeterministic functions: things like random(), gen_random_uuid(), pg_cancel_backend(), etc. You can see the whole list with `SELECT * FROM pg_proc WHERE provolatile <> 'i'`. And that's just scratching the surface. Query execution makes a number of arbitrary decisions (e.g., whether or not to order results) that can cause nondeterminism in results. Building an IVM extension within PostgreSQL would require hunting down every one of these nondeterministic moments and forcing determinism on them—a very long game of whack-a-mole.

2. Scale. PostgreSQL is fundamentally a single-node system. But much of the reason you need to reach for Materialize is that your computation is exceeding the limit of what a single machine can handle. If Materialize were a PostgreSQL extension, IVM would be competing for resources (CPU, memory, disk, network) with the main OLTP engine. But since Materialize is a standalone system, you get to offload all that expensive IVM work to a dedicated cluster of machines, leaving your main PostgreSQL server free to spend all of its cycles on what it's uniquely good at: transaction concurrency control.
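To make the determinism point concrete, here's a hedged sketch (the table and view names are hypothetical, not from the comment). Any view over a volatile function has no single "correct" result for an IVM engine to incrementally converge on:

```sql
-- Hypothetical example: a materialized view over a volatile function.
-- Every evaluation of random() yields a different value, so re-running
-- the query and incrementally maintaining it can legitimately disagree.
CREATE MATERIALIZED VIEW flaky AS
  SELECT id, random() AS score
  FROM events;

-- The catalog query from the comment: list every function PostgreSQL
-- does not mark immutable (provolatile = 'i'), i.e. each one a
-- potential source of nondeterminism for IVM.
SELECT proname, provolatile
  FROM pg_proc
 WHERE provolatile <> 'i';
```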

So while the decision to build Materialize as a separate system means there's a bit more friction to getting started, it also means that you don't need to have a plan for what happens when you exceed the limits of a single machine. You just scale up your Materialize cluster to distribute your workload across multiple machines.

One cool thing we're investigating is exposing Materialize via a PostgreSQL foreign data wrapper [0]. Your ops/data teams would still be managing two separate systems, but downstream consumers could be entirely oblivious to the existence of Materialize—they'd just query tables/views in PostgreSQL like normal, and some of those would be transparently served by Materialize under the hood.
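A rough sketch of what that could look like with stock postgres_fdw (all names, hosts, and columns here are hypothetical placeholders; 6875 is Materialize's default port, but your deployment may differ):

```sql
-- Hedged sketch: expose a Materialize view through postgres_fdw so
-- downstream consumers query it like any local PostgreSQL relation.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER materialize
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'mz.example.com', port '6875', dbname 'materialize');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER materialize
  OPTIONS (user 'materialize');

-- Hypothetical incrementally maintained view living in Materialize:
CREATE FOREIGN TABLE order_totals (customer_id int, total numeric)
  SERVER materialize
  OPTIONS (table_name 'order_totals');

-- Looks like a plain local query, but is served by Materialize:
SELECT * FROM order_totals WHERE customer_id = 42;
```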

[0]: https://www.postgresql.org/docs/current/postgres-fdw.html



