
It is always easier to scale horizontally at the application layer (just add more servers) than at the database layer (which involves syncing data between multiple database instances).


In my experience, more often than not, Postgres performance problems aren't really caused by the database but by badly designed schemas or queries. For a lot of developers, the thinking goes that tens of millions of rows sounds a lot like big data, so they must start building microservices and distributed systems, use K/V stores, and scale horizontally à la Google. Meanwhile, their entire dataset could fit in the RAM of a single server, and they could avoid most of that complexity by scaling vertically, which is usually much, much easier.
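A quick back-of-envelope check makes the "fits in RAM" point concrete. This is a minimal sketch: the row count, average row size (which should already include per-row overhead), index overhead multiplier, and 50% RAM headroom are all illustrative assumptions, not measurements, and the function names are hypothetical.

```python
# Back-of-envelope: does a dataset fit in one server's RAM?
# All inputs are illustrative assumptions, not measurements.

def estimated_size_gb(rows: int, avg_row_bytes: int, index_overhead: float = 0.0) -> float:
    """Rough data size in GB (10**9 bytes).

    avg_row_bytes is assumed to include per-row storage overhead.
    index_overhead is a hypothetical multiplier for index space,
    e.g. 0.5 means indexes add 50% on top of the table data.
    """
    table_bytes = rows * avg_row_bytes
    return table_bytes * (1 + index_overhead) / 1e9


def fits_in_ram(size_gb: float, ram_gb: float, headroom: float = 0.5) -> bool:
    """True if the data fits in the RAM left after reserving `headroom`
    (as a fraction) for the OS, connections, sorts, etc."""
    return size_gb <= ram_gb * (1 - headroom)


if __name__ == "__main__":
    size = estimated_size_gb(rows=50_000_000, avg_row_bytes=200, index_overhead=0.5)
    print(f"~{size:.1f} GB")             # ~15.0 GB
    print(fits_in_ram(size, ram_gb=64))   # 32 GB usable, so True
```

Under these assumptions, 50 million 200-byte rows plus indexes come to roughly 15 GB, which fits comfortably on a single 64 GB machine.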


I think many people underestimate the capabilities of SQL databases by a couple of orders of magnitude. I once worked on a feature that integrated tightly with a third-party service. Their API didn't support aggregate queries, and my company was small, without a real BI or data team, so I ended up writing a tool to dump our account data into a local Postgres DB in order to do some data analysis. By the time I left the company, that DB was approaching 50 GB; the table holding the primary data had about 40 million rows, and a couple of the supporting tables had over 100 million rows. This was all on a 2018-era Dell dev laptop: a fairly powerful machine (6 cores/12 threads, 32 GB RAM, SSD), but certainly no server. It took about 90 seconds to refresh the materialized views that summarized the data I looked at most frequently. That was more than acceptable for my use case, and there was a lot of room for improvement in that schema (it was pretty much a straight dump of the API data).


Came here to say exactly this. Over the last ~12 years of working with PostgreSQL, I've dealt with quite a few performance-related issues, and almost all of them were caused by poorly written queries.


Can you point to some good resources on how to write better Postgres queries? Or give examples of common pitfalls?


Start with EXPLAIN ANALYZE and work from there. There are tools where you paste its output and they show the plan in an easier-to-understand format.


I use https://explain.dalibo.com/ for that exact purpose, and it does a great job of highlighting performance issues.


I don't have any good recommendations other than learning how to read and use EXPLAIN [1]. It should typically be the first tool you reach for when you have a slow query.

[1] https://www.postgresql.org/docs/current/sql-explain.html
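If you want to scan plans programmatically rather than by eye, Postgres can emit them as JSON with `EXPLAIN (ANALYZE, FORMAT JSON) SELECT ...`. Note that ANALYZE actually executes the statement. Below is a minimal sketch that walks such a plan and flags sequential scans and large row misestimates. The sample plan is a hypothetical, hand-written fragment (real output has many more keys), and the 10x misestimate threshold is an arbitrary assumption.

```python
import json
from typing import Iterator

# Hypothetical, hand-written plan fragment shaped like Postgres's
# EXPLAIN (ANALYZE, FORMAT JSON) output; real plans carry many more keys.
SAMPLE_PLAN = json.dumps([{
    "Plan": {
        "Node Type": "Hash Join", "Plan Rows": 100, "Actual Rows": 95000, "Actual Loops": 1,
        "Plans": [
            {"Node Type": "Seq Scan", "Relation Name": "orders",
             "Plan Rows": 1000000, "Actual Rows": 1000000, "Actual Loops": 1},
            {"Node Type": "Hash", "Plan Rows": 50, "Actual Rows": 50, "Actual Loops": 1,
             "Plans": [{"Node Type": "Index Scan", "Relation Name": "customers",
                        "Plan Rows": 50, "Actual Rows": 50, "Actual Loops": 1}]},
        ],
    },
}])


def walk(node: dict) -> Iterator[dict]:
    """Yield every node in a plan tree, parents before children."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def flag_nodes(explain_json: str, misestimate_factor: float = 10.0) -> list[str]:
    """Return warnings for a single-statement EXPLAIN (ANALYZE, FORMAT JSON) result."""
    root = json.loads(explain_json)[0]["Plan"]
    warnings = []
    for node in walk(root):
        kind = node["Node Type"]
        rel = node.get("Relation Name")
        label = f"{kind} on {rel}" if rel else kind
        # A sequential scan is worth a look, though on small tables it is often the right choice.
        if kind == "Seq Scan":
            warnings.append(label)
        # "Actual Rows" is only present with ANALYZE; like "Plan Rows" it is a per-loop figure.
        est, actual = node.get("Plan Rows"), node.get("Actual Rows")
        if est is not None and actual is not None:
            ratio = max(est, actual) / max(min(est, actual), 1)
            if ratio >= misestimate_factor:
                warnings.append(f"{label}: estimated {est} rows, got {actual}")
    return warnings


if __name__ == "__main__":
    for warning in flag_nodes(SAMPLE_PLAN):
        print(warning)
```

A big gap between estimated and actual rows often points to stale statistics or a predicate the planner can't estimate well, which is the kind of thing visualizers like the one mentioned above also highlight.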


An old classic, but too many indices can be harmful too: every index has to be kept up to date as rows are written, and each one takes up its own storage.
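To see the cost side in a self-contained way, here is a sketch that uses SQLite (bundled with Python) as a stand-in for Postgres. It loads the same rows into tables with 0 to 5 single-column indexes and reports the page count and insert time. The table and column names are hypothetical; absolute numbers will differ from Postgres, but the principle that each extra index adds storage and write work carries over.

```python
import sqlite3
import time

COLUMNS = [f"c{i}" for i in range(5)]  # hypothetical column names


def load(indexed_columns: int, rows: int = 20_000) -> tuple[int, float]:
    """Load `rows` rows into a fresh in-memory table that has `indexed_columns`
    single-column indexes; return (page_count, seconds spent inserting)."""
    con = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{c} INTEGER" for c in COLUMNS)
    con.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY, {col_defs})")
    for c in COLUMNS[:indexed_columns]:
        con.execute(f"CREATE INDEX idx_{c} ON t ({c})")
    col_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" * len(COLUMNS))
    data = [(i, i * 7 % 1000, i % 97, -i, i * 31) for i in range(rows)]
    start = time.perf_counter()
    # Every INSERT also has to update each index's B-tree.
    con.executemany(f"INSERT INTO t ({col_list}) VALUES ({placeholders})", data)
    con.commit()
    elapsed = time.perf_counter() - start
    pages = con.execute("PRAGMA page_count").fetchone()[0]
    con.close()
    return pages, elapsed


if __name__ == "__main__":
    for n in (0, 1, 3, 5):
        pages, secs = load(n)
        print(f"{n} indexes: {pages} pages, {secs * 1000:.1f} ms to insert")
```

Timings vary by machine, so treat them as relative; the page count grows with every index you add.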



I would guess the most common pitfall is either not having indices or having the wrong kind of index for your query.
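A minimal way to watch a missing index show up in a query plan, again using SQLite from the Python standard library as a stand-in for Postgres (in Postgres's EXPLAIN you'd see a Seq Scan turn into an Index Scan or Bitmap Heap Scan instead). The `orders` table and index name are hypothetical.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each row.
    return "; ".join(str(row[-1]) for row in rows)


def demo() -> tuple[str, str]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    con.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        ((i % 1000, i * 1.5) for i in range(10_000)),
    )
    query = "SELECT id, total FROM orders WHERE customer_id = ?"
    before = plan(con, query, (42,))  # no usable index: full table scan
    con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = plan(con, query, (42,))   # index lookup on customer_id
    con.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The "wrong kind of index" case looks similar in the plan: an index whose leading column isn't in the WHERE clause usually can't be used for the lookup, so the scan remains.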


100%. Exaggerating the bigness of one's own data is a common phenomenon. Sometimes you're talking to a group of developers who are all so impressed by this bigness, yet every one of them has a phone in their pocket that could hold their entire dataset.


You can horizontally scale the database directly.
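One common way to do that is hash-based sharding: route each row by a partition key to one of several database instances. This is a minimal sketch of only the routing step, with hypothetical shard names and keys; real deployments also need cross-shard queries, rebalancing, and failover. It also illustrates the catch raised at the top of the thread: with plain modulo routing, adding a shard remaps most keys, so most data would have to move between instances (consistent hashing or many pre-split logical shards are the usual ways to reduce this).

```python
import hashlib
from collections import Counter

# Hypothetical shard names; in practice each would map to a database connection.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(key: str, shards: list[str]) -> str:
    """Route a partition key to a shard by hashing it.

    hashlib gives the same answer in every process, unlike the built-in
    hash(), which is randomized per interpreter run for str.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def moved_fraction(keys: list[str], old: list[str], new: list[str]) -> float:
    """Fraction of keys that land on a different shard after resizing."""
    moved = sum(shard_for(k, old) != shard_for(k, new) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"customer:{i}" for i in range(10_000)]
    print(Counter(shard_for(k, SHARDS) for k in keys))
    # Adding a fifth shard with plain modulo routing moves most keys.
    print(f"{moved_fraction(keys, SHARDS, SHARDS + ['db-4']):.0%} of keys move")
```

Going from 4 to 5 shards this way keeps a key in place only when its hash gives the same remainder mod 4 and mod 5, which happens about 20% of the time, so roughly 80% of keys move.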


Depends on the scenario



