In my experience, in web apps built on top of ORMs there is often a TON of low hanging fruit for query optimization when database load becomes an issue. Beyond the basics of "do we have N+1 issues", ORMs sometimes just don't generate optimal queries. I wouldn't want to built a complex production web app without an ORM, but being able to eject from it sometimes is key.
Profile real world queries being run in production that use the most resources. Take a look at them. Get a sense of the shape of the tables that they're running against. Sometimes the ORM will be using a join where you actually want a subquery. Sometimes the opposite. Sometimes you'll want to aggregate some results beforehand, or adjust the WHERE conditions in a complex join. I've seen situations where a semi-frequent ORM-generated query was murdering the DB, taking 20+ seconds to run, and with a few minor tweaks it would run in less than a second.
I'm working on something right now with the Python ORM SQLAlchemy. It turns out that getting it to use RETURNING with INSERT is not trivial and requires you to set the non-obvious option `expire_on_commit=False`, which doesn't guarantee use of RETURNING, but is supposed to use it if your db driver and database happen to support it and the ORM happens to support it for that particular combination of driver and database. And there's no API to actually inspect the generated SQL even though it's emitted in the logs, so there's no way to enforce the use of RETURNING in your test suite without capturing and scraping your own logs (which fortunately is very easy within the Pytest framework).
I like ORMs but this is just frustratingly complicated on so many levels. I also understand that SQLAlchemy is an enormous library and not everything will be easy. But I think this case exemplifies the trade-offs involved with using an ORM.
(Yes I am aware that using insert() itself in Core does what I want, I'm talking about .add()-ing an ORM object to an AsyncSession).
I don’t mean this as a slight on SQLAlchemy - it does a lot of things very well and provides a ton of levers and knobs - but it’s absolutely the first place I look when I’m looking for performance improvements.
A friend used to say Zookeeper was where the crazy lived in any application that used it - sqlalchemy is where the slow lives in any application that uses it.
The problem is with using `session.add(obj)` instead of `session.scalars(insert(TheClass).returning(TheClass), data)`. If there's a way to get generated SQL from an AsyncSession, please do let me know.
Profile real world queries being run in production that use the most resources. Take a look at them. Get a sense of the shape of the tables that they're running against. Sometimes the ORM will be using a join where you actually want a subquery. Sometimes the opposite. Sometimes you'll want to aggregate some results beforehand, or adjust the WHERE conditions in a complex join. I've seen situations where a semi-frequent ORM-generated query was murdering the DB, taking 20+ seconds to run, and with a few minor tweaks it would run in less than a second.