Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In my experience, in web apps built on top of ORMs there is often a TON of low hanging fruit for query optimization when database load becomes an issue. Beyond the basics of "do we have N+1 issues", ORMs sometimes just don't generate optimal queries. I wouldn't want to built a complex production web app without an ORM, but being able to eject from it sometimes is key.

Profile real world queries being run in production that use the most resources. Take a look at them. Get a sense of the shape of the tables that they're running against. Sometimes the ORM will be using a join where you actually want a subquery. Sometimes the opposite. Sometimes you'll want to aggregate some results beforehand, or adjust the WHERE conditions in a complex join. I've seen situations where a semi-frequent ORM-generated query was murdering the DB, taking 20+ seconds to run, and with a few minor tweaks it would run in less than a second.



I'm working on something right now with the Python ORM SQLAlchemy. It turns out that getting it to use RETURNING with INSERT is not trivial and requires you to set the non-obvious option `expire_on_commit=False`, which doesn't guarantee use of RETURNING, but is supposed to use it if your db driver and database happen to support it and the ORM happens to support it for that particular combination of driver and database. And there's no API to actually inspect the generated SQL even though it's emitted in the logs, so there's no way to enforce the use of RETURNING in your test suite without capturing and scraping your own logs (which fortunately is very easy within the Pytest framework).

I like ORMs but this is just frustratingly complicated on so many levels. I also understand that SQLAlchemy is an enormous library and not everything will be easy. But I think this case exemplifies the trade-offs involved with using an ORM.

(Yes I am aware that using insert() itself in Core does what I want, I'm talking about .add()-ing an ORM object to an AsyncSession).


I don’t mean this as a slight on SQLAlchemy - it does a lot of things very well and provides a ton of levers and knobs - but it’s absolutely the first place I look when I’m looking for performance improvements.

A friend used to say Zookeeper was where the crazy lived in any application that used it - sqlalchemy is where the slow lives in any application that uses it.


not sure this is that specific to sqlalchemy, you could say this really about any ORM


There is certainly an API to inspect your query, you can just call print() on the object iirc.


The problem is with using `session.add(obj)` instead of `session.scalars(insert(TheClass).returning(TheClass), data)`. If there's a way to get generated SQL from an AsyncSession, please do let me know.


I think you have a couple options [0], I currently have a project that uses 'echo' in debug mode.

[0]: https://stackoverflow.com/questions/27748053/how-to-log-sql-...


Yep, that's the same output that's emitted in the sqlalchemy.engine logger. If you use the logger you can use Pytest caplog to capture it.


This is a very blunt tool, but `engine.echo = True` prints all SQL going to the DB.

(I don't have any experience with AsyncSession, so cannot contribute something more specific)


A big problem with ORM's is that object orientation is just not a good way to organise software nor data for most domains.

Most business logic would be better expressed in the language of relational algebra (plus some extensions) than via OOP.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: