Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Python SQLAlchemy works a bit like this, if you use it that way:

    query = session.query(Products)

    if limit:
        query = query.limit(limit_val)

    if offset:
        query = query.offset(offset_val)
Sort is a bit tricky, but as I'm a noob at SQL Alchemy, I had some not-so-pretty boilerplate code implemented for decorating queries with custom sorts. But essentially it boiled down to:

    if sort_type == SORT_REVERSE_CONST:
         sort_obj = sort_obj.desc()
    else:
         sort_obj = sort_obj.asc()

    query = query.order_by(sort_obj)
And if you want to search through dynamic tables, change the initial query instantiation to something along these lines:

    query = session.query(retrieve_db_object(table_name))
And to execute:

    return query.all()


Python SQLAlchemy works a bit like this

All ORMs work "a bit like this". I don't want "a bit like this". I want exactly like this.

Because with my proposed interface I could be productive immediately and permanently. I would never have to refer to any documentation. Not once.

With every other ORM this is a pipe dream. Their "fancy" chainable wrapper-APIs are nothing but a ball on a chain.


Well, no, they're more than a "ball on a chain" - they're a tradeoff - like everything in software.

They give you the ability to manipulate the query in interesting ways at any point before you execute it. They let you join different queries together, built by different parts of the system, in a safe way. They let you work with the native language you're working in instead of having to construct clauses in a foreign language, using strings. They let you post optimise your loading strategies.

You might want "exactly like that", but it's not going to be as flexible as a chainable wrapper system.

Hell, I could probably give you pretty close to that in python with a little work, but it's not something I'd want to use myself.

I get it. I like working in SQL too. I know it really really well, and I cringe when I see developers writing totally sub-optimal code because they don't understand the relational data model.

But there are other ways to do things, and what you're describing doesn't give you much more than raw SQL, so why not just use raw SQL? You've added some syntactic sugar that, in my language (Python), would be a bit of a horror show (strings access local variables implicitly, no thanks). What else do you gain?

This is more verbose, granted:

    q = session.query(Products)
    if title:
        q = q.filter(Products.title.like(title))
    if price_range:
        q = q.filter(Products.price.lte(price_range))
    if offset:
        q = q.offset(offset)
    if limit:
        q = q.limit(limit)
    products = q.all()
But then you get more stuff for free, like drilling down into the other tables:

    p = products[0]
    p.supplier.contracts[0]
But that's rubbish, because you'll be loading in a really inefficient way. That's ok though, tell the system how you're going to want to load the additional data.

    q = q.options(
        joinedload('supplier').
        subqueryload('contracts')
    )
See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

Use whatever makes sense for your use-case. Don't limit yourself because you'd have to learn something new. Honestly, before using SQLAlchemy I mostly felt the same as you do. Many ORMs get in the way, but that's not really a problem intrinsic to ORMs.


>See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

If you are fine with the new debugging this entails. Take the joined load example you gave. Almost every time I encounter SQLAlchemy code, the author didn't understand the lazy joining semantics and every attribute access on each record is emitting a new query and you now have O(n) queries. It's not obvious that is happening until it bites you once you have thousands of records.

For another example, I can count on one hand the number of developers I have encountered that can understand the expiration of the objects in the session or what the engine builder stuff is doing in the boilerplate they paste in to get it to work. It always requires experimentation, a trip back to the SQLAlchemy docs, and then finally logging all SQL queries to see if it's finally emitting the optimal queries with the transaction guarantees you are looking for.

My gripe about SQLAlchemy is that it separates you too much from what is happening underneath.


Yup. I'll give you most of that - there's a pretty big learning curve with SQLAlchemy.

But, taking the lazy joining example. Someone didn't understand how it worked and now you have to fix it, which is probably as simple as changing that loading strategy to something more suitable for the way you're using the data.

Let's rewind to the late 90s. If I wanted two tables of data to write into the page in an interlaced fashion I would look at how wide they were. I'd make a decision about using two queries or using one larger join. The two query approach was sometimes required due to the size of the data, but it meant cryptic output logic to track positions of cursors relative to each other. The single joined query was simpler to deal with, but still required tracking a last_parent_id so you could swap during the interlacing.

Other developers (those same ones that didn't understand lazy joining) would loop over the first query, running extra queries in the loop (I saw this a lot). Same bad performance as the lazy join.

When you discovered this issue in the code it was a total pain to fix. You're talking about rewriting whole load of code to the point of being unrecognisable from the code you started with.

Contrast that with how easy it is to fix in the SQLAlchemy case. I mostly don't worry about loading strategies now until I'm deep in development. Something's running a bit slow, take a look at some logs, tweak a couple of things and you're golden again. That's such a powerful abstraction.

Regarding the config of the engines etc, again, it's something you need to learn. But really, someone's starting an application from scratch and they just want to dump some code in to handle all their db interactions, but they don't want to know how it works? That's on them, either learn it or use the sensible defaults (in, for example, flask-sqlalchemy).

SQLAlchemy ORM separates you from what's underneath, but it's an ORM, that's kind of the point. If you need to be closer to the metal, use SQLAlchemy Core.


Good news! Almost every language offers some sort of string concatenation with no protection whatsoever against injection, which seems to be what you're asking for.


His proposal is not string concatenation, and it is not vulnerable to injection.


"WHERE price <= #{price_range}" looks like raw interpolation to me. How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?


It's pseudocode for an implementation that doesn't exist. There is nothing in his proposal that requires this hypothetical implementation to copy the value of the price_range variable into the string at all, much less unescaped.

The strings don't even have to be sent to the DB at all. You've utterly missed his point. He wants the language to be intelligent about what the SQL means and do the right thing.

It's not raw SQL, it's an abstraction.


The psuedocode in question happens to be syntactically valid Ruby. If the goal was to demonstrate how simple query generation could be without abstraction layers like Sequel, it is a valid criticism to note that the example given is eliding a necessary feature. Especially since helping avoid injections is one of the primary advantages of such an abstraction layer.

The original argument was that if one could "just write SQL" one wouldn't need to refer to documentation to become productive, "not once". I don't personally think it's practical to use any database library without reading the documentation, but certainly the moment you get away from "the SQL I type in goes straight to the database" you're going to need documentation to tell you what's going to change.

What's being described here sounds like a library that parses SQL fragments, combines them into whole queries, and sends them off to the database. To implement this this without resorting to simple string concatenation needs an intermediate representation of SQL language components. That intermediate representation is going to look a lot like SQLAlchemy, or Sequel, or Arel. That means that instead of "just writing SQL" you're actually adding an extra layer of abstraction and another chunk of documentation to read.

All that said, the "just use SQL" approach is already fairly well served. Most languages have mature bindings to specific databases as well as at least one ODBC-style API to simplify access to those bindings. Nearly all of these have support for client-prepared statements, which handles the interpolation problem. If you want to build more complex queries on the fly, you can use string concatenation or CTEs or decide to use a SQLAlchemy/Sequel/Arel-type library for just that one case.

And, honestly, you don't need to give up SQL to make use of Sequel. I personally really dislike the query composition API demonstrated at the top of this thread, but you can do useful things with Sequel without it:

    db = Sequel.sqlite
    db.execute "create table foos (id int not null, name text)"
    db.execute "insert into foos (id, name) values (1, 'bar')"
    # or
    db[:foos].insert(id: 2, name: "baz")
    db["select * from foos where id = ?", 1].first # ==> {:id=>1, :name=>"bar"}
    # or
    db[:foos]["id = ?", 2] # ==> {:id=>2, :name=>"baz"}
No need to go any further than raw SQL, but increasing levels of abstraction available if you feel like it.


> The psuedocode in question happens to be syntactically valid Ruby.

So is everything else anyone types, code-like or not. He even said "Note how I deliberately shuffled the order and didn't bother with escaping.".

The response was flippant, intelligence-insulting, and obviously the result of failing to read thoroughly.

And speaking of intelligence-insulting, we all know you can run raw SQL through Sequel.

You're not having a useful dialogue, you're being combative, like the person I initially replied to.


As everyone has repeatedly pointed out, once you deal with the obvious issues you quickly end up with something that looks like the gazillions of other ORMs on the market; sorry if I hurt anyone's feelings.


I said the code snippet looked like raw interpolation to me, and I asked how it could be made not vulnerable to injection. It was an honest observation and a genuine question. No flippancy was involved. You're free to think I'm an idiot, but you are the one being insulting and combative here.


How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

It was just a mockup. But you are right, in reality it would end up looking more like this (and use custom interpolation for escaping):

   s.add "WHERE foo > $(bar)"
Likewise a smart syntax for clause combining (AND/OR) and some kind of nesting would probably be needed.

I believe both of these problems should be solvable without compromising the simplicity of the approach.


You still need to supply the local variables to your ORM-like so that it can perform interpolation into the string (a feature which is actually best done by the database driver, NOT your ORM-like). Perhaps we can suggest this syntax:

s.add "WHERE foo > ?", bar

Or even this:

s.Where "foo > ?", bar

At that point, you've reinvented ActiveRecord or hundreds of other query builders, it also avoids you having to remember the ordering rules of sql as to which part of the query must be built first. There's a reason these query builders have converged on similar syntax.

I see where you're coming from - you don't want to learn two sets of syntax, but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL while avoiding sqli and statement order issues, they also usually have an sql() option which lets you just send SQL if you wish for a complex query.

There are very good reasons people keep inventing layers over SQL:

They help avoid sqli

They centralise db access so you can optimise in one place by say adding caching layers to queries

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to


"You still need to supply the local variables to your ORM"

That can be done automatically in many languages. At least in the sense you can do:

"where value=$(bar)"

and not:

"where value = ?", bar

OpenACS was an old web framework for the Tcl language, but it even had that 20 years ago!. The company backing it failed in the .com era, and the lack of types was a turnoff for many - but it made SQL much much more readable. And of course the queries were turned into prepared statements. A query looked like this:

select foo from table where age>:min_age and name != :user_name

Things like this can also be done in node.js (disclaimer: I wrote the blog post below):

http://ivc.com/blog/better-sql-strings-in-node-js/


Perhaps we can suggest this syntax: s.add "WHERE foo > ?", bar

I'd say stick with keeping it a string and use:

  s.add "WHERE foo > $(bar)"
but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL

And that's the problem. Everyone "echos" SQL, just slightly differently, each with their own idiosyncraticies.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

It's much easier for humans to memorize a slight SQL dialect than a different god-awful, nested ORM API boilerplate for every language they're working with.

I can write this without thinking, for queries of almost any complexity:

  s.add "SELECT id from users WHERE"
  s.add "name == $(name)" unless name.nil?
  s.add "AND age > $(age)" unless age.nil?
  s.add "GROUP BY age"
A parser/query-builder doesn't need amazing smarts to infer exactly what I want from this sloppy syntax; e.g. drop the "AND" if the first condition was false and present me with meaningful error messages when something is ambiguous.

Ask me to formulate the same query in any ORM and I'll have to refer to the documentation. And almost every time I'll waste a stupendous amount of time on getting some irrelevant, syntactic detail just right. Only because the ORM insists on an autistic API.

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

After having worked with Hibernate, SQLAlchemy (the one-eyed amongst the blind), ActiveRecord, DataMapper, Sequel and a few others that I don't even remember, I conclude with passion: They have it backwards.

It's not the vagaries of SQL that I don't want to deal with. It's the vagaries of the ORM du jour that I don't want to deal with.


If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

They usually do more. SQL is not very pretty or friendly IMO, ORMs clean it up a bit (for example allowing arbitrary ordering, sqli, construction of multiple queries from one base). The example I gave was an arel query - very similar to yours if you move the WHERE out into the function name.

Your ORM proposal above looks fine (and very similar to existing ones but with more strings), but personally I find it uglier than the many which are available, and you haven't actually tackled the protection against sqli or explained how that would work. Normally you'd pass that through to the db driver as a $n param, rather than trying to escape a string at the ORM level. If your ORM is going to infer details from the sql, it basically has to parse that sql first and handle all the same things that current ORMs do by using named functions to add SQL - you'd find the same issues with consistency and edge cases I imagine, and of course have to be very careful about parameters and sqli.

So if you don't like ORMs, don't use them, but you do have to be sure that your own ad-hoc home-grown ORM that you're writing instead covers all the bases current ones do. I prefer the syntax of the ones you're denigrating for simple queries, even if for complex queries it might be better to write sql directly (via the functions they provide), but would be interested to see it in action - when are you going to start :)


Or you could just use Sequel and not have to build up your own library.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: