Meet Korma, a SQL DSL for Clojure (sqlkorma.com)
173 points by ibdknox on Nov 2, 2011 | 44 comments


I am attracted to these types of DSLs but . . .

I work with a number of legacy data sources that often have models that are not ORM-friendly.

I regularly fall into SQL patterns that tools like this seem to leave behind, particularly using window functions without default grouping because I'll be selecting other things higher up.

A simple example:

  select
    col1,
    col2,
    some_function() over (partition by col1, col2 order by col1, col2) as some_ck
  from <some table>
  where <some conditions>
This is often part of a larger query (e.g., this is embedded in the bowels of some longer "chain" of SQL where I will be using some_ck to decide which tuples I want in the end).

I get pretty significant mileage out of this - meaning I write much less other code around it to handle the data.
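For what it's worth, that pattern can be sketched against SQLite (3.25+ for window function support); the table here is an invented stand-in for the legacy schema, with row_number() standing in for some_function():

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table readings (col1 text, col2 text, val int);
    insert into readings values
        ('a', 'x', 1), ('a', 'x', 2), ('b', 'y', 3);
""")

# The window function is computed mid-chain, then some_ck is used
# downstream to decide which tuples survive.
rows = con.execute("""
    with ranked as (
        select col1, col2, val,
               row_number() over (partition by col1, col2 order by val) as some_ck
        from readings
    )
    select col1, col2, val from ranked where some_ck = 1
""").fetchall()
```

This yields one row per (col1, col2) group, with all the filtering done inside the query chain rather than in application code.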

I guess my complaint is that I often find the SQL DSLs lacking in expressiveness that I use regularly enough to miss.

Of course, this could be because I've spent too much time hacking around in legacy data in "real soon now" mode rather than in "hmm . . . let's clean up this data model" mode.


I think writing DSLs for SQL makes little sense.

SQL is already a declarative query DSL, why not just use it? Putting another declarative layer on top buys you little (syntactic sugar for most libraries, this one additionally seems to deal with one-to-many relations, and I hope that support is read-write, not just read-only) — and isolates you from the database below.

In every serious application I've worked on there was a need to use something which the DSL did not support. So why do it at all?


Raw SQL is not very composable, while Korma emphasizes composability in the second paragraph of its page.


And composability lets you create business DSLs and macros on top of that. I've used this to good effect with Rails' Arel, but since it is still Ruby, you can't go as far unless you're willing to parse the Ruby into something more legible, like S-expressions.


It does make sense to be able to define business/application logic as reusable query components.

For example, if your PERSON entity has TITLE, FIRST_NAME and LAST_NAME fields, that little bit of logic that concatenates them together in the right way to form a full name is something that is useful to define. Likewise, setting up a join of tables that is used many times in your application is something that Korma will let you do.

Even though these components may be implementable on the database side (as user-defined functions or views), it may not always be possible (or desirable) to do it that way.
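The full-name idea can be shown in miniature with plain Python over SQLite; the PERSON schema is the hypothetical one from the comment, and the reusable piece is just an expression fragment:

```python
import sqlite3

# The reusable bit of business logic: how a full name is assembled.
FULL_NAME = "trim(coalesce(title || ' ', '') || first_name || ' ' || last_name)"

con = sqlite3.connect(":memory:")
con.executescript("""
    create table person (title text, first_name text, last_name text);
    insert into person values ('Dr.', 'Ada', 'Lovelace'), (null, 'Alan', 'Turing');
""")

# Any query in the application can reuse the same definition.
rows = con.execute(
    f"select {FULL_NAME} as full_name from person order by last_name"
).fetchall()
```

A DSL like Korma lets you define such fragments as first-class values instead of string constants, but the payoff is the same: one definition, used everywhere.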


I spend much of my time in the database, legacy and greenfield, and I would love a better DSL than SQL, something that ran directly in the database itself and targeted the planner/optimizer rather than just emitting more SQL. I would like to be able to pass a WHERE clause to a function, or a SELECT statement to a procedure, not just simple ints, or strings to be exec'd.


In this case, an embedded DSL brings all the goodies of Lisp's code-as-data to SQL.


a SQL DSL and an ORM aren't really the same thing IMHO. ORMs are about linking objects in memory to database rows, whereas the SQL DSL is just about a thin layer over the rendering of a SQL string, not unlike how you could compose a DOM object of an XML or HTML document and get a string out of it.

I'm not a Clojure user but as I have some authority on this subject I can illustrate your query as SQLAlchemy:

    select([
        table.c.col1,
        table.c.col2,
        func.some_function().over(
            partition_by=[table.c.col1, table.c.col2],
            order_by=[table.c.col1, table.c.col2]
        ).label('some_ck')
    ]).where(<some condition>)


I guess I've been corrupted by working in a heavily database oriented environment, and further corrupted by the fact that it's Oracle, but for most of our own use cases, plain SQL is preferable to this. We'll probably have to run it ad hoc against the database anyway to figure out what it's doing and/or tune it. Many of our queries go into the hundreds of lines[1][2], so anything that obscures the underlying SQL itself tends to make life more difficult.

But perhaps it's just a matter of getting used to a different format.

[1] Legacy environment.

[2] Edited to add: most of our newer apps don't actually invoke huge queries from the app layer. We now tend to layer a view or pl/sql table function on top of the hairy query, so that at the app level it's mostly "select * from blah." For that simple a query, a DSL like this is probably fine, albeit, unnecessary.
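That layering approach is easy to show in miniature (SQLite here rather than Oracle, and the schema is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table orders (id int, customer text, total real, status text);
    insert into orders values (1, 'acme', 10.0, 'open'), (2, 'acme', 5.0, 'closed');

    -- The "hairy" query lives in the database as a view...
    create view open_order_totals as
        select customer, sum(total) as total
        from orders
        where status = 'open'
        group by customer;
""")

# ...so the app layer only ever issues the trivial query.
rows = con.execute("select * from open_order_totals").fetchall()
```

The view can then be tuned or rewritten in the database without touching the application.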


I wrote SQLAlchemy after spending about eight years in three all-Oracle jobs one after the next, and I wrote it specifically around the use case of being able to compose hundreds-of-lines queries from smaller pieces.


Can SQLAlchemy core do arbitrary, non-ANSI things, like create a Postgres user-defined type or a SQL function?

And if so, can I use SQLAlchemy as a standalone pre-processor? My app's in Rails so I can't exactly go integrate it..


> Can SQLAlchemy core do arbitrary, non-ANSI things

Yes. Many DB-specific constructs are included and you can make your own: http://www.sqlalchemy.org/docs/core/compiler.html

Philosophically, SQLA does not consider "abstraction" to mean "hiding", so ultimately no features of the backend store are concealed.

> like create a Postgres user-defined type

sure, the ENUM type does this right now automatically (i.e. CREATE TYPE...AS ENUM). For other kinds of PG types, you can emit the CREATE TYPE yourself then link to it with a SQLAlchemy-side custom type. You can integrate those two steps also.

> or a SQL function?

Again, for a SQL function you're going to be using PG's PL/pgSQL or another embedded language to create it, so you roll the CREATE FUNCTION yourself, then call it on the Python side using func: http://www.sqlalchemy.org/docs/core/tutorial.html#functions


Fair enough. I'll check it out.


SQL that is hundreds of lines long can be hard to follow and harder to maintain. SQL is probably the fastest path to execution but perhaps not the most testable and maintainable. Perhaps that's the lens to view it through.


Dear author of Clojure library, please always put a Leiningen dependency string in your home page ([korma "0.2.0"] in this case).

Apart from that, I like what I see, especially how it uses c3p0 for connection pooling by default.


Sorry :) It's on the readme for the source. I'll add it to the website once things calm down a bit.


Not saying this isn't a great, useful library.

But - what do people have against SQL? As far as I can see, it's a pretty good language for relational data manipulation. But lots of people act like actually writing SQL is dirty work. It's not. It's an elegant, expressive language, for what it does.

Of course, use something like this if you have a documentable need for the flexibility/composability. If you're generating lots of custom queries, it may be justified.

But in nine out of ten projects I've seen which use a RDBMS, the system could be made substantially simpler, more performant and more flexible by just writing raw SQL queries instead of screwing around with ORMs.


SQL is fantastic, but the annoying bit is that you have to create strings to send to the database, and generating these strings correctly is hard. One of the key features of libraries like Korma and SQLAlchemy is that they provide a very eloquent way of robustly creating these strings. Yes, things like prepared statements help, but dealing with strings and prepared statements is not relevant to the problem that you want to solve. Korma et al. shave the yaks for you.

Without something like Korma, your process looks something like:

structural-description-of-query --(string operations)--> query-string-to-db --(careful unpacking of response)--> structured-results

With Korma and its ilk, this process becomes:

structural-description-of-query --(somebody else's problem)--> structured-results
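A toy illustration of the first pipeline: the query is just a data structure, and the error-prone string-operations step is what a library like Korma or SQLAlchemy takes off your hands. Everything here (the dict shape, the render helper) is invented for illustration:

```python
# A structural description of a query: just data, easy to build up in pieces.
query = {
    "select": ["vendor", "model"],
    "from": "allCars",
    "where": [("year", ">", "?")],
}

def render(q):
    """The string-operations step: turn the structure into a query string
    with placeholders, ready for a parameterized execute."""
    sql = "select {} from {}".format(", ".join(q["select"]), q["from"])
    if q.get("where"):
        sql += " where " + " and ".join(f"{c} {op} {v}" for c, op, v in q["where"])
    return sql

sql = render(query)
```

Because the description is data, composing two queries is dict manipulation rather than string surgery, which is exactly the "somebody else's problem" arrow.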


Expressive, yes. But SQL is rarely elegant. Declarative languages can be elegant; even modern regexen, with their named groups and whitespace tolerant syntax, allow the ideal of "small, composable methods". You can't do composability in SQL (in fact, if you Google, you'll discover that the very term "composable" is an unrelated feature of, IIRC, Sybase.)

The SQL community doesn't spend much time talking about writing elegant code in the way the functional, concurrent, OOP, and whatever-Ruby-is communities do. When programmers talk about "SQL Style" or "SQL Idioms", they mean whether you make keywords uppercase and where you indent and wrap to a new line. There is an entire book on the SQL equivalent of "K&R braces, or GNU style?", and should you or should you not alias long tables to short aliases.

There's a book on "SQL Refactoring"; it's about database design, not coding.

The pain goes on. I've been wishing for something like "CoffeeScript for SQL" for a while; I'm not a lisphead, but this may just have to do, especially if it supports PG-isms like SQL procedures and user-defined types.


SQL is not a bad language. However, it's far from flexible and reusable. The biggest issue I always have with SQL is the fact that I can hardly reuse any piece of my code.

Let me give you an example.

select vendor, model, avg(price) as price from allCars group by vendor, model;

So far, this is simple and clear. Now let's imagine someone has a requirement to calculate this for different set of cars:

select vendor, model, avg(price) as price from allCars where vendor = 'BMW' group by vendor, model;

or

select vendor, model, avg(price) as price from allCars where year > 1990 group by vendor, model;

This approach only works if I'm always selecting from the same table - allCars - and I know all the conditions in advance. To reuse the actual code which calculates the average prices (select ... group by) I'd need something like this:

select vendor, model, avg(price) as price
from $cars  -- $cars is a variable with at least three attributes (vendor, model, price)
group by vendor, model;

This way I could write my own relational operator which calculates avg prices of cars per vendor and a model no matter where the cars are coming from.

Does anyone else have the same problem as me or am I just not smart enough to figure out how to do it in SQL?
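What the comment asks for can be faked at the application layer: treat the source relation as a parameter and wrap the grouped select around it. A sketch with SQLite and an invented helper:

```python
import sqlite3

def avg_prices(source_sql):
    """Reusable 'operator': average price per vendor/model over any relation
    exposing vendor, model and price -- the $cars idea from the comment."""
    return (f"select vendor, model, avg(price) as price "
            f"from ({source_sql}) group by vendor, model")

con = sqlite3.connect(":memory:")
con.executescript("""
    create table allCars (vendor text, model text, year int, price real);
    insert into allCars values
        ('BMW', '3', 1985, 10000), ('BMW', '3', 1995, 20000), ('BMW', '3', 2000, 30000);
""")

all_avg = con.execute(avg_prices("select * from allCars")).fetchall()
new_avg = con.execute(avg_prices("select * from allCars where year > 1990")).fetchall()
```

This is precisely what the composable DSLs do for you, except they pass query values around instead of SQL strings, so the pieces can be inspected and combined safely.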


Korma was actually designed to be used to do exactly that: write SQL. You don't have to use any of the ORM-y features and you're essentially just writing SQL in a lisp syntax. The advantage, as others have said, is simply composability, which leads to reusable SQL parts.

It's up to you whether you use korma to "just write SQL" or you use the magic :)


Absolutely, and I completely grant you that there are projects where the composability is a huge win. The library looks great, by the way.

I just think a lot of people will have a kneejerk reaction to use something like this because they have a distaste for raw SQL, rather than a considered analysis of the pros and cons.

Because there are cons - every time you use something like this, however well designed, you introduce another layer of abstraction into your software, one that is bound to have some leaks. Maybe it's justified, maybe it isn't.

Just do the analysis to see if this can benefit you, instead of avoiding SQL by default just because it's SQL.


See also ClojureQL

http://clojureql.org/documentation.html

Which doesn't have as nice a website but may have superior semantics.


The issue I had with ClojureQL is that it seems like the wrong abstraction to me. Myself and others I've talked to have found ourselves fighting with how it tries to model data and more specifically the kind of queries it generates. For example, it is very quick to use rather inefficient sub-selects.


I'll be checking out korma. :)


My first impression of ClojureQL was pretty bad, because it didn't do parameterized SQL in the first public release (which has fortunately changed since they released 1.0). I know it was only a beta, but first impressions are strong. I'm now comfortable enough to use it in one of my production sites, though.

Korma seems to have done this right.


Man this makes me want to use Clojure for something.


If that's all it takes, check out LINQ-to-SQL in C#; it does similar stuff. I could take or leave much of C#, but I miss LINQ every time I don't have access to it.


Totally agree. LINQ to SQL is the one feature from C# I wish I had in many other languages. The power you get from the chaining and lazy loading makes implementing many complex joins and the like very easy, testable and compile time checked.


The thing is, implementing something like LINQ in C# or any such language requires compiler changes; with Clojure or any Lisp, it's library-level. You can't make Korma or LINQ in something like Java; the language is too conservative. Macros!


Nice looking DSL, but .. how could I do things like SELECT TOP 10 (MSSQL) or SELECT DISTINCT (just trying to use it for queries I need at work) ? Haven't found any clues on this in the docs (nice site BTW).

Also, it would be nice to allow some raw SQL in where clause - so one won't have to wrestle with DSL just to occasionally use proprietary features or db-specific functions.


I love the name! It's a spicy South Asian curry: https://www.google.com/search?gcx=w&q=korma&um=1&...


you mean mild English curry?


the color scheme is awesome too


People who liked this may also like Rogue http://engineering.foursquare.com/2011/01/21/rogue-a-type-sa... which is a Scala DSL for MongoDB. It's not a SQL database, but it's comparable in terms of semantics.


There seems to be something similar for Ruby, in the spirit of C#'s LINQ: https://github.com/suryagaddipati/linqr


Back in the day, Ambition was pretty cool:

http://defunkt.io/ambition/

It used ParseTree to parse idiomatic Ruby code against Enumerable APIs into SQL.


I wrote an Ambition-like API for DataMapper a couple of years ago and still keep it updated: https://github.com/dkubb/dm-ambition

It uses sourcify (which uses ParseTree on 1.8 and Ripper on 1.9). The semantics are identical to the standard Enumerable API. In fact (it's not recommended, but) you can turn off dm-ambition and the expressions will still work as-is -- only instead of generating a query from the block, it iterates over the results and filters in memory.


Great stuff! I haven't really kept up with the Ruby world, and didn't realise there was a solution on 1.9.


Well, on 1.9 it's not quite as nice as 1.8. The ruby-core team decided to change ruby so that after parsing the source the AST would be thrown away as part of some kind of optimization. That means things like ParseTree don't work anymore, and you can't inspect the AST of a Proc at runtime.

However, with sourcify you can sort of emulate the 1.8 behaviour, but it's not perfect. It uses ripper to parse the file where the Proc is defined, and then provides the same callbacks that ParseTree does. It mostly seems to work, but it does run into problems if two Procs are defined on the same line, eg: users.select { ... }.select { ... }.

I'm actually a little disappointed that ruby-core decided to go in this direction. I think that decision will end up costing the community because it makes runtime introspection of code harder. It means fewer tools will be written on top of the runtime, and things like Ambition become harder, if not impossible, to do reliably. I'm glad to see Rubinius is going in the opposite direction, though.


Not sure if I understand your comment completely, but Ripper in 1.9 seems to do what ParseTree was doing. Am I missing something?


Ripper can only parse static source, while ParseTree can give you the AST as it is at runtime. So with Ripper you give it the name of a file/line number and it can parse it, but it can't get any contextual information available at runtime, or anything that might've changed at runtime. With ParseTree I can give it a Proc object and it can parse it in whatever state it is at that point in time.

In a nutshell, how sourcify works is it gets the source_location for the Proc, which gives you the file name and line number. It uses Ripper to parse the file, and the Proc starting at that line number. It provides an object that can be fed into SexpProcessor to extract information about the block of code within the Proc.

Since it's bound to Proc#source_location one of the limitations is it can't distinguish between two or more Proc objects defined on one line (eg users.select { |u| u.id == 1 }.select{ |u| u.name == 'John Doe' }). However, that doesn't seem to be too much of a limitation.


"takes the pain out of working with your favorite RDBMS"

The way that's worded just sounds a little odd.

Aside from that, this looks really cool.


Nice page colors.

