Postgres indexes under the hood

elchief · on Jan 27, 2018

As an IT educator, I think it's better if people understand how indexes work at a higher level.

Then you have something, even if you forget how an R-tree works

Think of them as being much like a book index. A book index is useful because it's sorted, so you find something fast. It points to the pages the item is on. Same for a db. The table isn't usually sorted, but the index is and points to the required rows

If you're a book editor and the author adds, edits, or removes a section, you need to update the index

Want to lookup a word by it's suffix? You're gonna need to scan the whole book, or have an index of reversed words

Database indexes work the same way

saurik · on Jan 27, 2018

I don't really understand the thought process. Clearly, to know how they work at the low level you must also understand what they are (the superficial understanding you are describing here). I don't see any harm from also knowing how they work, and as someone who has been using PostgreSQL for something like two decades I am extremely glad I do, so saying it is "better" to only know what they are at a superficial level instead of also knowing how they work at a deep level seems really wrong. Meanwhile, as this article notes in its first sentence, most software engineers in this article's target demographic use database indexes every day and we would be shocked if they didn't already have the superficial level of understanding. Now, I haven't verified this is a good article, nor is it clear to me we needed yet another article to explain this, but you seem to be coming from the idea that this knowledge shouldn't even be imparted?... from the perspective of an "IT educator"? :(

elchief · on Jan 27, 2018

I'm not saying it shouldn't be imparted, but the high level knowledge is better first

People seem to get lost in the details. I know devs that understand the various index tree types, but don't understand why something is sargable or why dropping indexes before a bulk copy is faster

Angostura · on Jan 27, 2018

I simply assume that the article is written for people who already know what an index is.

rusbus · on Jan 27, 2018

OP here. That was in fact the case ;-)

kbwt · on Jan 27, 2018

I actually disagree. High-level knowledge without later filling in the gaps leads to cargo-cult software engineering.

If people get lost in the details, they are not prepared to fully understand the big picture.

MaxBarraclough · on Jan 27, 2018

I'm not sure that applies to everything, though. I'm sure we agree there's a limit.

Joel Spolsky is right when he writes about the 'Law of Leaky Abstractions', but I'm not sure the average CRUD developer needs to know the gory nuts and bolts of his DBMS. Of course they need to know enough to reason effectively about good design, performance, security, etc, and more low-level knowledge is always a good thing.

As an extreme example, PHP programmers don't need to understand branch prediction and caches.

tinymollusk · on Jan 27, 2018

You are very correct for anyone who would consider themselves a programmer. I was thinking in a bigger scope; with so many non-programmers working technical roles (and especially with data/databases), I think this is a marvellous way for e.g. project managers to have a brief mental model of what a db index is.

amirouche · on Jan 27, 2018

I think 'Law of Leaky Abstractions' is overrated. Abstractions always leak, the question is whether it is part of the API and explained or not. Another example that is not indices: async/await is a leak of the underlying implementation and inner working.

MaxBarraclough · on Jan 29, 2018

You've mentioned an abstraction but not explained why it's leaky.

I agree that async/await leaks in places, such as in the surprising deadlocks that Stephen Cleary explains. I've met more than one pretty serious C# developer that wasn't aware of this. http://blog.stephencleary.com/2012/07/dont-block-on-async-co...

> the question is whether it is part of the API and explained or not

More broadly it's a question of pretending things can be wrapped up in tidy self-contained entities, and the idea that we can fully design away the messy details of reality. See also the Three Big Lies of C++ https://youtu.be/rX0ItVEVjHc?t=17m15s

amirouche · on Jan 30, 2018

My understanding is that async/await gives the promise to write asynchronous code like synchronous code. Except the use of 'await' it's true.

MaxBarraclough · on Jan 30, 2018

All told I'd say it's an successful but imperfect abstraction. I've been very impressed by it all in all. Concurrent programming isn't easy, and it's not a silver bullet, but it's a great language feature.

In an ideal world you'd just take existing synchronous code and throw async/await keywords at it to make it async. The abstraction isn't quite that successful, of course.

See Stephen Cleary's AsyncEx library, which provides async-specific functionality such as AsyncLock, which doesn't exist in the standard library. https://github.com/StephenCleary/AsyncEx/wiki/AsyncLock

nerdponx · on Jan 27, 2018

Corollary: if the index is on a low-cardinality field, it won't do much good helping you find what you're looking for.

fnl · on Jan 27, 2018

> If you're a book editor and the author adds, edits, or removes a section, you need to update the index

But all the page references would be wrong, too, after the author updates the text. That would not happen with a DB index and it's data pointers.

raarts · on Jan 27, 2018

So, as an IT educator, how much time do you put in explaining this concept?

ris · on Jan 27, 2018

For even more detail, https://github.com/petergeoghegan/pg_hexedit

With further blog posts https://pgeoghegan.blogspot.co.uk/2017/11/pghexedit-rich-hex... and http://pgeoghegan.blogspot.co.uk/2018/01/exploring-sp-gist-a...

chaitanya · on Jan 27, 2018

Any developer working with a database should understand how indexes work. A good book for understanding relational db indexes is http://sql-performance-explained.com. Highly recommended.

Walf · on Jan 27, 2018

How I wish you Americans had just left them as "indices". Almost every time I read it as the verb.

rusbus · on Jan 27, 2018

OP here.

This is funny. On the rest of my blog post I use indices. For this post, I went with indexes because that's what PG calls them in the docs.

I feel your pain.

bunderbunder · on Jan 27, 2018

There may be an odd semantic distinction that doesn't translate.

I (native US English speaker) typically interpret "indexes" as the plural of index as in "structure you use for fast key-value lookup". I interpret "indices" as the plural of index as in "lookup key".

SliderUp · on Jan 28, 2018

Laughed out loud. We used "indexes" and "indices" all over a code base, methods/classes and doc. It was a real mess to true up to one.

ComputerGuru · on Jan 27, 2018

American here, and I agree. But growing up, it was always indices.

Isn’t OED British? They recently took their dictionary behind a paywall (!!), but they recommend “indexes” while it appears accepted that it is the Americanized spelling nonetheless.

Ref https://en.m.wiktionary.org/wiki/indexes

swinghu · on Jan 27, 2018

a great tutorial for distill postgresql'index,thanks