Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Postgres indexes under the hood (rcoh.me)
162 points by rusbus on Jan 27, 2018 | hide | past | favorite | 23 comments


As an IT educator, I think it's better if people understand how indexes work at a higher level.

Then you have something, even if you forget how an R-tree works

Think of them as being much like a book index. A book index is useful because it's sorted, so you find something fast. It points to the pages the item is on. Same for a db. The table isn't usually sorted, but the index is and points to the required rows

If you're a book editor and the author adds, edits, or removes a section, you need to update the index

Want to lookup a word by it's suffix? You're gonna need to scan the whole book, or have an index of reversed words

Database indexes work the same way


I don't really understand the thought process. Clearly, to know how they work at the low level you must also understand what they are (the superficial understanding you are describing here). I don't see any harm from also knowing how they work, and as someone who has been using PostgreSQL for something like two decades I am extremely glad I do, so saying it is "better" to only know what they are at a superficial level instead of also knowing how they work at a deep level seems really wrong. Meanwhile, as this article notes in its first sentence, most software engineers in this article's target demographic use database indexes every day and we would be shocked if they didn't already have the superficial level of understanding. Now, I haven't verified this is a good article, nor is it clear to me we needed yet another article to explain this, but you seem to be coming from the idea that this knowledge shouldn't even be imparted?... from the perspective of an "IT educator"? :(


I'm not saying it shouldn't be imparted, but the high level knowledge is better first

People seem to get lost in the details. I know devs that understand the various index tree types, but don't understand why something is sargable or why dropping indexes before a bulk copy is faster


I simply assume that the article is written for people who already know what an index is.


OP here. That was in fact the case ;-)


I actually disagree. High-level knowledge without later filling in the gaps leads to cargo-cult software engineering.

If people get lost in the details, they are not prepared to fully understand the big picture.


I'm not sure that applies to everything, though. I'm sure we agree there's a limit.

Joel Spolsky is right when he writes about the 'Law of Leaky Abstractions', but I'm not sure the average CRUD developer needs to know the gory nuts and bolts of his DBMS. Of course they need to know enough to reason effectively about good design, performance, security, etc, and more low-level knowledge is always a good thing.

As an extreme example, PHP programmers don't need to understand branch prediction and caches.


You are very correct for anyone who would consider themselves a programmer. I was thinking in a bigger scope; with so many non-programmers working technical roles (and especially with data/databases), I think this is a marvellous way for e.g. project managers to have a brief mental model of what a db index is.


I think 'Law of Leaky Abstractions' is overrated. Abstractions always leak, the question is whether it is part of the API and explained or not. Another example that is not indices: async/await is a leak of the underlying implementation and inner working.


You've mentioned an abstraction but not explained why it's leaky.

I agree that async/await leaks in places, such as in the surprising deadlocks that Stephen Cleary explains. I've met more than one pretty serious C# developer that wasn't aware of this. http://blog.stephencleary.com/2012/07/dont-block-on-async-co...

> the question is whether it is part of the API and explained or not

More broadly it's a question of pretending things can be wrapped up in tidy self-contained entities, and the idea that we can fully design away the messy details of reality. See also the Three Big Lies of C++ https://youtu.be/rX0ItVEVjHc?t=17m15s


My understanding is that async/await gives the promise to write asynchronous code like synchronous code. Except the use of 'await' it's true.


All told I'd say it's an successful but imperfect abstraction. I've been very impressed by it all in all. Concurrent programming isn't easy, and it's not a silver bullet, but it's a great language feature.

In an ideal world you'd just take existing synchronous code and throw async/await keywords at it to make it async. The abstraction isn't quite that successful, of course.

See Stephen Cleary's AsyncEx library, which provides async-specific functionality such as AsyncLock, which doesn't exist in the standard library. https://github.com/StephenCleary/AsyncEx/wiki/AsyncLock


Corollary: if the index is on a low-cardinality field, it won't do much good helping you find what you're looking for.


> If you're a book editor and the author adds, edits, or removes a section, you need to update the index

But all the page references would be wrong, too, after the author updates the text. That would not happen with a DB index and it's data pointers.


So, as an IT educator, how much time do you put in explaining this concept?



Any developer working with a database should understand how indexes work. A good book for understanding relational db indexes is http://sql-performance-explained.com. Highly recommended.


How I wish you Americans had just left them as "indices". Almost every time I read it as the verb.


OP here.

This is funny. On the rest of my blog post I use indices. For this post, I went with indexes because that's what PG calls them in the docs.

I feel your pain.


There may be an odd semantic distinction that doesn't translate.

I (native US English speaker) typically interpret "indexes" as the plural of index as in "structure you use for fast key-value lookup". I interpret "indices" as the plural of index as in "lookup key".


Laughed out loud. We used "indexes" and "indices" all over a code base, methods/classes and doc. It was a real mess to true up to one.


American here, and I agree. But growing up, it was always indices.

Isn’t OED British? They recently took their dictionary behind a paywall (!!), but they recommend “indexes” while it appears accepted that it is the Americanized spelling nonetheless.

Ref https://en.m.wiktionary.org/wiki/indexes


a great tutorial for distill postgresql'index,thanks




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: