SQLite – The “server-process-edition” branch (sqlite.org)
236 points by yread on Aug 15, 2018 | 75 comments


It scales so perfectly that we are deploying it across 3 datacenters, each with four 384-core machines with 3 terabytes of RAM. I can't speak highly enough about sqlite and the team behind it.


I don't understand why you'd use SQLite for a deployment this heavy over PostgreSQL or MySQL. Yes, SQLite is fantastic, but why would you choose to do it this way?


Because SQLite is often the right tool for the job, more so than Postgres and MySQL. It scales insanely well and requires minimal configuration and near zero administration. I try to use SQLite for as long as is feasible in my projects purely because it "just works".

Don't let the "Lite" fool you. Depending on your needs, you can scale to tens of thousands of users using just SQLite. Or not! It is all about knowing your system and properly evaluating your options. I choose to use SQLite because it often fits the use case of small to medium projects the best.


Honest question: where do Postgres and MySQL fall down for such deployments? What is the administration required? Backups? Users and permissions? Indices?

FWIW I use sqlite for a personal wiki, i.e. a website with exactly one user :)


I think the idea is that if your service does more, it necessarily needs more work to maintain. From a simple point of view, there are just more moving parts to go wrong.


The "lite" doesn't refer to "light"; it refers to a mineral.

https://changelog.com/podcast/201


I was confused by this, so I read the transcript. In case anyone else reads this, the name is supposed to be read as:

SQL (S Q L or sequel, your call) - ite

So it does sound like the name of a mineral: bauxite, boehmite, hematite, etc. Heck, kryptonite :D


> Jerod Santo:

> Hm, like a mineral. Were you playing on the word "light", or were you just playing on mineral...?

> Richard Hipp:

> I was, I was.

It's pronounced as a mineral. It's still a pun on "light". And an obvious one, given what SQLite is compared to other RDBMSes.


If someone put a good, fast Raft replication engine in-process in front of sqlite, you'd have a pretty killer db for a lot of things.



Today is your lucky day: https://github.com/rqlite/rqlite


I think it is a tradeoff. If you want more data resilience on disk, SQLite will require all sorts of hacks to control what (and, more difficult, when) to dump to disk.

On the other hand, if you need every ounce of performance and do not care so much about data resiliency on disk, then the other SQL servers will be too much hassle to configure, because you cannot miss any of the many data-safety safeguards they ship with by default, safeguards which will kill your performance when you least expect it.


It's faster? Simpler?

Conversely, why would you choose one of those alternatives?

(Not that I think there aren't any reasons to do so, mind you, but I can well imagine that if you really KISS, sqlite makes sense even at impressive sizes.)


SQLite's typing is rather weak, is it not? That might be a tad scary for someone who relies on that part of PostgreSQL.


It sure is. After recently porting a large application from SQLite to Postgres, much of my time was spent casting types and ensuring code was updated to input the correct types into Postgres. All in all, the changes were for the better, as many bugs were discovered and resolved in the process.


Well, the historic one is, "because you need multiple concurrent data writers."

But, that's what this branch is supposed to be for. So now the reason is: "because you need multiple concurrent data writers from multiple processes." And it might also be "you need multiple concurrent data writers and care about your data enough to not risk loss in the event of a system crash or power failure" since it uses PRAGMA synchronous=OFF.
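
For reference, it's a one-line setting, which is part of why it's so tempting. A minimal sketch from Python's sqlite3 module (the file name is made up):

    import sqlite3

    conn = sqlite3.connect("app.db")  # hypothetical database file
    # Trade durability for speed: SQLite stops issuing fsync after writes,
    # so a power failure or kernel panic can corrupt the database.
    conn.execute("PRAGMA synchronous=OFF")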


Are you using this particular branch in production, deployed as you mention, or sqlite in general? Some more info about this use case would be very interesting if you can share!


What sort of volume is this handling? Is the dataset in memory, or if not, what size and type of I/O is backing this? It seems like a ton of CPU, whereas in my experience typically I/O is the primary bottleneck for database loads.


Note the second comment in the following thread:

https://news.ycombinator.com/item?id=12739771

quinthar posted in quite a lot of detail there as well, so it may provide some context.


The link says that it requires turning synchronous off, which means that you won't be waiting for real I/O on transactions, since no fsync calls are issued. Add that to a huge amount of RAM for cache, and it's very reasonable to be CPU bound... So long as you don't mind data loss or corruption on power loss or kernel panic...


Doesn't a BBU RAID controller largely address the power-loss concern? And I always thought BBU RAID controllers were just absolute common sense for any serious database (until everyone went cloud and suddenly basics like dual network cards, dual PSUs and RAID controllers didn't fit Amazon's desire to sell complexity).


Well, data will be lost if anything at the OS buffer cache level or below crashes (source: http://sqlite.1065341.n5.nabble.com/How-dangerous-is-PRAGMA-...).

The BBU protects from power loss of the HDs, but not power loss or general failure of the mainboard or any other important component.

Also, BBUs can run out of battery, so a flash-backed BBU is generally recommended.


Without fsync, the dirty page can sit in RAM and not even be sent to the disk, so a battery backup for the disk wouldn't solve data loss.


Right... but you leave fsync on in this case, no? This might be incompatible with the locking required by this server-process edition, though (I guess, no clue). But more generally speaking, fsync=on with a proper RAID controller gives a massive performance boost (like orders of magnitude) while being relatively resilient to power loss.


"in my experience typically I/O is the primary bottleneck for database loads"

Not when you have potentially nearly 3TB of memory cache.

Of course it all depends on the dataset.


So is there one SQLite process running on each of the 384 cores?



It may be a bit out-of-date, but BerkeleyDB provides a SQLite front-end that supports page-level locking -- and thus has better TPS when multiple threads are writing. It's basically SQLite with the btree ripped out and replaced with BerkeleyDB. More information here [0].

[0] http://www.oracle.com/technetwork/database/database-technolo...


Mods: there is a much more up-to-date version of the readme at this link: https://sqlite.org/src/artifact/b98409c486d6f028


If this requires turning off pragma synchronous, doesn't that disqualify it for anything but read only tasks? Power loss etc would corrupt the database...


You could have a master-slave architecture where you disable it for the slaves only. It really depends on the use case.


wait, sqlite can do master-slave in some way?


No, vanilla SQLite cannot.

But think of it as a library: you can build master-slave on top of it.

Which is basically what I did with https://redisql.com/ in the pro version exploiting Redis AOF


It's not a requirement, but it will run quicker with it off.


Install a UPS.


Doesn't protect against a kernel panic, or other kind of crash.


Expensify is powered entirely by sqlite (inside bedrockdb.com) and it is freaking amazing.


This was discussed a couple years ago at https://news.ycombinator.com/item?id=12739771


Good thing you don't need 64bit integers.

Seriously, as impressive as SQLite is for what it is, people shouldn't be given the impression that it doesn't come with some massive and often surprising caveats when viewed as a general purpose database.


Looks like I misremembered this (re 64bit integers) and it was one of the other slightly odd limitations SQLite has around rowids and foreign keys that caused me to abandon the last system I was designing with it...

What concerned me most when using SQLite was the ease with which it would happily let me write broken queries and not raise a fuss, e.g. referencing non-GROUP-BYed terms in an aggregation.


Hi ris,

Can you provide example(s) of the broken query with non-GROUP BY columns in an aggregation? (Or a reference link if this is well known - I've searched briefly but can't find anything.)


Not the parent poster but it's queries like these:

Select user.id, user.name, sum(txn.amount) from user inner join txn on (user.id = txn.uid) group by user.id

User.name is not in the group by list so it is picked arbitrarily from the rows in the group. It's harmless here but not in all cases.
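
Most other engines reject that query outright; the portable form lists every non-aggregated column: Select user.id, user.name, sum(txn.amount) from user inner join txn on (user.id = txn.uid) group by user.id, user.name.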


TBH I like the ability to do this. Sometimes you truly don't care which value you get, you just want one. Sometimes you have application-level constraints which mean the one it picks is guaranteed to be correct, but you can avoid the cost of enforcing that (at insertion time and at lookup time). It's a handy tool when it's appropriate.

But hell yes, tools should warn you about this loudly, as it's not generally correct.


Did you file a bug report?


SQLite has 64-bit integers. In fact, rowids have been 64-bit by default since version 3.


It has some quirks, but I'd be interested to know what you think are its "massive and surprising" issues.

Also, it has 64-bit integers - https://www.sqlite.org/datatype3.html
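
Easy to check, too (a quick sketch with Python's sqlite3 module and an in-memory database):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # INTEGER values are stored as signed integers of up to 8 bytes
    print(conn.execute("SELECT 9223372036854775807").fetchone())
    # -> (9223372036854775807,)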


Think you have a few misplaced double negatives.


What I wasn't able to understand is whether this follows the main SQLite development effort or if it is just a frozen branch.

Is there any reason why it is not incorporated into the main branch and activated by compile time flags? Just complexity?


The branch

https://sqlite.org/src/timeline?r=server-process-edition

looks active.

I would hesitate to integrate such a feature into mainline prematurely. If you really need it you can use the branch.


We have been using this page-locking technology for quite a while now and it works incredibly well. Having both read and write concurrency is super powerful and scales fantastically.


This is good news! We are already storing over 7PB of data in sqlite, and have already needed many workarounds for readers/writers.


This sounds really interesting. Can you expand on this installation?


Thanks, and sure!

First off, it is not a single instance, but many sqlite databases, many of which have 20+ TiB of data.

For the last 10+ years I have worked at a backup company and we have gone through some iterations of storage backends, including an in-house system that fell on its face. So far the only thing that has been able to keep up with our demands has been Sqlite, although there have been some hurdles.

Because of how we want isolation between users, having any sort of shared database system really is not an option. And in the early days our users liked the idea of simply copying a user's dataset and sending it some place else.

In any case we tested, or I should say a guy named Brain on our team tested, many different systems and ended up picking Sqlite many years ago. It has stuck and served us very, very well. It was not until we wanted to access the data in different ways that we ran into concurrency issues and made a few weird decisions on how to access the data to avoid some of these problems.

These databases store the volume data for each volume we back up. And because it is a database, we can track the blocks in such a way that allows us to present a consistent view of each backup in time. It has been good enough to be able to export volumes as iSCSI LUNs (via a custom iSCSI server I wrote to interface with the backend), and in turn allow us to virtualize the systems we backed up on the fly, without moving data from one format to another.

Now that I think about it, ~7PB is only half of it; that is just the data managed on our own hardware. We have many other people storing just as much if not more on their systems.

Looking at my original comment it may seem like I implied that we have a single 7PB database, but no, it is 1000s of smaller ones, between 8 and 30TiB in most cases.

Have we had data loss? Yes, but it was mostly due to improperly using sqlite. Except one occasion where we were sure it was sqlite itself, which I am sure Mr. Hipp will refute :p, but I forget the details and a fix was applied quickly.

Sqlite is nice, and I would recommend it to anybody for just about any use, even uses you might not have considered for an embedded database.


Are you putting binary blobs in the sqlite databases directly?


Yes, the block data is directly inserted into the database.

This has worked fine, but I don't think it is optimal. I think we might get some performance increases if we just used sqlite for keeping block offsets into a flat file. But then there are a bunch of other problems to solve.

Data is compressed, so we would have to handle free-hole allocation for variable-sized data, or simply waste free space. Not that sqlite does any better with those things, just things we would have to solve in general.
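
The offset-index idea would look something like this (just a sketch, all names made up; the flat-file side is where the problems above live):

    import sqlite3

    conn = sqlite3.connect("blocks.db")  # hypothetical index database
    # SQLite only records where each compressed block lives in a separate
    # flat file; the block bytes themselves never enter the database.
    conn.execute("""CREATE TABLE IF NOT EXISTS block_index (
        block_id    INTEGER PRIMARY KEY,
        file_offset INTEGER NOT NULL,
        length      INTEGER NOT NULL
    )""")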


I don't understand why SQLite would be used instead of writing a data structure. To me going through SQL (& ORMs) is a hindrance.

Why use a database if you only have one process?


Transactions. It's relatively easy to atomically edit random data in a file on disk, if you're willing to overwrite the entire file with your new version. But as soon as you need to edit files that are too big for that approach, the problem becomes exceptionally complicated.

Indexing any columns you want in one line of code is also very nice.
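
Both of those are nearly free with SQLite. A minimal sketch via Python's sqlite3 module (schema and file name are made up):

    import sqlite3

    conn = sqlite3.connect("app.db")  # hypothetical file
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("a", "1"))
        conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("b", "2"))
    # ...and the "index any column in one line" part:
    conn.execute("CREATE INDEX IF NOT EXISTS kv_by_v ON kv(v)")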


Comparable questions are why use a library when you can just write the code yourself, or why use a high level language when you can write it in assembly. SQLite gives you a lot of power, having already solved a lot of the problems and with a lot more testing, that you then don't have to. The price you pay for that is using their API, which, in this case, is SQL. Using an ORM is a separate choice. You may choose differently than others on these decisions, but using SQLite is certainly a reasonable decision that many people find compelling.

It's also useful to point out that SQLite can have many reader processes. The single-process limit is only on writes.


SQLite is a data structure. The process of finding things on disk and atomically updating them is not as simple as it sounds and is about 90% of the work of a database.


To add to the arguments already made: You don't have to figure out serialization. In particular, writing out structs in C and C++ can break even between compiler versions if padding changes.
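
You can see the padding problem without touching a compiler (a sketch using Python's struct module):

    import struct

    # Native layout ("@"): the char is followed by platform-dependent padding
    # so the int lands on its alignment boundary, typically 8 bytes in total.
    print(struct.calcsize("@ci"))
    # Standard little-endian layout ("<"): no padding, always 5 bytes.
    print(struct.calcsize("<ci"))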


That's fine when you don't need complex queries or your data is small. As complexity increases, you will eventually end up with a buggy, slow relational database.


And even then it's better to use a standard solution that other people will know too, if nothing else to make maintenance easier.


> The collision rate (percentage of attempted transactions that failed due to a page-level locking conflict) in all tests was between 1 and 2%. Failed transactions are not included in the TPS counts below.

I read this as: retry logic required. That's a bit onerous IMO. Wrapping a single writer in a mutex gives near rw=1 performance in the cases I tested.
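
For comparison, the retry loop you'd need looks roughly like this (a sketch; in Python's sqlite3 module a lock conflict surfaces as OperationalError, and the backoff numbers are invented):

    import random
    import sqlite3
    import time

    def run_with_retry(conn, fn, attempts=5):
        for i in range(attempts):
            try:
                with conn:  # BEGIN ... COMMIT, rolled back on exception
                    return fn(conn)
            except sqlite3.OperationalError:
                # Likely a page-lock collision: back off with jitter, retry.
                time.sleep(random.uniform(0, 0.01 * 2 ** i))
        raise RuntimeError("transaction failed after %d attempts" % attempts)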


Can someone explain a bit more?

(a) Does this mean that there will be an sqlite version that can be deployed as a process accessible over network in servers?

(b) Does this mean that sqlite will now have the capability to do concurrent reads/writes, but that it is up to users to implement a process that can take advantage of this and create something like (a)?


> (a) Does this mean that there will be an sqlite version that can be deployed as a process accessible over network in servers?

No.

> (b) Does this mean that sqlite will now have the capability to do concurrent reads/writes, but that it is up to users to implement a process that can take advantage of this and create something like (a)?

Yes... this allows a multi-threaded application, for example, to have multiple readers/writers without blocking each other.


... as long as those threads don't touch the same page as each other. Page locks <> row locks.


Depends on the width of your rows and the relation of access patterns to physical row allocation.


For some reason, I had the idea that this was always the case with SQLite, at least for the last 10 years or so.


In WAL mode, readers don't block writers and vice versa, but there can still only be one writer active at any given time.
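
Enabling it is a single pragma (a sketch; the file name is made up):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # WAL mode: readers and the (single) writer no longer block each other.
    conn.execute("PRAGMA journal_mode=WAL")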


This provides (b), not (a).


Amazing!! Can't wait for this to be in the main branch.


How does this compare with https://github.com/bloomberg/comdb2, which adds stored procedures using the Lua scripting language, and decimal types?


In what ways should they be compared?

This appears to be a means to run Sqlite in a server environment with multiple processes hitting it.

What you linked is just some Lua-based extension that has little to do with Sqlite performance on certain types of infrastructure.


Comdb2 is a database system that uses SQLite as a query planner. It also allows Lua stored procedures. It has a different backend from SQLite with different capabilities.


"All clients are in the same process", so you only get multiple processes if you roll your own IPC (as I understand it).



