This isn't innovation though. You literally just write your server like you would for a single machine, then wrap it in any of the available Raft libraries.
AWS and other cloud providers are money printers because a lot of engineers are insanely tied into established patterns of doing things and can't think through things at a fundamental level. I've seen company backends where the entire AWS stack could be replaced by two EC2 instances behind a load balancer with a domain name, without affecting business flow.
We did something similar to the work in the OP post at my job: we had a bunch of ECS tasks for a service, where the service made a call to an upstream service to fetch some intermediate results. We wanted to cache those results for lower response latency. People were working to set up a Redis cluster. Except the TPS of the service was something like 0.1.
Took me one day to code a /sync API endpoint, which was just a replica of the main endpoint. The only difference was that the main endpoint would spin off a thread to call the /sync endpoint, whereas the /sync endpoint didn't. Both endpoints ended by caching the results in memory before returning. Easy as day, no additional infra costs necessary.
But overall, personally, I don't hate the "spending innovation tokens to build a database is nuts" sentiment too much, because it keeps me employed at a high salary while doing minimal work, in a world where things that really should be basic CS are considered innovation.
> then wrap it any of the available Raft libraries.
Raft does consensus. Raft does not do persistence to disk, WAL, crash recovery, indexing, vacuuming (you're using tombstones for your deletes, right?), or any of the other necessary pieces of a database. That's not mentioning how such a system has no query engine, so every piece of data you're looking up in every place you need data is traversing your bespoke data structures.
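To illustrate the vacuuming point for readers who haven't seen it: in a hand-rolled store, a delete becomes a tombstone marker so that replicas replaying the log later still observe the deletion, and a separate vacuum pass reclaims the space. A toy sketch (all names hypothetical):

```python
# Sentinel marking a logically deleted key.
TOMBSTONE = object()

store = {"a": 1, "b": 2}

def delete(key):
    # Logical delete: keep the key, mark it dead.
    store[key] = TOMBSTONE

def get(key):
    v = store.get(key)
    return None if v is TOMBSTONE else v

def vacuum():
    # Physically reclaim tombstoned entries.
    for k in [k for k, v in store.items() if v is TOMBSTONE]:
        del store[k]

delete("a")  # "a" is now logically deleted but still occupies space
```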
What you described isn't a database. Keeping some disposable values cached isn't a database.
Raft does do persistence and crash recovery, at least of the transaction logs.
What you need from your side (and there are libraries that already do this):
a) A mechanism to snapshot all the data
b) An easy in-memory mechanism to create indexes on fields -- not strictly needed, but it definitely makes things a lot easier to work with.
Bespoke data structures are just simple classes, so if you're familiar with traversing simple objects in the language of your choice, you're all set. You might be over-estimating the benefits of a query engine (and I have worked at multiple places that used MySQL extensively, and have used MySQL to build heavily scaled software in the past).
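Point (a) above can be as small as an atomic dump-and-rename; a hedged sketch of what "a mechanism to snapshot all the data" might look like (names and format are illustrative, not from any particular library):

```python
import json
import os
import tempfile

# Example in-memory state to snapshot.
state = {"users": [{"name": "alice", "company": "foobar"}]}

def snapshot(state, path):
    # Write to a temp file first, then rename, so a crash mid-write
    # never leaves a half-written snapshot at `path`.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems

def restore(path):
    with open(path) as f:
        return json.load(f)
```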
> Raft does do persistence and crash recovery, at least of the transaction logs.
It simply does not. The paper that definitionally is Raft doesn't tell you how to interact with durable storage. The Raft protocol handles crash recovery insofar as it allows one or more nodes to rebuild state after a crash, but Raft doesn't talk about serialization, WAL, or any of the other things you inevitably have to do for reliability and performance. It gives you a way to go from some existing state to the state of the leader (even if that means downloading the full state from scratch), but it doesn't give you a way to go from a pile of bits on a disk to that existing state.
If you have a library that implements Raft and gives you those things, that's not Raft giving you things. And that library could just be SQLite.
> You might be over-estimating the benefits of a query engine
No, I'm not. It's great to describe the data I want and get, say, an array of strings back without having to crawl some Btrees by hand.
> The paper that definitionally is Raft doesn't tell you how to interact with durable storage.
That's being a bit pedantic. Yeah, I did mean that any respectable library implementing Raft would handle all of this correctly.
> without having to crawl some Btrees by hand.
This is not how I query an index. First, we don't even use Btrees; most of the time it's just hash tables, and otherwise a simpler form of binary search trees. But in both cases, it's completely abstracted away in the library I'm using. So if I'm searching for companies with a given name, in my code it looks like '(company-with-name "foobar")'. If I'm looking for users that belong to a specific company, it looks like '(users-for-company company)'.
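The accessor style described above translates to any language; a hypothetical Python sketch of named lookups backed by plain hash tables (none of these names are from the actual codebase):

```python
# Hash-table "indexes": each query is just a named function.
companies_by_name = {}   # company name -> company record
users_by_company = {}    # company name -> list of user records

def add_company(company):
    companies_by_name[company["name"]] = company

def add_user(user):
    users_by_company.setdefault(user["company"], []).append(user)

def company_with_name(name):
    # Mirrors (company-with-name "foobar")
    return companies_by_name.get(name)

def users_for_company(company):
    # Mirrors (users-for-company company)
    return users_by_company.get(company["name"], [])

add_company({"name": "foobar"})
add_user({"name": "alice", "company": "foobar"})
add_user({"name": "bob", "company": "foobar"})
```

Each lookup is an O(1) dict hit behind a readable name, which is the sense in which the index machinery is "abstracted away".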
So I still think you're overestimating the benefits of a query engine.
>persistence to disk, WAL, crash recovery, indexing, vacuuming (you're using tombstones for your deletes, right?),
The point of Raft is that you write your service as if it were a single instance, using SQLite or a non-relational equivalent, and then use Raft to run a distributed system that has redundancy, all without additional infra involved. For the vast majority of use cases (i.e. some backend or web app service at a startup), this is more than enough, considering there is enough low-level machinery in drivers and kernels to make data reliability pretty high already.
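The "write it like a single instance" claim boils down to the replicated state machine pattern: your service is a deterministic apply() over a command log, and a Raft library (e.g. hashicorp/raft or pysyncobj, not shown here) replicates that log and invokes apply() in the same order on every node. A hedged, library-free sketch:

```python
class KVStateMachine:
    """Plain single-machine service logic; nothing Raft-specific here."""

    def __init__(self):
        self.data = {}

    def apply(self, command):
        # Deterministic: the same log order yields the same state on
        # every replica, which is all the Raft layer needs from us.
        op, key, *rest = command
        if op == "set":
            self.data[key] = rest[0]
        elif op == "delete":
            self.data.pop(key, None)

# In production the Raft library would deliver these entries; here we
# replay a hand-written log to show the state machine in isolation.
log = [("set", "a", 1), ("set", "b", 2), ("delete", "a")]
sm = KVStateMachine()
for entry in log:
    sm.apply(entry)
```

Because apply() is the only integration point, the same class runs unchanged on one box or across a replicated cluster.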