Global Multi-Cloud Replication in FaunaDB Serverless Cloud

redwood · on June 8, 2017

How does the commercial viability of this company look? It's one thing to bet on an open source project, or on a cloud service for a standard database, where in a pinch you can always run that same database somewhere else or even on your own servers. But a fully managed, proprietary cloud offering that is not a standard database technology, that you can't run anywhere else, and that is not backed by one of the big clouds... it seems like a huge risk to bet on this offering.

evanweaver · on June 8, 2017

You can run the on-premises edition yourself. That's what our large enterprise customers do, for this reason and others.

jchanimal · on June 8, 2017

My favorite part is that expanding the cluster does "not affect the latency profile of existing applications: writes to FaunaDB only need to commit to the closest majority of datacenters to maintain consistency. Currently all data in FaunaDB Serverless Cloud is replicated to every region, guaranteeing low latency reads."

This means that as we grow, your apps will get faster and be able to run closer to your users.
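To make the "closest majority" point concrete, here is a minimal sketch, assuming commit latency is dominated by the round trip to the slowest member of the nearest quorum. The function name and the round-trip times are hypothetical, not taken from FaunaDB.

```python
def majority_commit_latency(rtts_ms):
    """Round-trip time (ms) to the slowest member of the closest majority.

    rtts_ms: round-trip times from the coordinating datacenter to every
    datacenter in the cluster, including itself (usually ~0).
    """
    if not rtts_ms:
        raise ValueError("need at least one datacenter")
    quorum = len(rtts_ms) // 2 + 1          # smallest strict majority
    return sorted(rtts_ms)[quorum - 1]      # quorum-th closest datacenter


# Hypothetical round-trip times; the two far regions sit outside the quorum.
cluster = [0, 20, 30, 150, 220]
print(majority_commit_latency(cluster))                   # 30
# Making the farthest region even slower does not change the result.
print(majority_commit_latency([0, 20, 30, 150, 400]))     # 30
```

The sketch models only network round trips, and the quorum size itself grows as datacenters are added.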

zenithm · on June 8, 2017

CosmosDB lets you set region failover priority if something goes wrong. Does Fauna have a similar model?

How does an application know which region to talk to? Especially once you can select a subset of regions to be in. Some might not have the data you are looking for.

evanweaver · on June 8, 2017

Since FaunaDB is multi-master (or masterless, if you prefer), there is no failover step per se. Any region in the cluster can receive writes if it is part of the majority partition, and any region can serve consistent reads even if it's temporarily in a minority partition.

You don't have to set any priorities, and partition events don't change commit latency for the cluster majority.

Currently FaunaDB drivers use geo DNS in Route 53 to automatically find the closest region, although you can pin to specific regions if you know the CNAME. If that region doesn't own the data for the logical database in question, FaunaDB forwards the request internally.

In the future, drivers will maintain their own ϕ accrual failure detectors and make faster and smarter routing decisions than DNS can provide.
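For readers unfamiliar with the term, a ϕ accrual failure detector outputs a suspicion level rather than a yes/no verdict: ϕ = -log10(probability that the next heartbeat is merely late). Below is a minimal sketch, assuming heartbeat intervals are roughly normally distributed. The class, the `pick_region` helper, and the threshold are hypothetical illustrations, not FaunaDB driver code.

```python
import math
from collections import deque
from statistics import mean, pstdev


class PhiAccrualDetector:
    """Tracks heartbeat arrival times and reports a suspicion level phi."""

    def __init__(self, window=100, min_std_ms=100.0):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat_ms = None
        self.min_std_ms = min_std_ms           # floor to avoid zero variance

    def heartbeat(self, now_ms):
        if self.last_heartbeat_ms is not None:
            self.intervals.append(now_ms - self.last_heartbeat_ms)
        self.last_heartbeat_ms = now_ms

    def phi(self, now_ms):
        if not self.intervals:
            return 0.0                         # no history yet, no suspicion
        elapsed = now_ms - self.last_heartbeat_ms
        mu = mean(self.intervals)
        sigma = max(pstdev(self.intervals), self.min_std_ms)
        # P(heartbeat arrives later than `elapsed`) under a normal model.
        p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))  # clamp to avoid log10(0)


def pick_region(candidates, now_ms, threshold=8.0):
    """Return the lowest-latency region whose phi is below the threshold.

    candidates: iterable of (name, latency_ms, detector) tuples.
    """
    healthy = [(lat, name) for name, lat, det in candidates
               if det.phi(now_ms) < threshold]
    return min(healthy)[1] if healthy else None
```

A driver built this way could keep one detector per region, feed it heartbeats, and call `pick_region` before each request, skipping a nearby region as soon as its ϕ rises instead of waiting for DNS to catch up.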

redwood · on June 8, 2017

How do you handle write conflicts?

evanweaver · on June 8, 2017

Transactions are strictly serialized; the paper explains how this works best: https://fauna.com/pdf/FaunaDB-Technical-Whitepaper.pdf

We use a single-phase model inspired by Calvin, rather than Spanner's two-phase model. The tradeoff is that interactive transactions (like in SQL) are not supported, but overall latency and throughput are much better.
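The general idea behind a Calvin-style design can be sketched as follows. This is a toy illustration, not FaunaDB's implementation: it assumes the replicas have already agreed on a single transaction order (for example via a consensus log), and that each transaction is submitted whole as a function of database state rather than as an interactive session. All names are hypothetical.

```python
def withdraw(account, amount):
    """Build a non-interactive transaction: a pure function of database state."""
    def txn(state):
        if state.get(account, 0) < amount:
            return state, "aborted"            # deterministic abort
        new_state = dict(state)
        new_state[account] = state[account] - amount
        return new_state, "committed"
    return txn


def apply_log(log, initial_state):
    """Apply transactions in log order; every replica runs this same loop."""
    state, outcomes = dict(initial_state), []
    for txn in log:
        state, outcome = txn(state)
        outcomes.append(outcome)
    return state, outcomes


# Two conflicting withdrawals; the agreed log order decides which one wins.
log = [withdraw("alice", 80), withdraw("alice", 80)]
replica_a = apply_log(log, {"alice": 100})
replica_b = apply_log(log, {"alice": 100})
assert replica_a == replica_b                  # no second coordination round
print(replica_a)  # ({'alice': 20}, ['committed', 'aborted'])
```

Because the outcome depends only on the log order and the prior state, replicas reach the same result independently, which is why a second commit phase is unnecessary and why interactive transactions do not fit the model.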

redwood · on June 8, 2017

Do you use Cosmos? How is it?

zenithm · on June 8, 2017

I used DocumentDB a while ago but I haven't used the new version. It was a little strange.

You have to reason about every possible consistency configuration including transactions and indexes, but you don't actually get much control over what is indexed. And sort of like DynamoDB it doesn't really support general-purpose transactions, they have to be within a shard.

I think maybe the Mongo adapter is pretty nice though.

evanweaver · on June 8, 2017

Let us know what other regions and cloud providers you would like to see, like maybe Digital Ocean, etc. We're excited to keep rolling these out.

lux · on June 8, 2017

+1 for DO support!

jedberg · on June 8, 2017

> FaunaDB Serverless Cloud remains the only multi-master, globally-distributed cloud database.

Cassandra or Datastax? Cassandra has been doing this for years.

Or did they mean the only hosted option?

Edit: Made me sound less rude.

nemothekid · on June 9, 2017

Isn't Google Cloud's Cloud Spanner hosted, multi-master, and globally-distributed?

evanweaver · on June 9, 2017

"Cloud Spanner currently offers only regional instance configurations: replication within one region of the United States, Europe, or Asia. Regional instance configurations in additional Google Cloud Platform regions will be added throughout 2017. Multi-region replication (i.e., replication across multiple geographies) is planned for future release."

evanweaver · on June 8, 2017

Yes, database as a service. As early contributors to Cassandra before Datastax/Riptano was around...we are aware of it. :-)

jedberg · on June 8, 2017

Fair enough. :) But just FYI, reading that set off my BS alarm, and I was highly skeptical of the rest of your claims. You may want to clarify that so as not to turn off folks who are deeply familiar with the space.

evanweaver · on June 8, 2017

Will keep this in mind. How would you prefer it to be described?

jedberg · on June 8, 2017

FaunaDB Serverless Cloud remains the only {hosted|SaaS} multi-master, globally-distributed cloud database?

I don't really like those either.

Just something that says "you don't have to set this up for yourself like Cassandra or Riak."

Although that reminds me that you can get a turnkey Riak setup, so your claim is still a bit dubious.

I now get what you're trying to say though. That you guys host a multi-master database, and with all your competitors, you have to run infrastructure. I'm not quite sure how to express that succinctly.

evanweaver · on June 9, 2017

An attempt to clarify has been made. Thanks.

lux · on June 8, 2017

Sounds very cool. One thing I'd love to see on the pricing page is an estimation tool where you can enter different values to see a monthly cost estimate.

evanweaver · on June 8, 2017

That makes sense. In particular, the minimum price is always wildly better because of the serverless model (metered, like S3), but you still want to see where you will be at with bursts and such, or compare to a static Postgres or DynamoDB cluster at expected load.

A lot of the benefit comes from not having to manage capacity up and down in the first place, though. Even if other systems let you do it quickly you still have to either predict or react to your load "by hand".
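A rough sketch of the kind of comparison such an estimator could make is below. Every rate and unit here is a hypothetical parameter supplied by the caller; none of it reflects actual FaunaDB, Postgres, or DynamoDB pricing.

```python
import math


def metered_monthly_cost(hourly_ops, price_per_million_ops):
    """Pay only for operations actually performed (metered model)."""
    return sum(hourly_ops) / 1_000_000 * price_per_million_ops


def provisioned_monthly_cost(hourly_ops, unit_ops_per_hour, price_per_unit_hour):
    """Pay for enough fixed capacity to cover the peak hour, all month."""
    units = math.ceil(max(hourly_ops) / unit_ops_per_hour)
    return units * price_per_unit_hour * len(hourly_ops)


# Hypothetical month: a quiet baseline with ten bursty hours.
hourly_ops = [10_000] * 710 + [1_000_000] * 10

print(metered_monthly_cost(hourly_ops, price_per_million_ops=1.0))
print(provisioned_monthly_cost(hourly_ops,
                               unit_ops_per_hour=100_000,
                               price_per_unit_hour=0.05))
```

The provisioned figure assumes capacity is sized for the peak and never scaled down; a cluster that is resized during the month would land somewhere between the two numbers.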

marknadal · on June 8, 2017

Sigh, the current CockroachDB submission and this FaunaDB one make for a perfect comparison.

- CockroachDB starts with an image explaining how their query engine handles requests.

- FaunaDB starts with saying they are the only multi-master cloud database.

- CockroachDB then spends the next 3K words to explain how and why.

- FaunaDB claims that others are just cross "continental" systems, and that they are the only "global" ones, with no reasoning to justify the claims.

Yes, FaunaDB being a proprietary hosted service is certainly targeting a different audience than CockroachDB which is Open Source facing. But it damages your brand to make untrue claims:

- Cassandra is a multi-master database you can run across globally distributed clouds.

- Heck, even my own system, https://github.com/amark/gun , is a multi-master database that you can (and I have) run across globally distributed clouds.

- CosmosDB has a tunable option for this now, I believe.

I also don't recall their vocabulary being "multi-master" before either, because that doesn't match with the claims of being "Globally Consistent" in the CAP Theorem sense. Unless they just mean it is sharded? But that is different.

I'm sure my comment will just be ignored, but I ask you for your own sake (and database vendors in general) to not make marketing claims like this. Database vendors are notorious for doing this, and it caused a big falling out with developers. Finally, I felt like between RethinkDB, me, and others, we were starting to make amends again by being open with the industry/community. I'm not trying to be harsh just to be harsh, I genuinely mean this: If you make a claim, please back it up - you guys are smart and hard working, so please just go the extra step to provide the evidence.

eldenbishop · on June 9, 2017

Good comments and I felt the same way. This stinks of CAP violations and bullshit. It may have a lot of value but the bullet points are eye rolling.