Hacker News | zenithm's comments

FaunaDB passed. It also has GraphQL.


Shouldn't talk about Spanner and Percolator without discussing Calvin, either. https://www.infoq.com/articles/relational-nosql-fauna


YugaByte DB product manager here. We have compared the Spanner and Calvin architectures in depth previously (https://blog.yugabyte.com/google-spanner-vs-calvin-global-co...). One key difference comes from the fact that Calvin's deterministic locking protocol requires advance knowledge of every transaction's read/write set before execution can begin. This leads to repeated transaction restarts when a secondary index is on a column whose value changes frequently, as well as no support for client-initiated session transactions. In other words, Calvin is better suited to a transactional NoSQL database than to a true distributed SQL database.
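The restart behavior being described corresponds to what the Calvin paper calls optimistic lock location prediction (OLLP): a reconnaissance read predicts the read/write set, and the transaction restarts if that prediction no longer holds at execution time. A rough sketch, where `compute_rw_set` and `execute` are hypothetical stand-ins rather than any real Calvin API:

```python
def run_deterministic_txn(compute_rw_set, execute, max_restarts=10):
    """Sketch of Calvin-style optimistic lock location prediction.

    compute_rw_set(): predicts the keys the transaction will touch
    (e.g. by reading a secondary index).
    execute(rw_set): runs the transaction under a fixed read/write set.
    Illustrative only, not a real Calvin API.
    """
    for _ in range(max_restarts):
        predicted = compute_rw_set()   # reconnaissance pass
        # ...the transaction is sequenced/logged with `predicted`...
        actual = compute_rw_set()      # re-derived at execution time
        if actual == predicted:
            return execute(actual)     # locks cover every access made
        # A concurrent write moved the index entries: restart.
    raise RuntimeError("transaction restarted too many times")
```

If the indexed column changes frequently, the prediction keeps going stale and the loop keeps restarting, which is the weakness described above.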


This take on Calvin is inaccurate:

Under contention, a Calvin-based system behaves similarly to others that use optimistic locking schemes for Serializable isolation, such as Postgres or YB itself. There are advantages to the Calvin approach as well. For example, under Calvin the system doesn't have to write speculative intents to all data replicas in order to detect contention: the only writes to data storage are successful ones. The original paper only describes this briefly, but you can read about how FaunaDB has implemented it in more detail: https://fauna.com/blog/consistency-without-clocks-faunadb-tr...

It's also not a stretch to see how the protocol described in that post can be extended to support session transactions: Rather than executing a transaction per request, the transaction context is maintained for the life of the session and then dispatched to Calvin on commit. (This is in fact how we are implementing it in our SQL API feature.)
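The extension described here can be sketched as a client-side buffer that accumulates the session's operations and submits them as one deterministic batch at commit. All names are hypothetical, not FaunaDB's actual driver API:

```python
class SessionTransaction:
    """Sketch of session transactions atop a one-shot (Calvin-style)
    core: nothing is submitted until commit(), when the whole batch is
    dispatched as a single deterministic transaction. Illustrative
    only, not FaunaDB's real API.
    """
    def __init__(self, submit):
        self._submit = submit            # runs one atomic batch
        self._reads, self._writes = set(), {}

    def read(self, key):
        self._reads.add(key)
        return self._writes.get(key)     # read-your-writes in-session

    def write(self, key, value):
        self._writes[key] = value

    def commit(self):
        # The accumulated context becomes the read/write set that the
        # deterministic scheduler requires up front.
        return self._submit(self._reads, dict(self._writes))
```

The point is that the read/write set Calvin needs in advance is simply whatever the session accumulated by commit time.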

I would instead say that one of the more significant differences between Calvin and Spanner is the much stricter requirements the latter places on its hardware (i.e., clock accuracy) in order to maintain its correctness guarantees; a weakness its variants also share.


If only things were that simple :) Calvin avoids the need to track clock accuracy by making every transaction go through a single consensus leader, which inherently becomes a bottleneck for both performance and availability. Spanner and its derivatives, including YugaByte DB, chose not to introduce such a bottleneck, instead using per-shard consensus. This means caring about clock skew for multi-shard transactions and not allowing such transactions to proceed on nodes whose skew exceeds the configured maximum. The big question is which is more acceptable: lower performance and availability on the normal path, or handling offending nodes with high clock skew on the exception path?
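The skew gate being described can be sketched roughly as follows; the median-based comparison and the 500 ms default are illustrative, not YugaByte DB's actual implementation:

```python
def check_clock_skew(participant_clocks_ms, max_skew_ms=500):
    """Sketch of the skew gate described above.

    participant_clocks_ms: physical-clock readings (ms) from every
    shard leader participating in a multi-shard transaction. If any
    reading deviates from the median by more than the configured
    maximum, the node refuses to proceed rather than risk an
    incorrect transaction ordering. Illustrative only.
    """
    clocks = sorted(participant_clocks_ms)
    median = clocks[len(clocks) // 2]
    return all(abs(c - median) <= max_skew_ms for c in clocks)
```

A node failing this check is handled on the exception path (e.g. refused as a transaction coordinator) rather than slowing down every transaction on the normal path.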


Sorry, I can't let this go unchallenged. Again, you are inventing an architectural deficiency where there is none. The log in Calvin is partitioned and does not require all transactions to go through a single physical consensus leader. There is no single node in a Calvin cluster that must handle the entire global stream of transactions. The Calvin paper itself covers how this works in detail: http://cs.yale.edu/homes/thomson/publications/calvin-sigmod1...


Great article!


Netlify has integration with FaunaDB for the database tier.


FaunaDB looks like a great product for a serverless/SPA-type app, but I don't see how they are integrated with Netlify other than that they fit together well?


FaunaDB support is built into Netlify Dev and was shown in the Netlify CEO's keynote at the JAMstack conf. Netlify also has one-click deploy support for including FaunaDB with your applications. For instance, this sample app can be deployed almost automatically: https://github.com/sw-yx/netlify-fauna-todo (Disclaimer: I work for Fauna and helped with the one-click deploy integration.)


In the NoSQL space, FaunaDB indexes are very powerful. They are term-partitioned and sharded, and support compound terms, covered values, transformations, etc.


I'm sure everyone involved made some money but aren't they basically a consultancy?


Acquihire and street cred are valuable at a time when there's a bit of a race in the space. IBM turned up the heat.


These engineers will never otherwise work for VMware. So they spent a shitton of money to hire them. I wouldn't be surprised if it was just for the two engineers.


There are certainly more than just those two engineers that work at Heptio. Surely some of them will also continue the work that Heptio still has in front of it? I don't have any insider knowledge about this besides what was in the article, but it says that "Beda and McLuckie _and their team_" will all be joining VMware.

I have no reason to doubt any of that, although I don't know what it means for Heptio's Amazon partnership, or utilities like Heptio Authenticator. Presumably that work will still continue in some form though, it would be a surprise to hear otherwise since VMware and AWS are already "strategic technology partners" as well.


I knew a guy like that. He was big into open source and worked for a small startup. His company was swallowed up by a huge tech company. Years later I tried to get him to join the company I worked for, and he said it was impossible; basically, his stock options were so valuable he couldn't work anywhere else. "Golden handcuffs."


Really? With all the tools they have written that are open source and on GitHub?


Are they selling them for money? Or are they just things useful in the consulting process and a way to create awareness for the company?

The website advertises professional services, training, support subscriptions, and books. The Red Hat model, I guess.


A large incumbent with a service-based revenue model contributing to and offering OSS is a classic move to 'commoditize your complement'[0].

[0] https://www.gwern.net/Complement#2


Also FaunaDB which is the NoSQL version


Good to hear from Fauna again.

I didn't know that the serialization format was JSON. Is that a long-term decision? Would a binary format be faster?


CosmosDB lets you set region failover priority if something goes wrong. Does Fauna have a similar model?

How does an application know which region to talk to? Especially once you can select a subset of regions to be in. Some might not have the data you are looking for.


Since FaunaDB is multi-master (or masterless, if you prefer), there is no failover step per se. Any region in the cluster can receive writes if it is part of the majority partition, and any region can serve consistent reads even if it's temporarily in a minority partition.

You don't have to set any priorities, and partition events don't change commit latency for the cluster majority.

Currently FaunaDB drivers use geo DNS in Route 53 to automatically find the closest region, although you can pin to specific regions if you know the CNAME. If that region doesn't own the data for the logical database in question, FaunaDB forwards the request internally.

In the future, drivers will maintain their own ϕ accrual failure detectors and make faster and smarter routing decisions than DNS can provide.
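The ϕ accrual failure detector mentioned here (Hayashibara et al.) can be sketched briefly; this is the standard textbook formulation, not Fauna's actual driver code:

```python
import math

def phi(time_since_last_ms, mean_ms, stddev_ms):
    """Sketch of a ϕ accrual failure detector.

    Given the observed mean/stddev of heartbeat inter-arrival times,
    ϕ = -log10(P(a heartbeat arrives later than now)). Higher ϕ means
    more suspicion that the peer is down; callers compare ϕ against a
    threshold instead of a binary up/down verdict. Illustrative only.
    """
    # P(arrival later than t) under a normal approximation of the
    # inter-arrival distribution, via the complementary CDF.
    z = (time_since_last_ms - mean_ms) / stddev_ms
    p_later = 0.5 * math.erfc(z / math.sqrt(2))
    p_later = max(p_later, 1e-12)  # clamp to avoid log(0)
    return -math.log10(p_later)
```

Because ϕ grows continuously as heartbeats get later, a routing layer can start deprioritizing a sluggish region well before it would declare it dead, which is what makes this faster and smarter than DNS-based failover.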


How do you handle write conflicts?


Transactions are strictly serializable; the paper explains how this works best: https://fauna.com/pdf/FaunaDB-Technical-Whitepaper.pdf

We use a single-phase model inspired by Calvin, rather than Spanner's two-phase model. The tradeoff is that interactive transactions (like in SQL) are not supported, but overall latency and throughput are much better.


Do you use Cosmos? How is it?


I used DocumentDB a while ago but I haven't used the new version. It was a little strange.

You have to reason about every possible consistency configuration, including transactions and indexes, but you don't actually get much control over what is indexed. And, sort of like DynamoDB, it doesn't really support general-purpose transactions; they have to be within a shard.

I think maybe the Mongo adapter is pretty nice though.


Facebook, Github, Pinterest, etc. are listed as Graphcool enterprise customers? Is that true?


No response...I think these are companies that have GraphQL APIs, but no actual relationship with Graphcool. They shouldn't be listed as customers.


Sorry about the delay here, it was a busy night for us.

The enterprise page is targeted at decision makers in big companies who might have been sent there by the dev team. As such, our main objective is to communicate that GraphQL is a proven technology.

We will make that more explicit on the page. Some of these companies actually do use Graphcool for small projects and prototyping, but that is not the story we want to tell on this page.


As a decision maker, I can tell you this kind of dishonesty will ensure I can't do business with you. It's a customer list, and nowhere does it state that these companies just use GraphQL and are not your direct customers. I know you're trying to get some traction and all that, but this makes your whole offering look dodgy.


Agree. I understand you need to have referenceable customers... but you have to actually... like... have them. I would imagine you are violating restrictions around the use of these companies' trademarks.

Similarly confused by the random logos on the homepage with no context. The implication is they somehow rely on Graphcool.


Welcome Dan.

What other production implementations of Calvin are out there?


I said in my post: "influenced the design of several modern “NewSQL” systems" --- I'm not aware of other production implementations of Calvin. But VoltDB's command logging feature came directly from Calvin. So basically I had in mind FaunaDB and VoltDB when I wrote that sentence. Neither is an exact version of Calvin, but FaunaDB is closer to Calvin than VoltDB. Obviously, the Calvin paper has been cited many hundreds of times, so many of its ideas have made it into other systems as well.


> But VoltDB's command logging feature came directly from Calvin.

VoltDev here. Huh? We added this feature in 2011 and read the Calvin paper sometime later IIRC.


"The Case for Determinism in Database Systems" (the paper that described the technology which became Calvin) was published in VLDB 2010. At least one VoltDB developer told me that command logging came from a talk we gave about this paper to your team.


I think they happened independently, but it's long enough ago that I might not recall if I or Ning had inspiration from somewhere.

If you heard it from Ning or myself then it's probably what happened.


Yes Ariel, it was you who told me this! But I agree with jhugg that it was an obvious choice based on the VoltDB architecture.


lol


I think that logical logging was an obvious choice given the VoltDB architecture. It's totally possible there was a talk that was involved for somebody, though.

That said, we <3 determinism at VoltDB and rely on it to achieve what we achieve.


How does Volt's transaction resolution mechanism compare? It sounds like that would be a third model yet.


We have a whitepaper here: https://www.voltdb.com/wp-content/uploads/2017/03/lv-technic...

My brief summary comparison. VoltDB is a bit less general in some key ways. It tends to have the same performance no matter how much contention there is, which is rare. It's also getting pretty mature, with lots of integrations and hard corners sanded off.

It also typically has much lower latency than these systems, both theoretically and practically.

