After reading its README.rst, it seems to me that wal-e only cares about backing up (and restoring) WAL files, not about synchronous replication like the original article.
Excellent question. The way I understand it is that master/slave replication provides eventual consistency by shipping WAL logs to the other node once a WAL log is finalized.
pgpool2 on the other hand provides statement-level consistency, i.e. a statement is only successful when it is committed to all healthy nodes.
I agree that for many use-cases, the built-in master/slave replication is probably good enough, but I wanted to try going all the way… :)
Master / slave replication also has a synchronous mode, so lack of consistency isn't an issue. Either way, you still need to do some work on top of replication in order to get automatic failover.
Thanks for pointing this out. Seems like in the described setup (which does not use automatic failover) there really is no advantage then.
I’ll make a note to look into replacing the pgpool setup with a standard master/standby solution and see if I run into any problems that prevent me from doing that :).
After reading the manual again, I recall what made me not pursue the PostgreSQL built-in master/slave replication:
"""
Commits made when synchronous_commit is set to on or remote_write will wait until the synchronous standby responds. The response may never occur if the last, or only, standby should crash.
[…]
If you really do lose your last standby server then you should disable synchronous_standby_names and reload the configuration file on the primary server.
"""
The way I interpret this is that in case my one and only standby server crashes, my database will not allow any modifications until I intervene. With pgpool2, writes will continue to work on the master, and it’s my responsibility to eventually bring back the standby server.
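Concretely, the settings in question look like this (a minimal sketch; 'standby1' is just a placeholder name):

    # postgresql.conf on the primary (illustrative values)
    synchronous_commit = on                 # commits wait for confirmation from the sync standby
    synchronous_standby_names = 'standby1'  # if this one standby dies, writes block until this
                                            # setting is cleared and the config is reloaded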
I agree that is a little weird. It should at least be configurable whether you want that behavior or not. 9.4 introduces the ALTER SYSTEM command, which may make it easier to automate the process of disabling synchronous_standby_names when a failover happens. There are some other improvements to replication in 9.4 as well.
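On 9.4, something like this (an untested sketch) could be scripted into the failover path:

    -- 9.4+: drop back to asynchronous replication after losing the last standby
    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();  -- apply the change without restarting the server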
> The way I interpret this is that in case my one and only standby server crashes, my database will not allow any modifications until I intervene.
In synchronous mode... I mean, what else do you expect? How else would you expect it to handle a net split (I'm only guessing a split slave won't serve reads)?
I mean, if you're _setting_ this mode (which it doesn't seem you would), I'm assuming you desire this behavior, so I'm not sure why you're so up-in-arms about it.
> With pgpool2, writes will continue to work on the master, and it’s my responsibility to eventually bring back the standby server.
Isn't that how it works when not in synchronous mode?
> After reading the manual again, I recall what made me not pursue the PostgreSQL built-in master/slave replication:
Also, what did you choose if not Postgres? MySQL, or did you pay for a license for MSSQL or Oracle? Is there another F/OSS database worth considering?
If you want to use sync commits, you need a minimum of three servers.
Think about what happens if you only have two machines. If the standby goes down, then it's impossible for data to be protected if the master also goes down.
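For what it's worth, that's why with three servers you'd list both standbys; Postgres treats the list as a priority order (names here are placeholders):

    # postgresql.conf on the primary, three-server setup (names are placeholders)
    synchronous_standby_names = 'standby1, standby2'  # standby1 is the sync standby; if it
                                                      # fails, standby2 takes over that role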
With pgpool2 and two servers, I can have synchronous commits _as long as_ both servers are healthy, and still continue operating (with less durability) when only one server is healthy.
Of course, it’s impossible to protect against data loss when the remaining server also goes down, but you always have that risk :). As I said, I realize that a setup with only two servers cannot be perfect, but it’s all I’m willing to afford for a spare-time hobby.
So, in comparison, pgpool2 provides me with a more convenient mode of operation for my use-case.
I think OP wants at least one machine that is healthy and fully operational at any point in time. But the cost of a three-machine setup is too great. There are three main failure scenarios.
1. Primary fails and standby takes over. Sync mode helps here because there should be no data loss for completed transactions. In async mode, there could be some data loss for completed transactions.
2. Standby fails and primary continues to operate normally. When the standby is back online, it catches up. Currently the primary would not be able to continue to operate normally because of those config settings.
3. Both primary and standby fail simultaneously. A very unlikely scenario but can be solved with WAL archiving which does have the risk of potential data loss.
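For scenario 3, WAL archiving boils down to a couple of settings plus whatever shipping command you trust (a rough sketch; the scp target is made up):

    # postgresql.conf: continuous WAL archiving (host and paths are made up)
    wal_level = archive                                 # use 'hot_standby' instead if standbys will serve reads
    archive_mode = on
    archive_command = 'scp %p backup-host:/archive/%f'  # %p = path of the WAL file, %f = its name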
PostgreSQL provides different ways to replicate a database state. There are multiple ways to create full backups (file-system level backups, so-called basebackups, and dumps) and accompanying ways to restore them. There are multiple continuous synchronisation schemes too: shipping WAL segments and streaming replication. Streaming replication can be synchronous or asynchronous.
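To make two of those full-backup routes concrete, roughly (host names and paths are made up):

    # logical dump of a single database, custom format
    pg_dump -h primary.example.com -Fc mydb > mydb.dump
    # physical base backup of the whole cluster, streaming the needed WAL alongside it
    pg_basebackup -h primary.example.com -D /srv/pg-standby -X stream -P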
When I started Reesd, the main reason was to be able to use (a single invocation of) scp to ship WAL segments to three different machines in different datacenters.
Actually, now that Reesd is deployed, it itself uses several of the above mechanisms. It ships WAL segments (exactly as a user of Reesd could do it, by uploading to a Reesd bucket), uses synchronous replication between the primary and a first standby, and asynchronous replication with a second standby. Both standbys are also configured to use WAL segments if available. Indeed, starting a standby will first use the WAL segments before connecting to the primary to begin streaming replication.
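On a pre-9.5 standby, that behavior comes from a recovery.conf along these lines (a sketch; names and paths are placeholders):

    # recovery.conf on a standby: replay archived WAL first, then stream
    standby_mode = 'on'
    restore_command = 'cp /archive/%f %p'                          # consume shipped WAL segments when available
    primary_conninfo = 'host=primary.example.com user=replicator'  # then connect to the primary and stream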
We run a similar setup at Kloudless [1]. We use PgBouncer [2] for connection pooling, which connects to pgpool2 to load balance between our Postgres servers. We've noticed PgBouncer is more performant at handling thousands of simultaneous connections.
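Roughly, the chain is app -> PgBouncer -> pgpool2 -> Postgres. A minimal pgbouncer.ini sketch (ports and names are illustrative; pgpool2's default port is 9999):

    ; pgbouncer.ini: pool client connections, forward them to pgpool2
    [databases]
    mydb = host=127.0.0.1 port=9999 dbname=mydb

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    pool_mode = transaction   ; server connection is released at transaction end
    max_client_conn = 2000    ; thousands of clients map onto far fewer server connections
    default_pool_size = 20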
I've done a lot of work with pgpool over the past year; be aware there are lots of situations where it won't work for you. For example, if your devs don't write their own SQL and instead use a framework with limited control, you're going to have a bad time.
I was speaking to the author (Tatsuo Ishii) about the reliability of PGPool-II when you don't have complete control over your SQL. He made it clear that if you cannot control what kinds of SQL are and aren't run from the application, then, and I quote, "I will suggest you avoid pgpool-II." He also noted that "Pgpool-II does not handle multi statements query very well", so pretty much count yourself out of using PGPool if you're using any multi-statement queries, which in our case is a great deal of our SQL.
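By a multi-statement query I mean several statements packed into a single query string, e.g. (illustrative):

    -- one query string containing multiple statements; pgpool-II's
    -- statement-level handling can get this wrong
    BEGIN; UPDATE accounts SET balance = balance - 10 WHERE id = 1; COMMIT;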
As for a simpler way, I think suitable scripts for recovery should be shipped with pgpool2 as a first step.
A second step would be integrating pgpool2 functionality into PostgreSQL itself. That way, the whole authentication problem would go away, and you would not need to run a separate program. Also, the WAL shipping could be replaced by just letting the non-primary nodes use a replication connection to the primary node directly.
That would not get rid of all of the complexity, but it’d hide a lot more of it from the user :).
The problem with these setups is that 'There Are Many Ways To Do It'(tm) and 'You Really Need To Test For Your Use-Case'(tm). I need to read and understand everything to decide what's best in my case. And then I need to write a lot of scripts, do a lot of time-consuming testing, and document everything.
Scalable, Reliable PostgreSQL is not really there yet.
There are many ways to do it, yes, but most people just want one thing: the db failing over in case it goes down.
Scalable, reliable Postgres is absolutely possible. This comment is unfounded.
If you truly need reliability, of course you'll have to invest some time into setting it up. If this isn't something you're interested in, just go with a hosted postgres instance like Heroku.
As someone that's deployed a streaming replication master/slave pair with a backup regime designed to achieve a Point In Time Recovery target of no more than 1 minute's data loss in the event of complete failure requiring restoration from backups....
Everything, it all needs to be simplified. Yes, it's the "difficult" problem for a database, but it's also the one thing where I feel Postgres is lacking. What's needed is for the Postgres community to quit messing around and accept that clustering is no longer something they can fob off to other projects. They need to adopt either Postgres-XC or preferably Postgres-XL and push it forward with full force.
The current state of affairs is "pgpool, slony, repmgr, Bucardo, and the rest: pick one, spend hours testing how well they fit your use case. Then use that." Now this is ok when a project is new. But Postgres is about 20 years old. Clustering is not new, many of these solutions are old, and we finally have a very good clustering option, but it's not even mentioned on half the official wiki pages about clustering last time I looked. Postgres is a professional-grade tool, so it's ok to not endorse things not considered ready yet... But Postgres-XL is in use in production deployments, and it's built out of Postgres itself, not via proxies or other layering over or inside vanilla Postgres. It's not a hack, and endorsing it as the definitive future of clustered Postgres, but not necessarily endorsing it as "as good as regular Postgres", is long overdue. At least that's my opinion as a serious production Postgres user who makes the decision to use other databases more often than not, due to lack of good clustering.
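Incidentally, the 1-minute PITR target mentioned above mostly comes down to one knob on top of the WAL archiving settings (a sketch):

    # postgresql.conf: bound archiving-only data loss to roughly one minute
    archive_timeout = 60   # force a WAL segment switch (and hence an archive) at least every 60s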
When you bring them up in matched pairs, ensuring that the data nodes are using streaming replication is a lot easier ... AND worth the trouble for the ability to do whole cluster "upgrades" without taking the cluster itself offline.
Yes. And so good that the tools I use to deal with PITR are actually just wrappers over those 2 commands.
But orchestrating the recovery process, testing the recovery process, and centralising your backup files for faster tar/upload to offsite storage: all of these things are "not simple" when all you have is the Postgres docs.
Postgres is more than just a database/database engine. It's a database ecosystem.
That does not have automatic failover. I really want to use Postgres at work but every time I start reading about failover, load balancing, sharding and so on it seems like such a mess with PG.
I just want sharding with automatic failover to separate datacenter in a simple package...
AFAIK the only solution to this in the MySQL world is MHA, and that seems not that much better than the way Postgres does it. Which relational databases are you referring to that do have this?
I use wal-e myself and it's indispensable and easy to use.
https://github.com/wal-e/wal-e
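Wiring it up is roughly this, going by its README (the envdir path and data directory follow its conventions; adjust to your layout):

    # postgresql.conf: ship each finished WAL segment via wal-e
    archive_mode = on
    archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

    # take a full base backup (cron this; data directory path is illustrative)
    envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main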