Replicated PostgreSQL with pgpool2 (stapelberg.de)
73 points by secure on Aug 31, 2014 | 45 comments


Another solution is wal-e[1] which handles continuous archiving. It was built by the Heroku guys and as such is battle-tested.

I use wal-e myself, and it's indispensable and easy to use.

https://github.com/wal-e/wal-e
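
For the curious, a minimal sketch of what that looks like, going by the wal-e README (the envdir path and data directory here are illustrative):

    # postgresql.conf: ship each finished WAL segment to S3
    wal_level = archive
    archive_mode = on
    archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

    # take a full base backup, e.g. periodically from cron
    $ envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main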


After reading its README.rst, it seems to me like wal-e only cares about backing up (and restoring) WAL files, not about synchronous replication like the original article does.


But it is very easy to combine wal-e with the built-in streaming replication.


Why isn't the built-in master/slave replication plus WAL archiving to something like S3 sufficient?


Excellent question. The way I understand it, master/slave replication provides eventual consistency by shipping WAL segments to the other node once a segment is finalized.

pgpool2 on the other hand provides statement-level consistency, i.e. a statement is only successful when it is committed to all healthy nodes.

I agree that for many use-cases, the built-in master/slave replication is probably good enough, but I wanted to try going all the way… :)


Master/slave replication also has a synchronous mode, so lack of consistency isn't an issue. Either way, you still need to do some work on top of replication to get automatic failover.
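
For reference, enabling synchronous mode is roughly this on the master (the standby name is whatever application_name the standby uses in its primary_conninfo; 'standby1' here is made up):

    # postgresql.conf on the master
    synchronous_commit = on
    synchronous_standby_names = 'standby1'   # empty string = asynchronous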


Thanks for pointing this out. Seems like in the described setup (which does not use automatic failover) there really is no advantage then.

I’ll make a note to look into replacing the pgpool setup with a standard master/standby solution and see if I run into any problems that prevent me from doing that :).


After reading the manual again, I recall what made me not pursue the PostgreSQL built-in master/slave replication:

""" Commits made when synchronous_commit is set to on or remote_write will wait until the synchronous standby responds. The response may never occur if the last, or only, standby should crash.

[…]

If you really do lose your last standby server then you should disable synchronous_standby_names and reload the configuration file on the primary server. """

The way I interpret this is that in case my one and only standby server crashes, my database will not allow any modifications until I intervene. With pgpool2, writes will continue to work on the master, and it’s my responsibility to eventually bring back the standby server.


I agree that is a little weird. It should at least be configurable whether you want that behavior or not. 9.4 introduces the ALTER SYSTEM command, which may make it easier to automate disabling synchronous_standby_names when a failover happens. There are some other improvements to replication in 9.4 as well.

https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_...
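
So in 9.4, automating the "disable synchronous_standby_names" step could look something like this (an untested sketch):

    -- on the primary, after the last standby is lost
    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();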


> The way I interpret this is that in case my one and only standby server crashes, my database will not allow any modifications until I intervene.

In synchronous mode... I mean, what else do you expect? How else would you expect it to handle a net split (I'm only guessing a split slave won't serve reads?).

If you're _setting_ this mode (which it doesn't seem you would), I assume you want this behavior, so I'm not sure why you're so up in arms about it.

> With pgpool2, writes will continue to work on the master, and it’s my responsibility to eventually bring back the standby server.

Isn't that how it works when not in synchronous mode?

> After reading the manual again, I recall what made me not pursue the PostgreSQL built-in master/slave replication:

Also, what did you choose if not Postgres? MySQL, or did you pay for a license for MSSQL or Oracle? Is there another F/OSS database worth considering?


If you want to use sync commits, you need a minimum of three servers.

Think about what happens if you only have two machines. If the standby goes down, then it's impossible for data to be protected if the master also goes down.


With pgpool2 and two servers, I can have synchronous commits _as long as_ both servers are healthy, and still continue operating (with less durability) when only one server is healthy.

Of course, it’s impossible to protect against data loss when the remaining server also goes down, but you always have that risk :). As I said, I realize that a setup with only two servers cannot be perfect, but it’s all I’m willing to afford for a spare-time hobby.

So, in comparison, pgpool2 provides me with a more convenient mode of operation for my use-case.


What's the point of using sync commits if you don't actually ensure that the data is on two machines before the transaction is committed?


I think OP wants at least one machine that is healthy and fully operational at any point in time, but the cost of a three-machine setup is too great. There are three main failure scenarios.

1. Primary fails and standby takes over. Sync mode helps here because there should be no data loss for completed transactions. In async mode, there could be some data loss for completed transactions.

2. Standby fails and primary continues to operate normally. When the standby is back online, it catches up. Currently the primary would not be able to continue to operate normally because of those config settings.

3. Both primary and standby fail simultaneously. A very unlikely scenario, but it can be mitigated with WAL archiving, which does carry the risk of some data loss.


PostgreSQL provides different ways to replicate a database state. There are multiple ways to create full backups (file-system-level backups, so-called base backups, and dumps) and accompanying ways to restore them. There are multiple continuous synchronisation schemes too: shipping WAL segments, and streaming replication. Streaming replication can be synchronous or asynchronous.

When I started Reesd, the main reason was to be able to use (a single invocation of) scp to ship WAL segments to three different machines in different datacenters.

Actually, now that Reesd is deployed, it itself uses several of the above mechanisms. It ships WAL segments (exactly as a user of Reesd could, by uploading to a Reesd bucket), uses synchronous replication between the primary and a first standby, and asynchronous replication with a second standby. Both standbys are also configured to use WAL segments if available. Indeed, a starting standby will first replay the WAL segments before connecting to the primary to begin streaming replication.
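
For the curious, the standby side of that is roughly the classic recovery.conf pattern (hosts and paths here are made up):

    # recovery.conf on a standby (9.x syntax)
    standby_mode = 'on'
    # replay archived WAL segments first...
    restore_command = 'cp /srv/wal_archive/%f "%p"'
    # ...then connect to the primary and stream
    primary_conninfo = 'host=primary.example.com user=replicator'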



We run a similar setup at Kloudless [1]. We use PgBouncer [2] for connection pooling, which connects to pgpool2 to load-balance between our Postgres servers. We've noticed PgBouncer is more performant than pgpool2 at handling thousands of simultaneous connections.

[1] https://kloudless.com [2] http://wiki.postgresql.org/wiki/PgBouncer
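
Roughly, the chain is app -> PgBouncer -> pgpool2 -> Postgres. A sketch of the PgBouncer side (ports, names and pool sizes are illustrative, not our actual config):

    ; pgbouncer.ini
    [databases]
    ; pgpool2 assumed to listen on 5433
    mydb = host=127.0.0.1 port=5433 dbname=mydb

    [pgbouncer]
    listen_port = 6432
    pool_mode = transaction
    max_client_conn = 2000
    default_pool_size = 20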


pgbouncer is like having an nginx reverse proxy in front of your postgres connections.



You sure? I am monitoring the page and didn’t notice anything. The site is IPv6-enabled, though, so perhaps your IPv6 connectivity is broken?

Traceroutes welcome in case it is reproducibly down for you :).


Yeah it's reproducible. It works if I go through a proxy. If you give me your email I'll send you a traceroute.


The email address is the same as the domain, just with the first dot replaced by an @ sign :).


I've done a lot of work with pgpool over the past year; be aware there are lots of situations where it won't work for you. For example, if your devs don't write their own SQL and instead use a framework with limited control, you're going to have a bad time.


I use Postgres with Laravel, so I'm not writing any SQL queries directly. Can you go into more detail about why this won't work for me?


I was speaking to the author (Tatsuo Ishii) about the reliability of Pgpool-II when you do not have complete control over your SQL. He made it clear that if you cannot control what kinds of SQL are and aren't run from the application, and I quote, "I will suggest you avoid pgpool-II." He also noted that "Pgpool-II does not handle multi statements query very well", so pretty much count yourself out of using Pgpool if you're running any multi-statement queries, which in our case is a great deal of our SQL.
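
To illustrate, a multi-statement query is several statements sent to the server as one string, e.g. (made-up example):

    BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = 1; COMMIT;

Some drivers and frameworks emit these behind your back, which is why not controlling the SQL is a problem.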


Fantastic write-up, really. Surely there has to be a simpler way, no?


Thank you.

As for a simpler way, I think suitable scripts for recovery should be shipped with pgpool2 as a first step.

A second step would be integrating pgpool2 functionality into PostgreSQL itself. That way, the whole authentication problem would go away, and you would not need to run a separate program. Also, the WAL shipping could be replaced by just letting the non-primary nodes use a replication connection to the primary node directly.

That would not get rid of all of the complexity, but it’d hide a lot more of it from the user :).


Yep, use the built-in Postgres replication and something like wal-e (originally from Heroku), and then manage the failover with a trigger file.
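
The trigger-file part is just this (9.x recovery.conf syntax; the path is illustrative):

    # recovery.conf on the standby
    trigger_file = '/tmp/postgresql.trigger'

    # to promote the standby to master, create that file:
    $ touch /tmp/postgresql.trigger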


The problem with these setups is that 'There Are Many Ways To Do It' (tm) and 'You Really Need To Test For Your Use Case' (tm). I need to read and understand everything to decide what's best in my case. And then you need to write a lot of scripts, do a lot of time-consuming testing, and document everything.

Scalable, reliable PostgreSQL is not really there yet.

There are many ways to do it, yes, but most people just want one thing: the DB failing over when it goes down.


Scalable, reliable Postgres is absolutely possible. This comment is unfounded.

If you truly need reliability, of course you'll have to invest some time into setting it up. If this isn't something you're interested in, just go with a hosted postgres instance like Heroku.


As the author points out below[1], this use case does seem to be solved by the built-in postgres replication.

So, there is a single solution. If you want greater flexibility, pgpool2 may be an option; otherwise, just use what's built-in.

[1] https://news.ycombinator.com/item?id=8250497


Try reading the PostgreSQL documentation on this, then read my comment again.


Can you please make a more specific criticism? What should be simplified?


As someone who's deployed a streaming-replication master/slave pair with a backup regime designed to achieve a point-in-time recovery target of no more than one minute's data loss in the event of a complete failure requiring restoration from backups...

Everything; it all needs to be simplified. Yes, it's the "difficult" problem for a database, but it's also the one area where I feel Postgres is lacking. What's needed is for the Postgres community to quit messing around and accept that clustering is no longer something they can fob off to other projects. They need to adopt either Postgres-XC or, preferably, Postgres-XL and push it forward with full force.

The current state of affairs is "pgpool, Slony, repmgr, Bucardo, and the rest: pick one, spend hours testing how well it fits your use case, then use that." Now this is OK when a project is new, but Postgres is about 20 years old. Clustering is not new, and many of these solutions are old; we finally have a very good clustering option, but it's not even mentioned on half the official wiki pages about clustering, last time I looked. Postgres is a professional-grade tool, so it's OK not to endorse things that aren't considered ready yet... but Postgres-XL is in use in production deployments, and it's built out of Postgres itself, not via proxies or other layering over or inside vanilla Postgres. It's not a hack, and endorsing it as the definitive future of clustered Postgres, while not necessarily endorsing it as "as good as regular Postgres", is long overdue. At least that's my opinion as a serious production Postgres user who decides to use other databases more often than not, due to the lack of good clustering.


The last time I looked at Postgres-XL, it required handling standby data nodes yourself. Has that changed?


When you bring them up in matched pairs, ensuring that the data nodes are using streaming replication is a lot easier ... AND worth the trouble for the ability to do whole cluster "upgrades" without taking the cluster itself offline.


pg_basebackup and pg_receivexlog are built into PostgreSQL and are all you need to do PITR.
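
A sketch of the two (hostnames and paths made up):

    # one-off base backup over a replication connection
    $ pg_basebackup -h primary.example.com -U replicator -D /backups/base -X stream -P

    # continuously stream WAL into an archive directory
    $ pg_receivexlog -h primary.example.com -U replicator -D /backups/wal_archive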


Yes. And they're so good that the tools I use to deal with PITR are actually just wrappers over those two commands.

But orchestrating the recovery process, testing the recovery process, and centralising your backup files for faster tar/upload to offsite storage: all of these things are "not simple" when all you have is the Postgres docs.

Postgres is more than just a database/database engine. It's a database ecosystem.


Surely there must be an easier way?


I'm sure this will be a built-in feature in PostgreSQL 9.4, and it will be easier than using an external tool like Pgpool-II.


Postgres-XL (http://www.postgres-xl.org) is one alternative solution.


That does not have automatic failover. I really want to use Postgres at work, but every time I start reading about failover, load balancing, sharding and so on, it seems like such a mess with PG.

I just want sharding with automatic failover to a separate datacenter, in a simple package...


I'm not sure there is actually something in the open source space that is ACID compliant and offers all those features.

Sharding is a problem in Postgres and I am sure they will fix the disconnect between how the data is inserted and how it's read pretty soon.

Automatic failover and datacentre awareness are not typical features in open-source RDBMSs, although some NoSQL solutions may do this for you.

I'm willing to overlook many weaknesses in PG as we get blistering performance and amazing stability combined with features like hot schema upgrades.


Good luck finding a simple package. We met failover and cross-site replication requirements by using Pacemaker controlling pgsql.


AFAIK the only solution to this in the MySQL world is MHA, and that seems not much better than the way Postgres does it. Which relational databases are you referring to that do have this?



