PostgreSQL Streaming Replication (WAL): What It Is and How to Configure One

147 points, posted 10 hours ago
by thunderbong

29 Comments

pqdbr

9 hours ago

It's a great article, but I've always felt these are missing the critical real-world details, from the perspective of a full-stack dev who also wants to manage their own databases.

- How do I check how many seconds the replica is lagging behind master?

- How would I monitor the replica? A simple cron task that pings a health check if everything is OK (lag is < x) would be a nice start.

And then things get complicated quickly:

- How do I failover to the replica if the primary goes down? There's pgBouncer, repmgr, patroni...

- Should I have it automatically or manually?

- Do I need 2 replicas to avoid the split brain scenario? my brain hurts already.

- After a failover occurs (either automatically or manually), how in the world am I going to configure the primary to be the primary again, and the replica to act as the replica again, going back to the original scenario?

I'd pay to learn this with confidence.

lbriner

2 hours ago

> How do I check replica lagging? I use the prometheus exporter for postgres

> How would I monitor the replica? Same. You can also use something like HAProxy calling a postgres CLI command to connect to the instance
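
For illustration, a minimal check along those lines might look like this (the host name is a placeholder; pg_isready ships with the standard client tools):

    # is the standby reachable at all?
    pg_isready -h replica.example.internal -p 5432

    # is it actually running as a standby?
    psql -h replica.example.internal -XAtc "SELECT pg_is_in_recovery();"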

> How do I failover? Mostly, you probably want to do this manually because there can be data loss and you want to make sure the risk is worth it. I simply use repmgr for this.

> Do I need 2 replicas? It's usually good to have at least 3 (1 master and 2 slaves) but mostly so that if one fails, you still have 2 remaining i.e. time to get a 3rd back online

> How do I failback? Again, very easy with repmgr, you just tell the primary to be the primary again. The failed over primary gets stopped, the original primary gets fast-forwarded and promoted to primary and everything else gets told to follow.
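
As a rough sketch of what that looks like with repmgr (config path and conninfo are assumptions, check the docs for your version):

    # run on the standby that should become (or become again) the primary:
    repmgr -f /etc/repmgr.conf standby switchover --siblings-follow

    # bring a crashed ex-primary back into the cluster as a standby:
    repmgr -f /etc/repmgr.conf node rejoin \
      -d 'host=new-primary dbname=repmgr user=repmgr' --force-rewind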

I do agree that this space for postgres is very fragmented and some tools appear abandoned, but it's pretty straightforward with just postgres + barman + repmgr. I have a series of videos on YouTube if you are interested, but I am not a Postgres expert so please no hating :-) https://youtu.be/YM41mLZQxzE

cheald

an hour ago

+1 to all of this. The thing I'd add is that we use barman for our additional replicas; WAL streaming is very easy to do with Barman, and we stream to two backups (one onsite, one offsite). The only real costs are bandwidth and disk space, both of which are cheap. Compared to running a full replica (with its RAM costs), it's a very economical way to have a robust disaster recovery plan.

If you're doing manual failover, you don't need an odd number of nodes in the cluster (since you aren't looking for quorum to automatically resolve split-brain like you would be with tools such as Elasticsearch or redis-sentinel), so for us it's just a question of "how long does it take to get back online if we lose the primary" (answer: as long as it takes to determine that we need to do a switch and invoke repmgr switchover), and "how robust are we against catastrophic failure" (answer: we can recover our DB from a very-close-to-live barman backup from the same DC, or from an offsite DC if the primary DC got hit by an airplane or something).
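
For a taste of the barman side, a hedged sketch (the server name "pg" and paths are assumptions; the streaming setup itself lives in barman.conf):

    barman check pg            # WAL streaming and connectivity healthy?
    barman backup pg           # base backup on top of the streamed WAL
    barman list-backup pg      # what could we recover from?
    barman recover pg latest /var/lib/postgresql/17/main   # restore to a target directory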

napsterbr

8 hours ago

> Do I need 2 replicas to avoid the split brain scenario? my brain hurts already.

It will hurt even more.

The recommended way is to set up a witness server. Yet another thing to manage in a properly designed Postgres cluster. Certainly not an easy/trivial thing to do, ops-wise.

From [0]:

> By creating a witness server in the same location (data centre) as the primary, if the primary becomes unavailable it's possible for the standby to decide whether it can promote itself without risking a "split brain" scenario: if it can't see either the witness or the primary server, it's likely there's a network-level interruption and it should not promote itself. If it can see the witness but not the primary, this proves there is no network interruption and the primary itself is unavailable, and it can therefore promote itself (and ideally take action to fence the former primary).

An interesting acronym you'll hear is STONITH (in order to fence the former primary).

[0] - https://www.repmgr.org/docs/current/repmgrd-witness-server.h...
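
For reference, a hedged sketch of registering such a witness with repmgr (host name and config path are assumptions; see the docs linked above):

    # run on the witness node, pointing -h at the current primary:
    repmgr -f /etc/repmgr.conf witness register -h primary.example.internal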

nolist_policy

8 hours ago

You want STONITH anyways. It's all necessary complexity in a HA cluster.

candiddevmike

8 hours ago

> I'd pay to learn this with confidence.

Great, there's a whole industry of PostgreSQL consultants/platforms ready to take your money. You could put on your tinfoil hat and say PostgreSQL's lack of out-of-the-box, integrated, easy-to-use HA is by design to make money.

I think most folks use Patroni with some kind of service discovery solution like Kubernetes or Consul and have it abstract all of this for them.

lbriner

2 hours ago

You could also say that since the maintainers are doing this for free, it is OK that some of them work for businesses that provide (optional) technical support.

However, your mileage may vary. We found someone listed as a postgres consultant to help us but aside from a load of (alleged) personal problems that made the job take ages, I realised that the optimisation he sent me was basically a rehashed version of something I found on the internet, which he was obviously misrepresenting as his work. He got really arsey with me when I told him that I was really disappointed with his service!

klaussilveira

4 hours ago

I am managing my own PostgreSQL cluster with Patroni and, so far, the experience has been a breeze. That said, the initial learning curve is steep: Patroni's docs are not the best, and the community support around it is minimal. Naive questions are usually torpedoed by the maintainer or other experienced devs, which does not foster a good environment for people learning on their own.

Luckily, the Percona Distribution for Postgres includes amazing documentation on how to set up and run Patroni, even if you choose not to use their distribution. I would highly recommend following their step-by-step guide: https://docs.percona.com/postgresql/17/solutions/ha-setup-ap...

I have OpenTofu scripts for setting this cluster up, although they might not be useful to you since I am using CloudStack instead of other clouds.
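
As a small illustration of day-to-day operation once the cluster is up (config path and file name are assumptions):

    patronictl -c /etc/patroni/patroni.yml list         # members, roles, replication lag
    patronictl -c /etc/patroni/patroni.yml switchover    # planned, interactive switchover
    patronictl -c /etc/patroni/patroni.yml failover      # forced failover when the leader is gone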

andix

5 hours ago

You’re completely right. You need a full management solution around postgres to make it work, and I wouldn’t recommend building it yourself.

One solution is Kubernetes and one of the many postgres operators. Still not easy as pie.

forinti

9 hours ago

Lag is one little detail that I find lacking in streaming replication. If there was no transaction in the last n seconds, it's going to tell you that the lag is n seconds.

Oracle's Data Guard will tell you there's no lag, because the databases are the same.

In a development database, lag can get quite high, but it makes no sense to set off an alarm if lag gets too high because it might just be that there was nothing going on.
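
One common workaround, sketched here with no guarantees, is to report zero lag whenever the replica has already replayed everything it has received:

    # run on the replica (PostgreSQL 10+ function names)
    psql -XAtc "SELECT CASE
                  WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
                  ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
                END;"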

But I much prefer the simplicity and reliability of PostgreSQL over Oracle, any day of the week.

WJW

9 hours ago

If it's really a problem, you can always use the pt-heartbeat tool from the percona toolkit: https://docs.percona.com/percona-toolkit/pt-heartbeat.html

Conceptually it is very straightforward: it just updates the only row in a special table to the current time every --interval seconds on the main database, then monitors that value on any followers to see what they have. Subtract the two timestamps to get the replication lag.
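
The same idea can be hand-rolled if you'd rather not pull in the toolkit; this is only a sketch, and the heartbeat table is an assumption you'd create yourself:

    # once, on the primary:
    psql -c "CREATE TABLE IF NOT EXISTS heartbeat (id int PRIMARY KEY, ts timestamptz);"
    psql -c "INSERT INTO heartbeat VALUES (1, now()) ON CONFLICT (id) DO NOTHING;"

    # from cron on the primary, every few seconds:
    psql -c "UPDATE heartbeat SET ts = now() WHERE id = 1;"

    # on the replica: age of the replicated heartbeat row ~ replication lag in seconds
    psql -XAtc "SELECT EXTRACT(EPOCH FROM now() - ts) FROM heartbeat WHERE id = 1;"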

forinti

9 hours ago

I'll check it out. Thanks.

egnehots

8 hours ago

> How do I check how many seconds the replica is lagging behind the master?

Use PostgreSQL administrative functions, specifically: pg_last_xact_replay_timestamp. (https://www.postgresql.org/docs/current/functions-admin.html...)
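
For example (a sketch; the second query assumes PostgreSQL 10+, where pg_stat_replication grew dedicated lag columns):

    # on the replica: seconds since the last replayed transaction
    psql -XAtc "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp());"

    # on the primary: per-standby lag as measured by the sender
    psql -c "SELECT application_name, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"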

> How would I monitor the replica? A simple cron task that pings a health check if everything is OK (lag is < x) would be a good start.

There are many solutions, highly dependent on your context and the scale of your business. Options range from simple cron jobs with email alerts to more sophisticated setups like ELK/EFK, or managed services such as Datadog.
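
A minimal cron-style sketch of that idea (the threshold, host, and ping URL are all assumptions, e.g. a healthchecks.io-style endpoint):

    #!/usr/bin/env bash
    # replica-check.sh -- ping a health-check URL only while replication lag is acceptable
    LAG=$(psql -h replica.example.internal -XAtc \
      "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0);")
    if [ "${LAG%.*}" -lt 60 ]; then                   # 60 seconds is an arbitrary threshold
        curl -fsS https://hc.example.com/ping/replica-lag >/dev/null
    fi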

> How do I failover to the replica if the primary goes down?

> Should I handle failover automatically or manually?

> Do I need two replicas to avoid a split-brain scenario? My head hurts already.

While it may be tempting to automate failover with a tool, I strongly recommend manual failover if your business can tolerate some downtime.

This approach allows you to understand why the primary went down, preventing the same issue from affecting the replica. It's often not trivial to restore the primary or convert it to a replica. YOU become the consensus algorithm, the observer, deciding which instance becomes the primary.

Two scenarios to avoid:

* Failing over to a replica only for it to fail as well (e.g., due to a full disk).

* Successfully switching over so transparently that you will not notice that you're now running without a replica.

> After a failover (whether automatic or manual), how do I reconfigure the primary to be the primary again, and the replica to be the replica?

It's easier to switch roles and configure the former primary as the new replica. It will then automatically synchronize with the current primary.
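
One way to do that by hand is pg_rewind; this is only a sketch with assumed hosts, users and paths (PostgreSQL 12+ layout), and tools like repmgr's "node rejoin" wrap essentially the same steps:

    # on the old primary, with its server stopped:
    pg_rewind --target-pgdata=/var/lib/postgresql/17/main \
              --source-server='host=new-primary user=postgres dbname=postgres'

    # point it at the new primary and start it as a standby:
    touch /var/lib/postgresql/17/main/standby.signal
    echo "primary_conninfo = 'host=new-primary user=replicator'" \
      >> /var/lib/postgresql/17/main/postgresql.auto.conf
    pg_ctl -D /var/lib/postgresql/17/main start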

You might also want to use the replica for:

* Some read-only queries. However, for long-running queries, you will need to configure the replication delay to avoid timeouts.

* Backups or point-in-time recovery.

If you manage a database yourself, I strongly recommend gaining confidence first in your backups and your ability to restore them quickly. Then you can play with replication; there are tons of little settings to configure (async for perf, a large enough WAL size to restore quickly, ...).
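
A few of those knobs, sketched via ALTER SYSTEM (values are illustrative only; wal_keep_size is the PostgreSQL 13+ name, and max_wal_senders needs a restart):

    psql -c "ALTER SYSTEM SET wal_keep_size = '2GB';"      # keep enough WAL for a lagging standby to catch up
    psql -c "ALTER SYSTEM SET max_wal_senders = 10;"        # restart required
    psql -c "ALTER SYSTEM SET synchronous_commit = 'on';"   # 'off' trades durability for async performance
    psql -c "SELECT pg_reload_conf();"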

It's not that hard, but you want to have the confidence and the procedure written down before you have to do it in a production incident.

himinlomax

3 hours ago

> - How do I check how many seconds the replica is lagging behind master?

> - How would I monitor the replica? A simple cron task that pings a health check if everything is OK (lag is < x) would be a nice start.

No, you should use Patroni. It configures and monitors Postgres' native replication.

https://patroni.readthedocs.io/en/latest/

meow_catrix

9 hours ago

The modern way is to sidestep the issue altogether and use Kubernetes with a database designed to run on Kubernetes. You can get sharding, replication and leader election essentially for free - you can concentrate on using the database instead of running the database.

Compute is really cheap compared to engineering man-hours.

cpucycling7

9 hours ago

> The modern way is to sidestep the issue altogether and use Kubernetes

Kubernetes does require quite some time to learn/master. So you could say one replaces one time-consuming issue with another.

remram

9 hours ago

What's "a database designed to run on Kubernetes"? Cassandra?

linuxdude314

4 hours ago

Pretty sure they were referring to the operators you can install that will stand up production ready replicated clusters.

They work quite well.

remram

35 minutes ago

They replied to a comment mentioning "pgBouncer, repmgr, patroni", so presumably not one of those.

fastest963

8 hours ago

Something like Yugabyte or Cockroach

yrro

7 hours ago

Does the Cloud Native PG operator count?

cheboygan

6 hours ago

+1 on checking out cloud native PostgreSQL operator, or other PG operators like crunchy or zalando or ongres

dionian

8 hours ago

It was a minor pain finding and setting up a postgres operator in k8s, but once I got it going it wasn't too horrible. Are these other solutions that are more purpose-built significantly easier to manage?

slig

7 hours ago

Which one did you end up choosing?

andix

5 hours ago

The only real-world, easy-to-use solutions for Postgres replication I've found are the Kubernetes operators, for example CloudNativePG.

It's not just replication that you need. It's failover, recovery, monitoring, self-healing, backups, and so on.
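
As a taste of what that looks like, a hedged minimal CloudNativePG Cluster manifest applied from the shell (the name, instance count and storage size are assumptions):

    kubectl apply -f - <<'EOF'
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-cluster
    spec:
      instances: 3        # one primary plus two replicas, failover handled by the operator
      storage:
        size: 10Gi
    EOF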

Are there any other free/open implementations outside of kubernetes?

xtracto

7 minutes ago

I used the WAL feature with a bash script that compressed the WAL into .xz files (-9 compression) every 10,000 lines and stored those files in an s3fs folder. This was a type of "online" backup solution.
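
A simpler variant of that idea (not what the parent did) is to let Postgres hand you each finished WAL segment via archive_command and compress it on its way to the s3fs mount; paths here are assumptions:

    psql -c "ALTER SYSTEM SET archive_mode = 'on';"                                 # needs a restart
    psql -c "ALTER SYSTEM SET archive_command = 'xz -9 -c %p > /mnt/s3/wal/%f.xz';"
    psql -c "SELECT pg_reload_conf();"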

kachapopopow

7 hours ago

I see this as one of the reasons to use kubernetes (& helm).

https://artifacthub.io/packages/helm/bitnami/postgresql

Configures all of this for you with near-zero additional configuration required. There's also postgres-ha, which handles zero-downtime failover by spawning a proxy that handles failures in a specialized way, versus just forwarding directly to a psql server.
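
For instance, a replicated setup from that chart is roughly the following (release names and values are assumptions; check the chart's values for your version):

    helm install my-postgres oci://registry-1.docker.io/bitnamicharts/postgresql \
      --set architecture=replication \
      --set readReplicas.replicaCount=2

    # the HA variant mentioned above (proxy in front of repmgr-managed nodes):
    helm install my-postgres-ha oci://registry-1.docker.io/bitnamicharts/postgresql-ha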