PostgreSQL Streaming Replication (WAL): What It Is and How to Configure One

147 points, posted 10 hours ago
by thunderbong

29 Comments

pqdbr

9 hours ago

It's a great article, but I've always felt these are missing the critical real-world details, from the perspective of a full-stack dev who also wants to manage their own databases.

- How do I check how many seconds the replica is lagging behind master?

- How would I monitor the replica? A simple cron task that pings a health check if everything is OK (lag is < x) would be a nice start.

And then things get complicated quickly:

- How do I failover to the replica if the primary goes down? There's pgBouncer, repmgr, patroni...

- Should I have it automatically or manually?

- Do I need 2 replicas to avoid the split brain scenario? my brain hurts already.

- After a failover occurs (either automatically or manually), how in the world am I going to configure the primary to be the primary again, and the replica to act as the replica again, going back to the original scenario?

I'd pay to learn this with confidence.

lbriner

2 hours ago

> How do I check replica lagging? I use the prometheus exporter for postgres

> How would I monitor the replica? Same. You can also use something like HAProxy calling a postgres CLI command to connect to the instance
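
For illustration, a minimal check along those lines might look like this (the host name is a placeholder; pg_isready ships with the standard client tools):

    # is the standby reachable at all?
    pg_isready -h replica.example.internal -p 5432

    # is it actually running as a standby?
    psql -h replica.example.internal -XAtc "SELECT pg_is_in_recovery();"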

> How do I failover? Mostly, you probably want to do this manually because there can be data loss and you want to make sure the risk is worth it. I simply use repmgr for this.

> Do I need 2 replicas? It's usually good to have at least 3 (1 master and 2 slaves) but mostly so that if one fails, you still have 2 remaining i.e. time to get a 3rd back online

> How do I failback? Again, very easy with repmgr, you just tell the primary to be the primary again. The failed over primary gets stopped, the original primary gets fast-forwarded and promoted to primary and everything else gets told to follow.
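
As a rough sketch of what that looks like with repmgr (config path and conninfo are assumptions, check the docs for your version):

    # run on the standby that should become (or become again) the primary:
    repmgr -f /etc/repmgr.conf standby switchover --siblings-follow

    # bring a crashed ex-primary back into the cluster as a standby:
    repmgr -f /etc/repmgr.conf node rejoin \
      -d 'host=new-primary dbname=repmgr user=repmgr' --force-rewind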

I do agree that this space for postgres is very fragmented and some tools appear abandoned, but it's pretty straightforward with just postgres + barman + repmgr. I have a series of videos on YouTube if you are interested, but I am not a Postgres expert so please no hating :-) https://youtu.be/YM41mLZQxzE

cheald

an hour ago

+1 to all of this. The thing I'd add is that we use barman for our additional replicas; WAL streaming is very easy to do with Barman, and we stream to two backups (one onsite, one offsite). The only real costs are bandwidth and disk space, both of which are cheap. Compared to running a full replica (with its RAM costs), it's a very economical way to have a robust disaster recovery plan.

If you're doing manual failover, you don't need an odd number of nodes in the cluster (since you aren't looking for quorum to automatically resolve split-brain like you would be with tools such as Elasticsearch or redis-sentinel), so for us it's just a question of "how long does it take to get back online if we lose the primary" (answer: as long as it takes to determine that we need to do a switch and invoke repmgr switchover), and "how robust are we against catastrophic failure" (answer: we can recover our DB from a very-close-to-live barman backup from the same DC, or from an offsite DC if the primary DC got hit by an airplane or something).
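
For a taste of the barman side, a hedged sketch (the server name "pg" and paths are assumptions; the streaming setup itself lives in barman.conf):

    barman check pg            # WAL streaming and connectivity healthy?
    barman backup pg           # base backup on top of the streamed WAL
    barman list-backup pg      # what could we recover from?
    barman recover pg latest /var/lib/postgresql/17/main   # restore to a target directory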

napsterbr

8 hours ago

> Do I need 2 replicas to avoid the split brain scenario? my brain hurts already.

It will hurt even more.

The recommended way is to set up a witness server. Yet another thing to manage in a properly designed Postgres cluster. Certainly not an easy/trivial thing to do, ops-wise.

From [0]:

> By creating a witness server in the same location (data centre) as the primary, if the primary becomes unavailable it's possible for the standby to decide whether it can promote itself without risking a "split brain" scenario: if it can't see either the witness or the primary server, it's likely there's a network-level interruption and it should not promote itself. If it can see the witness but not the primary, this proves there is no network interruption and the primary itself is unavailable, and it can therefore promote itself (and ideally take action to fence the former primary).

An interesting acronym you'll hear is STONITH (in order to fence the former primary).

[0] - https://www.repmgr.org/docs/current/repmgrd-witness-server.h...
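
For reference, a hedged sketch of registering such a witness with repmgr (host name and config path are assumptions; see the docs linked above):

    # run on the witness node, pointing -h at the current primary:
    repmgr -f /etc/repmgr.conf witness register -h primary.example.internal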

nolist_policy

8 hours ago

You want STONITH anyways. It's all necessary complexity in a HA cluster.

candiddevmike

8 hours ago

> I'd pay to learn this with confidence.

Great, there's a whole industry of PostgreSQL consultants/platforms ready to take your money. You could put on your tinfoil hat and say PostgreSQL's lack of out-of-the-box, integrated, easy-to-use HA is by design to make money.

I think most folks use Patroni with some kind of service discovery solution like Kubernetes or Consul and have it abstract all of this for them.

lbriner

2 hours ago

You could also say that since the maintainers are doing this for free, it is OK that some of them work for businesses that provide (optional) technical support.

However, your mileage may vary. We found someone listed as a postgres consultant to help us but aside from a load of (alleged) personal problems that made the job take ages, I realised that the optimisation he sent me was basically a rehashed version of something I found on the internet, which he was obviously misrepresenting as his work. He got really arsey with me when I told him that I was really disappointed with his service!

klaussilveira

4 hours ago

I am managing my own PostgreSQL cluster with Patroni and, so far, the experience has been a breeze. That said, the initial learning curve is steep: Patroni's docs are not the best, and the community support around it is minimal. Naive questions are usually torpedoed by the maintainer or other experienced devs, which does not foster a good environment for people learning on their own.

Luckily, the Percona Distribution for Postgres includes amazing documentation on how to set up and run Patroni, even if you choose not to use their distribution. I would highly recommend following their step-by-step guide: https://docs.percona.com/postgresql/17/solutions/ha-setup-ap...

I have OpenTofu scripts for setting this cluster up, although they might not be useful to you since I am using CloudStack instead of other clouds.
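
As a small illustration of day-to-day operation once the cluster is up (config path and file name are assumptions):

    patronictl -c /etc/patroni/patroni.yml list         # members, roles, replication lag
    patronictl -c /etc/patroni/patroni.yml switchover    # planned, interactive switchover
    patronictl -c /etc/patroni/patroni.yml failover      # forced failover when the leader is gone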

andix

5 hours ago

You’re completely right. You need a full management solution around postgres to make it work, and I wouldn’t recommend building it yourself.

One solution is Kubernetes and one of the many postgres operators. Still not easy as pie.

forinti

9 hours ago

Lag is one little detail that I find lacking in streaming replication. If there was no transaction in the last n seconds, it's going to tell you that the lag is n seconds.

Oracle's Data Guard will tell you there's no lag, because the databases are the same.

In a development database, lag can get quite high, but it makes no sense to set off an alarm if lag gets too high because it might just be that there was nothing going on.
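
One common workaround, sketched here with no guarantees, is to report zero lag whenever the replica has already replayed everything it has received:

    # run on the replica (PostgreSQL 10+ function names)
    psql -XAtc "SELECT CASE
                  WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
                  ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
                END;"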

But I much prefer the simplicity and reliability of PostgreSQL over Oracle, any day of the week.

WJW

9 hours ago

If it's really a problem, you can always use the pt-heartbeat tool from the percona toolkit: https://docs.percona.com/percona-toolkit/pt-heartbeat.html

Conceptually it is very straightforward: it just updates the only row in a special table to the current time every --interval seconds on the main database, then monitors that value on any followers to see what they have. Subtract the two timestamps to get the replication lag.
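
The same idea can be hand-rolled if you'd rather not pull in the toolkit; this is only a sketch, and the heartbeat table is an assumption you'd create yourself:

    # once, on the primary:
    psql -c "CREATE TABLE IF NOT EXISTS heartbeat (id int PRIMARY KEY, ts timestamptz);"
    psql -c "INSERT INTO heartbeat VALUES (1, now()) ON CONFLICT (id) DO NOTHING;"

    # from cron on the primary, every few seconds:
    psql -c "UPDATE heartbeat SET ts = now() WHERE id = 1;"

    # on the replica: age of the replicated heartbeat row ~ replication lag in seconds
    psql -XAtc "SELECT EXTRACT(EPOCH FROM now() - ts) FROM heartbeat WHERE id = 1;"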

forinti

9 hours ago

I'll check it out. Thanks.

egnehots

8 hours ago

> How do I check how many seconds the replica is lagging behind the master?

Use PostgreSQL administrative functions, specifically: pg_last_xact_replay_timestamp. (https://www.postgresql.org/docs/current/functions-admin.html...)
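
For example (a sketch; the second query assumes PostgreSQL 10+, where pg_stat_replication grew dedicated lag columns):

    # on the replica: seconds since the last replayed transaction
    psql -XAtc "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp());"

    # on the primary: per-standby lag as measured by the sender
    psql -c "SELECT application_name, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"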

> How would I monitor the replica? A simple cron task that pings a health check if everything is OK (lag is < x) would be a good start.

There are many solutions, highly dependent on your context and the scale of your business. Options range from simple cron jobs with email alerts to more sophisticated setups like ELK/EFK, or managed services such as Datadog.
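
A minimal cron-style sketch of that idea (the threshold, host, and ping URL are all assumptions, e.g. a healthchecks.io-style endpoint):

    #!/usr/bin/env bash
    # replica-check.sh -- ping a health-check URL only while replication lag is acceptable
    LAG=$(psql -h replica.example.internal -XAtc \
      "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0);")
    if [ "${LAG%.*}" -lt 60 ]; then                   # 60 seconds is an arbitrary threshold
        curl -fsS https://hc.example.com/ping/replica-lag >/dev/null
    fi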

> How do I failover to the replica if the primary goes down?

> Should I handle failover automatically or manually?

> Do I need two replicas to avoid a split-brain scenario? My head hurts already.

While it may be tempting to automate failover with a tool, I strongly recommend manual failover if your business can tolerate some downtime.

This approach allows you to understand why the primary went down, preventing the same issue from affecting the replica. It's often not trivial to restore the primary or convert it to a replica. YOU become the consensus algorithm, the observer, deciding which instance becomes the primary.

Two scenarios to avoid:

* Failing over to a replica only for it to fail as well (e.g., due to a full disk).

* Successfully switching over so transparently that you will not notice that you're now running without a replica.

> After a failover (whether automatic or manual), how do I reconfigure the primary to be the primary again, and the replica to be the replica?

It's easier to switch roles and configure the former primary as the new replica. It will then automatically synchronize with the current primary.
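
One way to do that by hand is pg_rewind; this is only a sketch with assumed hosts, users and paths (PostgreSQL 12+ layout), and tools like repmgr's "node rejoin" wrap essentially the same steps:

    # on the old primary, with its server stopped:
    pg_rewind --target-pgdata=/var/lib/postgresql/17/main \
              --source-server='host=new-primary user=postgres dbname=postgres'

    # point it at the new primary and start it as a standby:
    touch /var/lib/postgresql/17/main/standby.signal
    echo "primary_conninfo = 'host=new-primary user=replicator'" \
      >> /var/lib/postgresql/17/main/postgresql.auto.conf
    pg_ctl -D /var/lib/postgresql/17/main start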

You might also want to use the replica for:

* Some read-only queries. However, for long-running queries, you will need to configure the replication delay to avoid timeouts.

* Backups or point-in-time recovery.

If you manage a database yourself, I strongly recommend gaining confidence first in your backups and your ability to restore them quickly. Then you can play with replication; there are tons of little settings to configure (async for perf, a large enough WAL size to restore quickly, ...).
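
A few of those knobs, sketched via ALTER SYSTEM (values are illustrative only; wal_keep_size is the PostgreSQL 13+ name, and max_wal_senders needs a restart):

    psql -c "ALTER SYSTEM SET wal_keep_size = '2GB';"      # keep enough WAL for a lagging standby to catch up
    psql -c "ALTER SYSTEM SET max_wal_senders = 10;"        # restart required
    psql -c "ALTER SYSTEM SET synchronous_commit = 'on';"   # 'off' trades durability for async performance
    psql -c "SELECT pg_reload_conf();"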

It's not that hard, but you want to have the confidence and the procedure written down before you have to do it in a production incident.

himinlomax

3 hours ago

> - How do I check how many seconds the replica is lagging behind master?

> - How would I monitor the replica? A simple cron task that pings a health check if everything is OK (lag is < x) would be a nice start.

No, you should use Patroni. It configures and monitors Postgres' native replication.

https://patroni.readthedocs.io/en/latest/

meow_catrix

9 hours ago

The modern way is to sidestep the issue altogether and use Kubernetes with a database designed to run on Kubernetes. You can get sharding, replication and leader election essentially for free - you can concentrate on using the database instead of running the database.

Compute is really cheap compared to engineering man-hours.

cpucycling7

9 hours ago

> The modern way is to sidestep the issue altogether and use Kubernetes

Kubernetes does require quite some time to learn/master. So you could say one replaces one time-consuming issue with another.

remram

9 hours ago

What's "a database designed to run on Kubernetes"? Cassandra?

linuxdude314

4 hours ago

Pretty sure they were referring to the operators you can install that will stand up production ready replicated clusters.

They work quite well.

remram

35 minutes ago

They replied to a comment mentioning "pgBouncer, repmgr, patroni", so presumably not one of those.

fastest963

8 hours ago

Something like Yugabyte or Cockroach

yrro

7 hours ago

Does the Cloud Native PG operator count?

cheboygan

6 hours ago

+1 on checking out cloud native PostgreSQL operator, or other PG operators like crunchy or zalando or ongres

dionian

8 hours ago

It was a minor pain finding and setting up a postgres operator in k8s, but once I got it going it wasn't too horrible. Are these other solutions that are more purpose-built significantly easier to manage?

slig

7 hours ago

Which one did you end up choosing?

andix

5 hours ago

The only real-world, easy-to-use solutions for Postgres replication I've found are the Kubernetes operators, for example CloudNativePG.

It's not just replication that you need. It's failover, recovery, monitoring, self-healing, backups, and so on.
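
As a taste of what that looks like, a hedged minimal CloudNativePG Cluster manifest applied from the shell (the name, instance count and storage size are assumptions):

    kubectl apply -f - <<'EOF'
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-cluster
    spec:
      instances: 3        # one primary plus two replicas, failover handled by the operator
      storage:
        size: 10Gi
    EOF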

Are there any other free/open implementations outside of kubernetes?

xtracto

7 minutes ago

I used the WAL feature with a bash script that compressed the WAL into .xz files (-9 compression) every 10,000 lines and stored those files in an s3fs folder. This was a type of "online" backup solution.
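
A simpler variant of that idea (not what the parent did) is to let Postgres hand you each finished WAL segment via archive_command and compress it on its way to the s3fs mount; paths here are assumptions:

    psql -c "ALTER SYSTEM SET archive_mode = 'on';"                                 # needs a restart
    psql -c "ALTER SYSTEM SET archive_command = 'xz -9 -c %p > /mnt/s3/wal/%f.xz';"
    psql -c "SELECT pg_reload_conf();"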

kachapopopow

7 hours ago

I see this as one of the reasons to use kubernetes (& helm).

https://artifacthub.io/packages/helm/bitnami/postgresql

Configures all of this for you with near-zero additional configuration required. There's also postgres-ha, which handles zero-downtime failover by spawning a proxy that handles failures in a specialized way, versus just forwarding directly to a psql server.
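
For instance, a replicated setup from that chart is roughly the following (release names and values are assumptions; check the chart's values for your version):

    helm install my-postgres oci://registry-1.docker.io/bitnamicharts/postgresql \
      --set architecture=replication \
      --set readReplicas.replicaCount=2

    # the HA variant mentioned above (proxy in front of repmgr-managed nodes):
    helm install my-postgres-ha oci://registry-1.docker.io/bitnamicharts/postgresql-ha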