[pgpool-general: 7860] Re: Rejecting database mutations without a quorum
ishii at sraoss.co.jp
Sun Nov 7 10:52:59 JST 2021
> While running various cluster failover and recovery tests, we came across
> an issue that I would like to discuss here. These tests were performed in a
> setup with 3 nodes (for simplicity called node 1, 2 and 3). Each node runs
> a database, pgpool and an application instance. The application connects to
> the local pgpool, which in turn connects to all 3 databases, sending all
> queries to the one that is currently primary. Suppose node 1 runs the
> primary database. When this node is disconnected from the other 2 nodes via
> a simulated network failure, the other nodes establish consensus to perform
> a failover and either node 2 or 3 is selected for the new primary. However,
> on node 1, the database remains writable and the application and pgpool
> running. If the application on this node is still reachable from the load
> balancer, it will continue to serve requests, resulting in a split brain
> and ultimately database corruption.
> For many of our customers this is unwanted behavior. They would rather see
> the service become unavailable than continue to operate in a split brain. I
> went through the available options on pgpool, but could not find an option
> that would help me here. I'm looking for a way to prevent pgpool from
> accessing its backends when its watchdog is not part of a quorum. Is this
> currently possible in pgpool? If not, is it worth considering adding a
> feature for this?
One of the difficulties with this kind of approach is how to decide,
in a reliable way, that the primary should be killed.
Suppose we have:
node 1: pgpool/watchdog 1, primary PostgreSQL
node 2: pgpool/watchdog 2, standby PostgreSQL
node 3: pgpool/watchdog 3, standby PostgreSQL
1) Watchdog 1 loses communication with watchdog 2 and watchdog 3, so
watchdog 1 loses quorum.
2) However, communication between pgpool 1/2 and the primary is still healthy.
3) Watchdog 1 decides to kill the primary, and the primary goes down.
4) pgpool 1/2 have to promote one of the standbys to create a new primary.
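
To make the steps above concrete, here is a rough Python sketch of the
scenario. It is only an illustration and not pgpool code: the names
has_quorum and run_scenario are made up for this example, the quorum rule
is assumed to be a strict majority of the 3 watchdogs, and "a watchdog
without quorum kills its local primary" is the hypothetical policy under
discussion, not an existing pgpool feature.

```python
TOTAL_WATCHDOGS = 3  # node 1, node 2 and node 3


def has_quorum(reachable_peers, total=TOTAL_WATCHDOGS):
    """Assumed rule: the watchdog itself plus its reachable peers
    must form a strict majority of all watchdogs."""
    return 1 + len(reachable_peers) > total // 2


def run_scenario():
    # Step 1: watchdog 1 can no longer see watchdogs 2 and 3,
    # while watchdogs 2 and 3 still see each other.
    peers = {1: set(), 2: {3}, 3: {2}}
    quorum = {wd: has_quorum(p) for wd, p in peers.items()}

    # Step 2: the pgpool-to-primary connections are still healthy,
    # so the primary itself is working fine at this point.
    primary_up = True
    primary_healthy_before_kill = primary_up

    # Step 3 (hypothetical policy): a watchdog without quorum
    # kills the primary running on its own node.
    if not quorum[1]:
        primary_up = False

    # Step 4: the primary is gone, so a pgpool that still has
    # quorum has to promote one of the standbys.
    failover_needed = (not primary_up) and any(quorum[wd] for wd in (2, 3))

    return {
        "quorum": quorum,
        "primary_healthy_before_kill": primary_healthy_before_kill,
        "failover_needed": failover_needed,
    }


if __name__ == "__main__":
    for key, value in run_scenario().items():
        print(key, "=", value)
```

In this sketch the failover in step 4 is triggered only by the watchdog's
own decision in step 3. The primary was healthy until then, which is why
deciding reliably when to kill it is the hard part.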
SRA OSS, Inc. Japan