[pgpool-general: 8539] Issues taking a node out of a cluster

Emond Papegaaij emond.papegaaij at gmail.com
Tue Jan 10 23:56:20 JST 2023


Hi all,

We are seeing failures in our test suite on a specific set of tests related
to taking a node out of a cluster. In short, the following sequence of
events seems to occur:
* We start with a healthy cluster with 3 nodes (0, 1 and 2), each node
running pgpool and postgresql. Node 0 runs the primary database.
* node 1 is shut down
* pgpool on node 0 and 2 correctly mark backend 1 down
* pgpool on node 0 is reconfigured, removing node 1 from the configuration,
backend 0 remains backend 0, backend 2 is now known as backend 1
* pgpool on node 0 starts up again, and receives the cluster status from
node 2, which includes backend 1 being down.
* pgpool on node 0 now also marks backend 1 as being down, but because of
the renumbering, it actually marks the backend on node 2 as down
* pgpool on node 2 gets its new configuration, same as on node 0
* pgpool on node 2 (which now runs backend 1) gets the cluster status
from node 0, and marks backend 1 down
* the cluster ends up with pgpool and postgresql running on both remaining
nodes, but backend 1 is down. It never recovers from this state
automatically, even though auto_failback is enabled and postgresql is up
and streaming.
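
To make the suspected renumbering problem concrete, here is a minimal
Python sketch of it. It is only a model of the behaviour described above,
not pgpool code: the function names and the "node0"/"node2" labels are
made up for illustration, and it assumes the status received from the peer
is keyed purely by backend index.

```python
# Hypothetical model of the sequence above; not pgpool internals.

def renumber(backends, removed_index):
    """Drop one backend from the config; remaining indices are compacted."""
    return [host for i, host in enumerate(backends) if i != removed_index]


def apply_status_by_index(config, status_by_index):
    """Apply a peer's status (keyed by backend index) to the local config."""
    return {host: status_by_index.get(i, "up") for i, host in enumerate(config)}


# Old numbering: backend 0, 1 and 2 live on node 0, 1 and 2.
old_config = ["node0", "node1", "node2"]

# Status as known by the pgpool on node 2, still using the old numbering.
peer_status = {0: "up", 1: "down", 2: "up"}

# Node 0 is reconfigured: node 1 is removed, node 2 becomes backend 1.
new_config = renumber(old_config, removed_index=1)

# Node 0 applies the peer's "backend 1 is down" to its new backend 1.
result = apply_status_by_index(new_config, peer_status)
print(result)  # {'node0': 'up', 'node2': 'down'}
```

In this model the "down" flag that belonged to the removed node ends up on
the backend that took over its index.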

For node 2 (with backend 1), pcp_node_info returns the following
information for backend 1:
Hostname               : 172.29.30.3
Port                   : 5432
Status                 : 3
Weight                 : 0.500000
Status Name            : down
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-01-09 22:28:41
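
For reference, this is roughly how our tests could flag the inconsistent
state from that output. It is a sketch that assumes the "Key : Value"
layout shown above; parse_node_info and looks_stuck are our own
hypothetical helper names, not part of pgpool or pcp.

```python
# Sketch: detect "pgpool says down, but the backend is up and streaming".
# SAMPLE is the pcp_node_info output quoted above.

SAMPLE = """\
Hostname               : 172.29.30.3
Port                   : 5432
Status                 : 3
Weight                 : 0.500000
Status Name            : down
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-01-09 22:28:41
"""


def parse_node_info(text):
    """Parse 'Key : Value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def looks_stuck(info):
    """True if pgpool reports the node down while the backend is up and streaming."""
    return (
        info.get("Status Name") == "down"
        and info.get("Backend Status Name") == "up"
        and info.get("Replication State") == "streaming"
    )


info = parse_node_info(SAMPLE)
print(looks_stuck(info))  # True
```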

My first question is: Can we somehow prevent the state of backend 1 being
assigned to the wrong node during the configuration update?

My second question: Why does auto_failback not reattach backend 1 when
it detects that the database is up and streaming?

Best regards,
Emond Papegaaij