[pgpool-hackers: 4378] Re: Load balancing after failover not working in 3-node HA PostgreSQL cluster

Sat Aug 19 12:03:25 JST 2023

> Hi Tatsuo
> I've observed some logs following an auto failover that I'd like to discuss
> 
> 2023-08-19 00:44:07.277: main pid 62145: LOG: find_primary_node: primary node is 1
> 2023-08-19 00:44:07.277: main pid 62145: LOG: find_primary_node: standby node is 2
> 2023-08-19 00:44:07.278: main pid 62145: LOG: starting follow degeneration. shutdown host 172.16.14.165(5432)
> 2023-08-19 00:44:07.279: main pid 62145: LOG: starting follow degeneration. shutdown host 172.16.14.163(5432)
> 2023-08-19 00:44:07.279: main pid 62145: LOG: failover: 2 follow backends have been degenerated
> 2023-08-19 00:44:07.280: main pid 62145: LOG: failover: set new primary node: 1
> 
> do_query: extended:0 query:"SELECT pg_is_in_recovery()"
> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: verify_backend_node_status: there's no standby node
> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[0]: 0
> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[1]: 1
> 2023-08-19 00:44:08.305: sr_check_worker pid 62243: DEBUG: node status[2]: 0
> 
> 
> In the above logs, it's evident that the initial primary node at IP address
> 172.16.14.165 (which is currently down) and standby2 node at 172.16.14.163
> were involved in the auto failover process.
> My concern is why did pgpool initiate the shutdown of standby2 node as part
> of the auto failover process?

Because in general standby servers cannot connect to the new primary
without obtaining a copy of the new primary database.

> And how can we prevent such a degeneration
> from happening to standby2 in our case?

You can set follow_primary_command to '' (an empty string) to prevent
standbys from being killed by pgpool.
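
In pgpool.conf that is a one-line change. This is only a sketch of the
setting mentioned above; with it in place pgpool no longer runs anything
against the remaining standbys after a failover, so re-pointing them at
the new primary is left to you:

```
# pgpool.conf
# An empty value disables the follow primary command.
follow_primary_command = ''
```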

> NOTE, I attempted to resolve the situation by restarting pgpool2 using the
> -d switch. After the restart, everything seemed to work fine, and standby2
> node was correctly marked as standby again. Is restart of pgpool really
> necessary?

No. You should just have waited a little longer so that the follow
primary command could complete its job: recovering node 2. Please
remember that the follow primary command keeps running after the
failover has completed.
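
One way to see whether the follow primary command has finished is to poll
"SHOW pool_nodes" through pgpool until the node is reported as up again.
A minimal sketch follows. The host, port and user are assumptions, not
values from this thread, and the function names are made up for
illustration. It also assumes that your pgpool version prints the status
as the text "up" in the fourth column, and that your follow primary
script attaches the node when it is done.

```shell
#!/bin/sh
# Sketch only: connection settings are assumptions; adjust to your setup.
PGPOOL_HOST=${PGPOOL_HOST:-localhost}
PGPOOL_PORT=${PGPOOL_PORT:-9999}
PGPOOL_USER=${PGPOOL_USER:-postgres}

# Read unaligned "SHOW pool_nodes" rows (fields separated by '|') from
# stdin and print the status column of the node whose id is $1.
# Assumes the column order node_id|hostname|port|status|...
node_status() {
    awk -F'|' -v id="$1" '$1 == id { print $4 }'
}

# Ask pgpool for the node list in unaligned, tuples-only form.
show_pool_nodes() {
    psql -h "$PGPOOL_HOST" -p "$PGPOOL_PORT" -U "$PGPOOL_USER" \
         -d postgres -At -c "SHOW pool_nodes"
}

# Poll every 5 seconds until node $1 is reported as "up".
# There is no timeout here; interrupt it if the node never comes back.
wait_for_node_up() {
    until [ "$(show_pool_nodes | node_status "$1")" = "up" ]; do
        sleep 5
    done
}
```

For the case in this thread you would source the file and run
"wait_for_node_up 2" after the failover instead of restarting pgpool.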

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp