View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000825 | Pgpool-II | General | public | 2024-01-17 15:30 | 2024-02-06 12:09 |

| Field | Value | Field | Value | Field | Value |
|---|---|---|---|---|---|
| Reporter | Srini Balakrishnan | Assigned To | pengbo | | |
| Priority | normal | Severity | minor | Reproducibility | have not tried |
| Status | assigned | Resolution | open | | |
| Product Version | 4.3.5 | | | | |
| Summary | 0000825: Auto Failover and Recovery options | | | | |
Description:

Hi, I have set up Pgpool-II 4.3.5 on a 3-node PostgreSQL (14.10) cluster; PostgreSQL and Pgpool-II are installed on the same nodes. One requirement for our POC to move to production is to enable automatic failover and recovery in case one of the nodes fails.

1) During my testing, the Pgpool-II VIP moves to another node when the service is stopped, but there is a lag of about 10 seconds before the VIP is created on the other node. Is there any setting I can tweak to reduce this time?

2) For automatic failover and recovery of PostgreSQL, I am not able to make Pgpool-II's auto-failover method work successfully; the commands do not behave as expected. For example, if I shut down the primary PostgreSQL node, one of the standbys gets promoted to primary, but the cluster breaks down: I have to manually delete the data directory on the original primary and resync it from the new primary before it can start as a standby. I am not able to rejoin the old primary to the cluster automatically as a standby node.

3) Also, if I need to promote one of the standbys without stopping the PostgreSQL service and demote the current primary to a standby, i.e. an automatic switchover — is this feature supported by Pgpool-II? I could not find any reference to it in the documentation. If it is not supported, can you recommend an open-source tool (such as pg_auto_failover, Patroni, or repmgr) that would help us set up automatic failover and switchover in addition to Pgpool-II (which is important in our setup to separate the read/write calls)? Any steps or documents to achieve this would be very helpful.

Tags: No tags attached.
Note 0004474 (pengbo, 2024-01-22 23:49):
> 1) as of now, during my testing, the pgpool VIP moves to another node when the service is stopped. However there is a lag of 10 secs for this movement and VIP creation in another node. is there any other setting that i can tweak to reduce this time?

You can try decreasing the following parameters:

- wd_interval
- wd_heartbeat_deadtime

https://www.pgpool.net/docs/43/en/html/runtime-watchdog-config.html#CONFIG-WATCHDOG-LIFECHECK-HEARTBEAT

> 2) for the auto failover and recovery of postgresql, i am not able to make it work successfully on pgpool autofail method. the commands are not working as expected. eg., if i shutdown the primary node of postgresql, one of the standby gets promoted as primary however the cluster breaks down. i have to manually delete the data folder on original primary and do sync from new primary to start that as standby. i am not able to join the old primary as a standby node automatically to the cluster.

Sorry, Pgpool-II does not have a feature to automatically recover the old primary as a standby node. You need to restore it as a standby manually and then attach it to the cluster.

> 3) also, if i need to promote one of the standby without stopping the postgresql service and demote the current primary to standby - ie, auto switchover. is this feature supported in pgpool?

Yes. You can run "pcp_promote_node --switchover". Please note that to use this feature, "follow_primary_command" must be configured. For more information, please see the following documentation:

https://www.pgpool.net/docs/43/en/html/pcp-promote-node.html
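To make the suggested tuning concrete, a pgpool.conf fragment might look like the following. The values are illustrative assumptions for a low-latency LAN, not recommended defaults; shorter intervals detect failure faster but increase the risk of false positives under load:

```
# pgpool.conf -- watchdog lifecheck tuning (illustrative values, not defaults)
wd_interval = 3               # lifecheck interval in seconds (default: 10)
wd_heartbeat_deadtime = 10    # seconds without a heartbeat before a watchdog
                              # node is considered dead (default: 30)
```

For the switchover in item 3, once follow_primary_command is configured, the call would be along the lines of `pcp_promote_node --switchover -h <pcp_host> -p <pcp_port> -U <pcp_user> -n <node_id>` (host, port, user, and node id are placeholders for your PCP setup).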
Note 0004476 (Srini Balakrishnan, 2024-01-30 14:13):
Hi, thanks for the note and details. I tested the following:

1) I reduced wd_interval to 1 second and wd_heartbeat_deadtime to 2 seconds, down from the defaults of 10 and 30 seconds. I see no significant difference in VIP movement between nodes; it still takes around 10-15 seconds.

2) I have set up Patroni in my POC environment, with Patroni managing the PostgreSQL cluster for the failover/switchover features and Pgpool-II used only for load balancing and connection pooling. The integration works well, except that I see some glitches when I test failover/switchover on PostgreSQL via Patroni. To avoid conflicts with Pgpool-II handling these events, I disabled its failover steps and slightly modified the follow_primary script to recycle the Pgpool-II services so that new connections are established with the new primary and standby nodes. Sometimes the node status in Pgpool-II stays down and I have to run pcp_attach_node a few times to make it work.

Since I am completely stopping and restarting the pgpool2 service, why does the pgpool2 node status not synchronize automatically? I have also included -D and -n in the OPTS of my pgpool2 service so that the status file is ignored on restart, but it keeps the old reference; the status synchronization is not seamless, especially whenever the primary/standby roles change in PostgreSQL. Any recommendations or inputs on handling this scenario?

I am using version 4.3.7. Why does Pgpool-II, when the service starts, not fully reinitialize the status field and take the current state of PostgreSQL as-is, instead of triggering follow_primary again? Despite the -D flag being set at startup to ignore the status file, it still thinks the PostgreSQL nodes have changed and triggers the follow_primary script. This disturbs the status flag, and in a few scenarios I noticed it goes from UP to down.
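For reference, the manual re-attach described above would be something like the following; host, port, user, and node id are placeholders for the local PCP setup:

```
# Check what Pgpool-II currently believes about backend 0, then re-attach it.
pcp_node_info -h localhost -p 9898 -U pgpool -n 0
pcp_attach_node -h localhost -p 9898 -U pgpool -n 0
```

As far as I understand, `-D` only discards the pgpool_status file at startup, after which Pgpool-II probes the backends afresh; if the probed roles differ from its expectations, the follow_primary logic can still fire, which would match the behaviour described above.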
Note 0004477 (Srini Balakrishnan, 2024-01-30 17:44):
Sometimes when I stop the pgpool2 service it does not stop immediately, and I can see the processes below. Any reason why, and how can this be prevented?

```
postgres@tst2jdc17:~$ ps -ef | grep pgpool
postgres 1483773       1  0 08:31 ?  00:00:00 /usr/sbin/pgpool -n
postgres 1483774 1483773  0 08:31 ?  00:00:00 pgpool: PgpoolLogger
postgres 1483778 1483773  0 08:31 ?  00:00:00 [pgpool] <defunct>
postgres 1483807 1483773  0 08:31 ?  00:00:00 [pgpool] <defunct>
postgres 1483848 1483773  0 08:31 ?  00:00:00 [pgpool] <defunct>
postgres 1483849 1483773  0 08:31 ?  00:00:00 [pgpool] <defunct>
postgres 1483850 1483773  0 08:31 ?  00:00:00 [pgpool] <defunct>
postgres 1485533 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485534 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485535 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485536 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485537 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485538 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485539 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485540 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485541 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485542 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485543 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
postgres 1485544 1483773  0 08:36 ?  00:00:00 [pgpool] <defunct>
```
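Background on the listing above: `<defunct>` entries are zombie processes, i.e. children that have already exited but whose exit status the parent pgpool process has not yet collected with wait(). They consume no memory, only a process-table slot, and disappear once the parent reaps them or exits itself. The mechanism can be reproduced outside pgpool with a short sketch (Python here, purely illustrative):

```python
import os
import subprocess
import time

# Fork a child that exits immediately; the parent deliberately delays
# reaping it, so the child lingers as a zombie (<defunct> in ps output).
pid = os.fork()
if pid == 0:
    os._exit(0)  # child: exit at once

time.sleep(0.2)  # parent: let the child exit, but do not wait() yet
state = subprocess.run(
    ["ps", "-o", "stat=", "-p", str(pid)],
    capture_output=True, text=True,
).stdout.strip()
print(state)  # process state is 'Z' (zombie) while the child is unreaped

os.waitpid(pid, 0)  # reaping removes the <defunct> entry
```

If the parent pgpool process is still alive and accumulating such entries, it usually means it has not yet reached the point in its shutdown sequence where it reaps its workers; once the parent terminates, init adopts and reaps them.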
Note 0004481 (Srini Balakrishnan, 2024-02-06 12:09):
Hi, is there any alternative for a 3-node pgpool2 cluster to have a common IP without using the delegate IP feature? Will it work if I use HAProxy to provide a common IP and let clients reach all three pgpool2 nodes via the proxy? With that setup, will Pgpool-II load balancing still work properly?
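For discussion purposes, fronting the three Pgpool-II nodes with a TCP load balancer instead of the watchdog's delegate IP might look like the haproxy.cfg fragment below. Hostnames and ports are placeholder assumptions; this is a sketch of the idea, not a vetted production configuration:

```
# haproxy.cfg fragment -- forward client connections to the pgpool nodes.
listen pgpool
    bind *:9999
    mode tcp
    balance roundrobin
    option tcp-check
    server pgpool1 node1:9999 check
    server pgpool2 node2:9999 check
    server pgpool3 node3:9999 check
```

Each Pgpool-II instance would still perform its own read/write splitting behind the proxy; whether session-level state (e.g. cached connections) behaves acceptably when clients are spread across all three instances is something to verify in testing.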
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2024-01-17 15:30 | Srini Balakrishnan | New Issue | |
| 2024-01-22 22:33 | pengbo | Assigned To | => pengbo |
| 2024-01-22 22:33 | pengbo | Status | new => assigned |
| 2024-01-22 23:49 | pengbo | Note Added: 0004474 | |
| 2024-01-22 23:49 | pengbo | Status | assigned => feedback |
| 2024-01-30 14:13 | Srini Balakrishnan | Note Added: 0004476 | |
| 2024-01-30 14:13 | Srini Balakrishnan | Status | feedback => assigned |
| 2024-01-30 17:44 | Srini Balakrishnan | Note Added: 0004477 | |
| 2024-02-06 12:09 | Srini Balakrishnan | Note Added: 0004481 |