0000819: pcp_node_info not in sync with pg_stat_replication - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000819	Pgpool-II	Bug	public	2023-11-22 23:56	2023-11-27 09:21

Reporter	martin.gnc	Assigned To	pengbo
Priority	normal	Severity	minor	Reproducibility	always
Status	assigned	Resolution	open
Platform	Linux	OS	RHEL	OS Version	9
Product Version	4.4.4

Summary	0000819: pcp_node_info not in sync with pg_stat_replication
Description	I have 3 servers, each running pgpool 4.4.4 and PostgreSQL 14. The PostgreSQL servers use streaming replication (with replication slots) and synchronous_standby_names = 'FIRST 1 (node2, node3)', so one replica is sync and the other is potential. Two servers (node1, node2) are in the same location; the third (node3) is in a remote location, so the connection to it is much slower than the connection between the co-located nodes. Because of that, pg_rewind takes a few minutes.
If I fail over from node1 to node2, node2 is promoted immediately and node1 becomes the new standby, in sync with node2. I can see this in the pg_stat_replication view and in the PostgreSQL logs. Because pg_rewind (run through follow_primary) takes a few minutes, node3 is detached from pgpool. If I run pcp_node_info during pg_rewind, I see a different picture: node2 is primary, but the replication fields for all nodes still show the state from before the failover. Everything else works fine; SELECT statements are distributed among the up nodes. When pg_rewind finishes, all statuses are updated and pcp_node_info shows the correct information. If I restart pgpool on one node during pg_rewind, the statuses are updated on that pgpool node.
This discrepancy between pcp_node_info and pg_stat_replication leads to a problem with the auto_failback option. If it is on and a failover happens (node1 -> node2), then:
1. pgpool executes failover_command; node1 is shut down, node2 is promoted, node3 is still up
2. pgpool executes follow_primary for node1; node3 is still up
3. pgpool executes auto_failback because node3 is up and its replication state according to pcp_node_info is streaming, potential
4. follow_primary for node3 is never executed
In the end, node2 is primary, node1 is standby (streaming, sync), and node3 is a standby that is not in sync. The problem occurs when pgpool executes auto_failback before follow_primary on the second standby server (node3). This is what usually happens, but sometimes follow_primary is executed before auto_failback, and then everything is fine.
I added select * from pg_stat_replication to my failback command, and when auto_failback is executed there is no replication row for that node, so auto_failback shouldn't happen. I suppose that pgpool watches the state shown by pcp_node_info and wrongly concludes that it should execute auto_failback. Maybe I am doing something wrong, but everything works fine if auto_failback is off and the node is attached to pgpool with the pcp_attach_node command through follow_primary.
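The mismatch described above can be illustrated with a small sketch. This is not pgpool's actual logic; it only compares two snapshots that are assumed to have been collected already: the replication fields reported by pcp_node_info, and the rows of pg_stat_replication on the new primary keyed by application_name. The names ReplState, find_stale_nodes and failback_is_safe are hypothetical, and the sample values mirror the state during pg_rewind after the node1 -> node2 failover.

```python
from typing import Dict, List, NamedTuple


class ReplState(NamedTuple):
    """Replication fields for one node (hypothetical helper type)."""
    state: str       # e.g. "streaming", or "" when there is no replication row
    sync_state: str  # e.g. "sync", "potential", "async", or ""


NO_REPLICATION = ReplState("", "")


def find_stale_nodes(pgpool_view: Dict[str, ReplState],
                     primary_view: Dict[str, ReplState]) -> List[str]:
    """Return nodes whose pgpool-reported fields differ from the primary's view.

    A node missing from primary_view has no row in pg_stat_replication,
    so it is compared against NO_REPLICATION.
    """
    stale = []
    for node, reported in sorted(pgpool_view.items()):
        actual = primary_view.get(node, NO_REPLICATION)
        if reported != actual:
            stale.append(node)
    return stale


def failback_is_safe(node: str, primary_view: Dict[str, ReplState]) -> bool:
    """Allow failback only if the primary really sees the node streaming."""
    row = primary_view.get(node)
    return row is not None and row.state == "streaming"


# State reported by pcp_node_info during pg_rewind: still the pre-failover
# picture (node1 was primary, node2 sync, node3 potential).
pgpool_view = {
    "node1": NO_REPLICATION,
    "node2": ReplState("streaming", "sync"),
    "node3": ReplState("streaming", "potential"),
}

# pg_stat_replication on the new primary (node2): only node1 is replicating.
primary_view = {
    "node1": ReplState("streaming", "sync"),
}

stale = find_stale_nodes(pgpool_view, primary_view)
node3_ok = failback_is_safe("node3", primary_view)
node1_ok = failback_is_safe("node1", primary_view)
```

With these sample snapshots, every node is flagged as stale, and a failback decision based on the primary's own view would reject node3 while accepting node1. A guard of this kind in the failback command would be one way to confirm the behaviour from outside pgpool.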
Tags	No tags attached.

Date Modified	Username	Field	Change
2023-11-22 23:56	martin.gnc	New Issue
2023-11-27 09:21	pengbo	Assigned To	=> pengbo
2023-11-27 09:21	pengbo	Status	new => assigned