
ID: 0000819
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2023-11-27 09:21
Reporter: martin.gnc
Assigned To: pengbo
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Platform: Linux
OS: RHEL
OS Version: 9
Product Version: 4.4.4
Summary: 0000819: pcp_node_info not in sync with pg_stat_replication
Description: I have three servers, each running pgpool 4.4.4 and PostgreSQL 14.

The PostgreSQL servers use streaming replication (with replication slots) and synchronous_standby_names = 'FIRST 1 (node2, node3)', so one replica is synchronous and the other is a potential synchronous standby.
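For reference, here is a minimal, simplified Python sketch of how a priority-based `FIRST n (...)` list maps to the `sync_state` values shown in pg_stat_replication: the first n listed standbys that are currently streaming are `sync`, the remaining listed ones are `potential`, and unlisted ones are `async`. The function names are hypothetical. It assumes the listed names are the `application_name` values the standbys connect with, and it only handles the plain `FIRST n (a, b, ...)` form (no `ANY`, `*`, or quoted names).

```python
import re


def parse_first_n(setting: str) -> tuple[int, list[str]]:
    """Parse a 'FIRST n (a, b, ...)' value of synchronous_standby_names."""
    m = re.fullmatch(r"\s*FIRST\s+(\d+)\s*\(([^)]*)\)\s*", setting, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unsupported format: {setting!r}")
    n = int(m.group(1))
    names = [name.strip() for name in m.group(2).split(",") if name.strip()]
    return n, names


def expected_sync_states(setting: str, streaming: list[str]) -> dict[str, str]:
    """Simplified model of sync_state for each currently streaming standby.

    `streaming` holds the application_name of every standby that is
    currently streaming from the primary (hypothetical input).
    """
    n, names = parse_first_n(setting)
    states: dict[str, str] = {}
    rank = 0
    # Listed standbys are ranked in list order; only streaming ones count.
    for name in names:
        if name in streaming:
            states[name] = "sync" if rank < n else "potential"
            rank += 1
    # Standbys not named in the list replicate asynchronously.
    for name in streaming:
        states.setdefault(name, "async")
    return states


# Before failover: node1 is primary, node2 and node3 both stream from it.
print(expected_sync_states("FIRST 1 (node2, node3)", ["node2", "node3"]))
```

With both standbys streaming, node2 is `sync` and node3 is `potential`; if node2 disconnects, node3 is promoted to `sync` because it is the next listed standby.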

Two servers (node1 and node2) are in the same location, and the third (node3) is at a remote site, so the connection to the remote node is much slower than the connection between the co-located nodes.

Because of that, pg_rewind over this link takes a few minutes.

If I fail over from node1 to node2, node2 is promoted immediately and node1 becomes a new standby, synchronous with node2.

I can see this in the pg_stat_replication view and in the PostgreSQL logs.

Because pg_rewind (run through follow_primary) takes a few minutes, node3 is detached from pgpool while it runs.

If I run pcp_node_info during pg_rewind, I see a different picture.

node2 is shown as primary, but the replication fields for all nodes still show the state from before the failover.

Otherwise everything works: SELECT statements are load-balanced across the nodes that are up.

When pg_rewind finishes, all statuses are updated and pcp_node_info shows the correct information.

If I restart pgpool on one node during pg_rewind, the statuses are updated on that pgpool node.

This discrepancy between pcp_node_info and pg_stat_replication causes a problem with the auto_failback option.

If auto_failback is enabled and a failover (node1 -> node2) happens:
1. pgpool executes failover_command: node1 is shut down, node2 is promoted, and node3 is still up.
2. pgpool executes follow_primary for node1; node3 is still up.
3. pgpool executes auto_failback because node3 is up and, according to pcp_node_info, its replication state is "streaming" with sync state "potential".
4. follow_primary for node3 is never executed.

In the end, node2 is the primary, node1 is a standby (streaming, sync), and node3 is a standby that is out of sync.

The problem occurs when pgpool executes auto_failback before follow_primary on the second standby server (node3). This usually happens, but sometimes follow_primary is executed before auto_failback, and then everything is fine.

I added SELECT * FROM pg_stat_replication to my failback command, and when auto_failback is executed there is no replication entry for that node, so auto_failback should not happen.
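The check described above can be expressed as a simple guard. This is a minimal Python sketch, not Pgpool-II's implementation: `safe_to_failback` is a hypothetical helper, and it assumes the rows come from `SELECT application_name, state, sync_state FROM pg_stat_replication` run on the new primary, with each standby's `application_name` equal to its node name (as in the synchronous_standby_names setting above).

```python
def safe_to_failback(node_name: str, replication_rows: list[dict]) -> bool:
    """Return True only if node_name is actually streaming from the primary.

    replication_rows: rows from pg_stat_replication on the current primary,
    each a dict with at least 'application_name' and 'state' keys.
    """
    return any(
        row.get("application_name") == node_name and row.get("state") == "streaming"
        for row in replication_rows
    )


# During node3's pg_rewind, only node1 streams from the new primary (node2).
rows = [{"application_name": "node1", "state": "streaming", "sync_state": "sync"}]
print(safe_to_failback("node1", rows))  # True
print(safe_to_failback("node3", rows))  # False: no row for node3, so skip failback
```

Checking the live view rather than cached node state is the point of the guard: a node with no row in pg_stat_replication is not replicating, whatever the replication fields in pcp_node_info still say.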

I suppose that pgpool relies on the same stale information that pcp_node_info shows and wrongly concludes that it should execute auto_failback.

Maybe I am doing something wrong, but everything works fine if auto_failback is off and the node is attached to pgpool with the pcp_attach_node command from within follow_primary.
Tags: No tags attached.

Activities

There are no notes attached to this issue.

Issue History

Date Modified     Username    Field        Change
2023-11-22 23:56  martin.gnc  New Issue
2023-11-27 09:21  pengbo      Assigned To  => pengbo
2023-11-27 09:21  pengbo      Status       new => assigned