[pgpool-general: 7528] Re: Strange behavior on switchover with detach_false_primary enabled

Wed Apr 28 03:13:39 JST 2021

On Tue, Apr 27, 2021 at 9:30 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> the leader. Next, we detach the primary database node with
> >> pcp_detach_node. This starts a failover, but during this failover,
>
> BTW, if you stop the primary database by actually stopping it (pg_ctl
> stop), then the problem does not happen? If my theory was correct, the
> problem will not occur because detach_false_primary will not trigger
> failover if there's only one primary.
>

Hi Tatsuo,

Thanks for your response. We also suspected a race condition between
the follow primary process and detach_false_primary. The workaround
of temporarily disabling detach_false_primary is rather complicated
for us to implement. It requires the configuration to be reloaded on
all nodes, and we need to make sure the setting is enabled again
afterwards. At the moment we have no way to change the configuration
of pgpool on the fly.

We decided to change the switchover process: we now shut down the
primary database, let the pgpool cluster handle the failover, and
then execute the follow primary steps on the former primary node
directly from the switchover script. This seems to work reliably, so
I can confirm that the problem does not occur when you shut down the
primary database.
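
For illustration, the order of operations in the changed switchover can
be sketched in Python roughly as follows. This is only an outline under
assumptions, not our actual script: stop_primary, find_new_primary and
follow_primary are hypothetical callables standing in for whatever
commands a site uses to stop PostgreSQL, ask pgpool which node is
primary, and re-attach the old primary as a standby.

```python
import time
from typing import Callable, Optional


def switchover(
    old_primary: int,
    stop_primary: Callable[[int], None],
    find_new_primary: Callable[[], Optional[int]],
    follow_primary: Callable[[int, int], None],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Stop the old primary, wait for failover, then make it follow.

    All three callables are hypothetical placeholders supplied by the
    caller; nothing here talks to pgpool or PostgreSQL directly.
    """
    # Step 1: really shut the primary down instead of detaching it,
    # so there is never a second live primary for pgpool to see.
    stop_primary(old_primary)

    # Step 2: let the pgpool cluster run its own failover and wait
    # until it reports a primary that is not the node we stopped.
    deadline = time.monotonic() + timeout_s
    while True:
        new_primary = find_new_primary()
        if new_primary is not None and new_primary != old_primary:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError("no new primary reported after failover")
        sleep(poll_s)

    # Step 3: run the follow primary steps on the former primary.
    follow_primary(old_primary, new_primary)
    return new_primary
```

The point of the ordering is that the former primary is only brought
back after a new primary has been reported, and only as a follower.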

From what I've observed, I think the problem is:

1. Starting with 3 backends, node 0 is primary, and nodes 1 and 2 are
   following node 0.
2. Node 0 is detached, starting a failover.
3. Node 1 is selected as the new primary and promoted.
4. The new situation now is: node 0 is back online and primary, node 1
   is primary, and node 2 is still following node 0.
5. This causes detach_false_primary to think that node 1 is a false
   primary (it has no slaves while node 0 has 1).
6. It detaches node 1, and all goes wrong.
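
To make that reasoning concrete, here is a toy model in Python. It is
an assumption-laden sketch of the rule as I understand it (when there
are several primaries, the one with fewer followers is treated as
false), not pgpool's actual implementation. The function names and the
dict layout are made up for this example.

```python
from typing import Dict, List, Optional

# Maps node id -> id of the node it follows, or None if the node
# currently claims to be a primary. Nodes that are down are left out.
Topology = Dict[int, Optional[int]]


def count_followers(topology: Topology) -> Dict[int, int]:
    """Count how many nodes follow each live primary."""
    counts = {node: 0 for node, up in topology.items() if up is None}
    for up in topology.values():
        # Followers of a node that is not a live primary are ignored.
        if up is not None and up in counts:
            counts[up] += 1
    return counts


def suspected_false_primaries(topology: Topology) -> List[int]:
    """Return primaries that have fewer followers than the best one.

    With zero or one primary there is nothing to compare, so nothing
    is suspected. Ties are left undecided in this simplified model.
    """
    counts = count_followers(topology)
    if len(counts) < 2:
        return []
    most = max(counts.values())
    return sorted(node for node, c in counts.items() if c < most)


# Before the switchover: node 0 is primary, nodes 1 and 2 follow it.
before = {0: None, 1: 0, 2: 0}

# After pcp_detach_node and promotion of node 1: node 0 is still a
# running primary, and node 2 has not been re-pointed yet.
after_detach = {0: None, 1: None, 2: 0}

# After a real shutdown of node 0: only node 1 is a live primary.
after_shutdown = {1: None, 2: 0}

print(suspected_false_primaries(before))          # []
print(suspected_false_primaries(after_detach))    # [1]
print(suspected_false_primaries(after_shutdown))  # []
```

In this model the freshly promoted node 1 is the one flagged in the
detach case, while the shutdown case leaves only one primary and so
nothing is flagged, which matches what we see.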

Best regards,
Emond