[pgpool-general: 7529] Re: Strange behavior on switchover with detach_false_primary enabled

Tatsuo Ishii ishii at sraoss.co.jp
Wed Apr 28 15:43:22 JST 2021


Hi Emond,

> Hi Tatsuo,
> 
> Thanks for your response. We also suspected a race condition between
> the follow primary process and detach_false_primary. The workaround
> with temporarily disabling detach_false_primary is rather complicated
> for us to implement. It requires the configuration to be reloaded on
> all nodes and we need to make sure it is enabled afterwards. At the
> moment we have no way to change the configuration of pgpool on the
> fly.
> 
> We decided to change the switchover process to shutting down the
> primary database, letting the pgpool cluster handle the failover
> scenario and then executing the follow primary steps on the former
> primary node directly from the switchover script. This seems to work
> reliably. So I can confirm that the problem does not occur when you
> shut down the primary database.
> 
> From what I've observed, I think the problem is:
> Starting with 3 backends, node 0 is primary, node 1 and 2 are following node 0.
> Node 0 is detached, starting a failover
> Node 1 is selected as the new primary and promoted
> The new situation now is:
> Node 0 is back online and primary, node 1 is primary and node 2 is
> still following node 0.
> This causes the detach_false_primary to think that node 1 is a false
> primary (it has no slaves while node 0 has 1)
> It detaches node 1, and all goes wrong.

I think your analysis is correct. To overcome the issue,
detach_false_primary should not run while a failover and/or the follow
primary process is running. detach_false_primary is kicked by the
streaming replication delay check process, so even if one round is
skipped, the next round of the streaming replication check will
execute detach_false_primary, provided that no failover or follow
primary is running at that time.
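
As a rough illustration of the idea, here is a simplified Python model
(not pgpool's actual code, which is written in C). The "false primary"
rule follows the description in this thread only, and all names
(Node, find_false_primaries, check_false_primary and the two flags)
are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    is_primary: bool
    follows: Optional[int] = None  # id of the node this standby replicates from


def find_false_primaries(nodes: Dict[int, Node]) -> List[int]:
    """Simplified rule from this thread: when more than one node claims to
    be primary, those with fewer standbys than the best-connected primary
    are judged to be false primaries."""
    primaries = [i for i, n in nodes.items() if n.is_primary]
    if len(primaries) < 2:
        return []
    standbys = {
        p: sum(1 for n in nodes.values() if not n.is_primary and n.follows == p)
        for p in primaries
    }
    best = max(standbys.values())
    return sorted(p for p in primaries if standbys[p] < best)


def check_false_primary(nodes: Dict[int, Node],
                        failover_running: bool,
                        follow_primary_running: bool) -> List[int]:
    """Hypothetical guard: skip this round while failover or follow primary
    is active. The next replication check round simply tries again."""
    if failover_running or follow_primary_running:
        return []
    return find_false_primaries(nodes)


# State in the middle of the switchover: node 0 is back as primary,
# node 1 was promoted, node 2 still follows node 0.
mid_switchover = {0: Node(True), 1: Node(True), 2: Node(False, follows=0)}

unguarded = find_false_primaries(mid_switchover)          # [1]: new primary detached
guarded = check_false_primary(mid_switchover, False, True)  # []: check skipped

# State after follow primary has finished: nodes 0 and 2 follow node 1.
settled = {0: Node(False, follows=1), 1: Node(True), 2: Node(False, follows=1)}
later = check_false_primary(settled, False, False)        # []: single primary
```

Without the guard, the check run in the middle of the switchover picks
node 1, which is the failure described above. With the guard, that
round is skipped and the next round sees a consistent cluster.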

Currently pgpool keeps information about whether a failover is in
progress, but there's no information about whether follow primary is
running. I will study whether we could add information about follow
primary execution. This would be a little bit tricky, since follow
primary runs as a separate process from the pgpool main process,
which conducts the failover.
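
One possible way to share such state between two processes is a flag
in shared memory. The following is only a sketch in Python, under the
assumption that a child process can stand in for follow primary; it
does not show how pgpool does or will do this. The function and
variable names are hypothetical, and the "fork" start method makes it
POSIX only:

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # POSIX only


def follow_primary(flag, started, release):
    """Stand-in for the follow primary process: raise the shared flag
    while working and always clear it on the way out."""
    with flag.get_lock():
        flag.value = 1
    started.set()
    try:
        release.wait(timeout=5)  # placeholder for the real work
    finally:
        with flag.get_lock():
            flag.value = 0


def demo():
    """Return the flag as seen by the parent during and after the run."""
    flag = ctx.Value("b", 0)  # one byte in shared memory, with a lock
    started, release = ctx.Event(), ctx.Event()
    child = ctx.Process(target=follow_primary, args=(flag, started, release))
    child.start()
    started.wait(timeout=5)
    during = bool(flag.value)  # parent sees follow primary as running
    release.set()
    child.join(timeout=5)
    after = bool(flag.value)   # flag has been cleared again
    return during, after
```

A caveat with this approach: if the child is killed before it reaches
the cleanup, the flag stays set, so a real implementation would also
need some way to reset it.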

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

