[pgpool-hackers: 4432] Re: detach_false_primary could make all nodes go down

Tatsuo Ishii ishii at sraoss.co.jp
Wed Feb 21 15:50:07 JST 2024


Hi Usama,

> It is reported that detach_false_primary could make all nodes go down.
> 
> To reproduce the issue, we need to enable watchdog.
> Steps to reproduce.
> 
> Prerequisites
> - There are 3 watchdog nodes pgpool0, pgpool1 and pgpool2.
> - There are 2 DB nodes node0 and node1 (initially node 0 is primary).
> - follow_primary_command is disabled.
> 
> Steps to reproduce:
> 1) Node 0 goes down at pgpool0 due to network trouble. BUT actually
>    node 0 is alive.
> 
> 2) Node 0 goes down at pgpool1 due to network trouble. BUT actually
>    node 0 is alive.
> 
> 3) Failover is triggered. Since pgpool0 and pgpool1 agree, node 0 is
>    set to down.  node 1 is promoted.
> 
> 4) Before new status is synced with pgpool2, pgpool2's sr_check finds
>    that there are two primary nodes due to #3. detach_false_primary
>    is triggered and node 1 goes down.
> 
> 5) Now all backends are in down status.
> 
> I think that until the new node status is synced with the watchdog
> leader, sr_check should not trigger detach_false_primary, because if
> node 0 is in down status in pgpool2, sr_check will ignore node 0 and
> detach_false_primary will never be triggered. But I don't know how to
> implement it.
> 
> Any idea?
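
To make the window in steps 3) and 4) concrete, here is a toy model in
Python. It is only a sketch, not pgpool code: false_primaries() and the
status dictionaries are hypothetical names, and it assumes the simplified
rule that when several primaries are visible and no standby exists, the
primary with the lowest node id is kept and the others are detached.

```python
def false_primaries(view_status, actual_role):
    """Return the node ids this pgpool would detach as false primaries.

    view_status: node id -> "up"/"down" as seen by one pgpool node.
    actual_role: node id -> "primary"/"standby" as reported by the backend.
    Simplified assumption: nodes regarded as down are ignored, and among
    the visible primaries only the lowest node id is kept.
    """
    primaries = sorted(
        node for node, status in view_status.items()
        if status == "up" and actual_role[node] == "primary"
    )
    return primaries[1:]


# After step 3): node 0 is still alive as a primary, node 1 was promoted.
roles = {0: "primary", 1: "primary"}

# pgpool0/pgpool1 already regard node 0 as down, so they ignore it.
synced_view = {0: "down", 1: "up"}

# pgpool2 has not received the new status yet and still sees node 0 as up.
stale_view = {0: "up", 1: "up"}

detached_by_synced = false_primaries(synced_view, roles)
detached_by_stale = false_primaries(stale_view, roles)
```

With the synced view nothing is detached, while the stale view on pgpool2
picks node 1, which is how both backends end up in down status.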

One idea is to perform detach_false_primary only on the watchdog
leader node if watchdog is enabled. On the leader node, I think
there's no window between #3 and #4, so detach_false_primary will
skip node 0: pgpool will not regard node 1 as a false primary.

For additional protection, maybe detach_false_primary should only run
if quorum exists.
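
The two conditions above could be combined roughly as follows. This is a
minimal sketch in Python rather than the actual C code in pgpool, and
WatchdogState and may_detach_false_primary() are hypothetical names used
only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WatchdogState:
    enabled: bool        # watchdog is in use on this pgpool node
    is_leader: bool      # this pgpool node is the watchdog leader
    quorum_exists: bool  # the watchdog cluster currently holds the quorum


def may_detach_false_primary(wd: WatchdogState) -> bool:
    """Decide whether sr_check may run detach_false_primary on this node."""
    if not wd.enabled:
        # Without watchdog there is no other pgpool to disagree with.
        return True
    # With watchdog, only the leader acts, and only while quorum exists.
    return wd.is_leader and wd.quorum_exists


# pgpool2 in the reported scenario: watchdog enabled, not the leader.
pgpool2 = WatchdogState(enabled=True, is_leader=False, quorum_exists=True)
allowed_on_pgpool2 = may_detach_false_primary(pgpool2)
```

Under this check pgpool2 would skip detach_false_primary in step 4), so
node 1 would stay up until the leader's status is synced.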

Usama, what do you think?
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

