[pgpool-hackers: 4433] Re: detach_false_primary could make all nodes go down

Tatsuo Ishii ishii at sraoss.co.jp
Thu Feb 22 15:54:08 JST 2024


> Hi Usama,
> 
>> It is reported that detach_false_primary could make all nodes go down.
>> 
>> To reproduce the issue, watchdog must be enabled.
>> 
>> Prerequisites
>> - There are 3 watchdog nodes pgpool0, pgpool1 and pgpool2.
>> - There are 2 DB nodes node0 and node1 (initially node 0 is primary).
>> - follow_primary_command is disabled.
>> 
>> Steps to reproduce:
>> 1) Node 0 is marked down at pgpool0 because of a network problem,
>>    although node 0 is actually alive.
>> 
>> 2) Node 0 is marked down at pgpool1 because of a network problem,
>>    although node 0 is actually alive.
>> 
>> 3) Failover is triggered. Since pgpool0 and pgpool1 agree, node 0 is
>>    set to down.  node 1 is promoted.
>> 
>> 4) Before new status is synced with pgpool2, pgpool2's sr_check finds
>>    that there are two primary nodes due to #3. detach_false_primary
>>    is triggered and node 1 goes down.
>> 
>> 5) Now all backends are in down status.
>> 
>> I think sr_check should not trigger detach_false_primary until the
>> new node status has been synced with the watchdog leader: once node 0
>> is in down status in pgpool2, sr_check will ignore node 0 and
>> detach_false_primary will never be triggered. But I don't know how to
>> implement it.
>> 
>> Any idea?
> 
> One idea is to perform detach_false_primary only on the watchdog
> leader node if watchdog is enabled. On the leader node, I think
> there's no window between #3 and #4, so detach_false_primary will
> skip node 0: pgpool will not regard node 1 as a false primary.
> 
> For additional protection, maybe detach_false_primary should only run
> if quorum exists.
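
The window between #3 and #4 can be illustrated with a small sketch. This is not pgpool code: the Backend class, the status dictionaries and primaries_seen() are hypothetical names, and pgpool0 is assumed to be the watchdog leader (the report does not say which node is). It only shows that a checker which skips nodes it believes are down sees one primary from the leader's view and two from pgpool2's stale view.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    is_primary: bool  # what PostgreSQL itself reports, regardless of pgpool's view


def primaries_seen(local_status, backends):
    """Primaries visible to a checker that ignores nodes it believes are down."""
    return [b.name for b in backends
            if local_status[b.name] == "up" and b.is_primary]


# State after step 3: node 0 is still running as a primary, node 1 was promoted.
backends = [Backend("node0", True), Backend("node1", True)]

# Assumption: pgpool0 is the leader and has already marked node 0 down.
leader_view = {"node0": "down", "node1": "up"}
# pgpool2 has not yet received the new status, so it still checks node 0.
stale_view = {"node0": "up", "node1": "up"}

print(primaries_seen(leader_view, backends))  # ['node1']
print(primaries_seen(stale_view, backends))   # ['node0', 'node1']
```

With the stale view, two primaries are visible, which is the situation in which detach_false_primary acts in step 4.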

I have implemented this. Now detach_false_primary detaches a false
primary only if one of the following is true:

- watchdog is not enabled.

- watchdog is enabled, quorum exists, and the leader watchdog node
  detected the false primary.
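
The two conditions above can be written as a single predicate. The following is a minimal sketch for illustration only, not the code in the attached patch; the function name and its three boolean parameters are hypothetical.

```python
def may_detach_false_primary(use_watchdog, quorum_exists, is_leader):
    """Return True if this pgpool node is allowed to detach a false primary.

    All names here are illustrative and do not come from the patch.
    """
    if not use_watchdog:
        # Single pgpool node: no other view of the cluster to wait for.
        return True
    # With watchdog: only the leader acts, and only while quorum exists.
    return quorum_exists and is_leader


# pgpool2 in the reported scenario: watchdog on, quorum present, not the leader.
print(may_detach_false_primary(True, True, False))  # False
```

Under this rule a non-leader node such as pgpool2 never detaches node 1, whatever its local view of node 0 is.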

See the attached patch for more details. However, with this patch,
even if failover_require_consensus is on, detach_false_primary does
not require consensus from the other watchdog nodes.  Can we accept this?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: detach_false_primary_all_down.patch
Type: text/x-patch
Size: 3323 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240222/9dc2d0a5/attachment.bin>