[pgpool-hackers: 3398] Re: Failover consensus on even number of nodes

Tatsuo Ishii ishii at sraoss.co.jp
Wed Aug 28 17:22:05 JST 2019


From: Tatsuo Ishii <ishii at sraoss.co.jp>
Subject: [pgpool-hackers: 3396] Re: Failover consensus on even number of nodes
Date: Tue, 27 Aug 2019 11:11:51 +0900 (JST)
Message-ID: <20190827.111151.2130894466144469209.t-ishii at sraoss.co.jp>

>>>>> Hi Ishii-San,
>>>>> 
>>>>> On Sat, Aug 17, 2019 at 1:00 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>> 
>>>>>> > Hi Ishii-San
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Aug 15, 2019 at 11:42 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>> >
>>>>>> >> Hi Usama,
>>>>>> >>
>>>>>> >> When number of Pgpool-II nodes is even, it seems consensus based
>>>>>> >> failover occurs if n/2 Pgpool-II agrees on the failure. For example,
>>>>>> >> if there are 4 nodes of Pgpool-II, 2 nodes agree on the failure,
>>>>>> >> failover occurs. Is there any reason behind this? I am asking because
>>>>>> >> it could easily lead to split brain, because 2 nodes could agree on
>>>>>> >> the failover while other 2 nodes disagree. Actually other HA software,
>>>>>> >> for example etcd, requires n/2+1 vote to gain consensus.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance
>>>>>> >>
>>>>>> >> With n/2+1 vote requirements, there's no possibility of split brain.
>>>>>> >>
>>>>>> >>
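To make the arithmetic concrete, here is a minimal sketch in C of the
two voting rules under discussion (required_votes is an illustrative
name, not Pgpool-II's actual code):

    #include <stdio.h>

    /* Minimum number of votes needed for consensus among n nodes.
     * Under the relaxed rule an even-sized cluster needs only n/2
     * votes; under the strict rule it needs a true majority,
     * n/2 + 1.  For odd n the two rules agree, because integer
     * division rounds down. */
    static int required_votes(int n, int strict)
    {
        if (n % 2 == 0 && !strict)
            return n / 2;       /* relaxed: exactly half is enough */
        return n / 2 + 1;       /* strict: true majority */
    }

    int main(void)
    {
        /* With 4 nodes under the relaxed rule, two disjoint pairs
         * can each collect 2 votes, so both halves believe they
         * have consensus: the split-brain scenario above. */
        printf("n=4, relaxed: %d votes\n", required_votes(4, 0)); /* 2 */
        printf("n=4, strict:  %d votes\n", required_votes(4, 1)); /* 3 */
        return 0;
    }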
>>>>>> > Yes, your observation is spot on. The original motivation for
>>>>>> > considering exactly n/2 votes a consensus, rather than (n/2 + 1),
>>>>>> > was to keep 2-node Pgpool-II clusters working. My understanding
>>>>>> > was that most users run 2 Pgpool-II nodes in their setup, so I
>>>>>> > wanted to make sure that when one of the nodes in a 2-node
>>>>>> > cluster goes down, consensus is still possible. But your point is
>>>>>> > also valid: this makes the system prone to split-brain. So what
>>>>>> > are your suggestions on that? I think we can introduce a new
>>>>>> > configuration parameter to enable/disable the n/2 consensus.
>>>>>>
>>>>>> If my understanding is correct, with the current behavior there is
>>>>>> no difference for 2-node Pgpool-II clusters whether
>>>>>> failover_when_quorum_exists is on or off. That means that even if
>>>>>> we change the n/2 consensus to an n/2+1 consensus, 2-node users
>>>>>> could keep the existing behavior by turning off
>>>>>> failover_when_quorum_exists. If this is correct, we don't need to
>>>>>> introduce a new switch for 4.1; we can just change the n/2
>>>>>> consensus to an n/2+1 consensus. What do you think?
>>>>>>
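For a 2-node cluster that wants to keep the existing behaviour after
such a change, the relevant pgpool.conf setting would look roughly
like this (a sketch; failover_when_quorum_exists is the parameter
named above, and failover_require_consensus is assumed to be its
companion consensus switch):

    # pgpool.conf sketch: let a 2-node cluster keep failing over
    # even when a strict-majority quorum cannot be formed.
    failover_when_quorum_exists = off
    # optionally also skip the consensus requirement entirely:
    # failover_require_consensus = off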
>>>>> 
>>>>> Yes, that's true; turning off failover_when_quorum_exists will
>>>>> effectively give us the same behaviour for a 2-node cluster.
>>>>> 
>>>>> 
>>>>>> The only concern is 4-node Pgpool-II clusters. I doubt there are
>>>>>> 4-node users in the field, though.
>>>>>>
>>>>> 
>>>>> Yes, you are right that not many users would deploy a 4-node
>>>>> cluster. But we still need to keep the behaviour and configuration
>>>>> consistent across all possible scenarios.
>>>>> 
>>>>> Also, the decision to consider either n/2 or (n/2 + 1) votes a
>>>>> valid consensus is not limited to backend node failover. Pgpool-II
>>>>> also treats n/2 votes as a valid consensus when electing the
>>>>> watchdog master, and currently the behaviour of watchdog master
>>>>> elections and backend node failover consensus building is
>>>>> consistent. So if we want to revisit this, we need to consider the
>>>>> behaviour in both cases.
>>>> 
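For reference, here is how the two rules compare for small clusters
("n/2 rule" meaning ceil(n/2) votes and "n/2+1 rule" meaning
floor(n/2)+1 votes; they differ only for even n):

    nodes (n)   n/2 rule   n/2+1 rule
    2           1          2
    3           2          2
    4           2          3
    5           3          3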
>>>> Ok, I agree that creating a new parameter for switching between n/2
>>>> and n/2+1 would be safer. Usama, would you like to implement this
>>>> for 4.1?
>>> 
>>> Attached is a proof-of-concept patch. The GUC and doc changes are
>>> not included. With the patch, a 2-watchdog-node cluster will go into
>>> the "quorum absent" state if one of the nodes goes down.
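A minimal sketch of what the patched behaviour implies for the 2-node
case (quorum_exists is an illustrative name, not the patch itself):

    #include <stdio.h>

    /* Under the strict rule, a quorum exists only when the number
     * of alive watchdog nodes reaches a true majority. */
    static int quorum_exists(int alive, int total)
    {
        return alive >= total / 2 + 1;
    }

    int main(void)
    {
        /* 2-node cluster with one node down: 1 < 2, so the cluster
         * goes into the "quorum absent" state described above. */
        printf("alive=1, total=2 -> quorum %s\n",
               quorum_exists(1, 2) ? "exists" : "absent");
        return 0;
    }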
>> 
>> Attached is a ready-for-review patch, with the GUC and English
>> manual changes included.
> 
> In addition, attached is a patch for the 004.watchdog test. Without
> it, the test fails.

If there are no objections, I will commit/push tomorrow.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

