[pgpool-hackers: 4052] Re: [pgpool-general: 7543] VIP with one node

Tatsuo Ishii ishii at sraoss.co.jp
Mon Nov 1 16:20:55 JST 2021


Hi Usama,

Thank you for the patch. Unfortunately the patch does not apply to the
master branch anymore. Can you please rebase it?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi,
> 
> So I have cooked up a WIP patch that implements the behavior discussed
> above.
> 
> The attached patch adds three new configuration parameters
> 
> #wd_remove_shutdown_nodes = off
>                                     # When enabled, properly shut down
>                                     # watchdog nodes are removed from the
>                                     # cluster and do not count towards the
>                                     # quorum and consensus computations.
> 
> #wd_lost_node_removal_timeout = 0s
>                                     # Time after which LOST watchdog nodes
>                                     # are removed from the cluster and no
>                                     # longer count towards the quorum and
>                                     # consensus computations.
>                                     # Setting it to 0 never removes LOST nodes.
> 
> #wd_initial_node_showup_time = 0s
>                                     # Time to wait for watchdog nodes to
>                                     # connect to the cluster. After that time,
>                                     # the nodes are considered not to be part
>                                     # of the cluster and do not count towards
>                                     # the quorum and consensus computations.
>                                     # Setting it to 0 waits forever.
> 
> 
> Keeping the default values for these parameters retains the existing
> behavior.
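> 
> Just as an illustration of how the three parameters fit together (the values
> below are hypothetical, not defaults or recommendations from the patch), a
> pgpool.conf that opts into the new behavior could look like:
> 
> wd_remove_shutdown_nodes = on       # stop counting properly shut down nodes
> wd_lost_node_removal_timeout = 30s  # drop LOST nodes after 30 seconds
> wd_initial_node_showup_time = 120s  # wait up to 2 minutes for nodes to join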
> 
> 
> Moreover, the patch also enhances the pcp_watchdog_info utility to output
> the current "Quorum State" for each watchdog node, as well as the "number of
> nodes required for quorum" and the "valid remote nodes count" for the current
> status of the watchdog cluster. This change might also require a bump of the
> pcp library version.
> 
> bin/pcp_watchdog_info -U postgres -v
> Watchdog Cluster Information
> Total Nodes              : 3
> Remote Nodes             : 2
> Valid Remote Nodes       : 1
> Alive Remote Nodes       : 0
> Nodes required for quorum: 2
> Quorum state             : QUORUM ABSENT
> VIP up on local node     : NO
> Leader Node Name         : localhost:9990 Darwin Usama-Macbook-Pro.local
> Leader Host Name         : localhost
> 
> Watchdog Node Information
> Node Name      : localhost:9990 Darwin Usama-Macbook-Pro.local
> ...
> Status Name    : LEADER
> Quorum State   : ACTIVE
> 
> Node Name      : localhost:9991 Darwin Usama-Macbook-Pro.local
> ...
> Status         : 10
> Status Name    : SHUTDOWN
> Quorum State   : ACTIVE
> 
> Node Name      : Not_Set
> ...
> Status Name    : DEAD
> Quorum State   : REMOVED-NO-SHOW
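> 
> To read the sample output above (my interpretation of the fields the patch
> adds, assuming wd_remove_shutdown_nodes was left off for this run): the
> no-show node is marked REMOVED-NO-SHOW and is excluded, leaving two nodes
> with Quorum State ACTIVE (the local leader and the properly shut down node);
> two nodes are required for quorum but zero remote nodes are alive, hence
> QUORUM ABSENT.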
> 
> The patch is still in a WIP state, mainly because it lacks the documentation
> updates; I am sharing it to get opinions and suggestions on the behavior and
> the configuration parameter names.
> 
> Thanks
> Best regards
> Muhammad Usama
> 
> 
> On Mon, Aug 23, 2021 at 6:05 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Usama,
>>
>> Sorry for late reply.
>>
>> From: Muhammad Usama <m.usama at gmail.com>
>> Subject: Re: [pgpool-hackers: 3898] Re: [pgpool-general: 7543] VIP with
>> one node
>> Date: Thu, 22 Jul 2021 14:12:59 +0500
>> Message-ID: <
>> CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA at mail.gmail.com>
>>
>> > On Tue, Jul 20, 2021 at 4:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> >> Is it possible to configure watchdog to enable the lost node removal
>> >> >> function only when a node is properly shut down?
>> >> >>
>> >>
>> >> > Yes, if we disable wd_lost_node_to_remove_timeout (by setting it to 0),
>> >> > the lost node removal will only happen for properly shut down nodes.
>> >>
>> >> Oh, I thought setting wd_lost_node_to_remove_timeout to 0 would keep
>> >> the existing behavior.
>> >>
>> >
>> > There are two parts to the proposal. The first one deals with removing a
>> > lost node from the cluster after wd_lost_node_to_remove_timeout amount of
>> > time, while the second part is about removing properly shut down nodes
>> > from the cluster.
>> >
>> > Disabling wd_lost_node_to_remove_timeout (setting it to 0) will keep the
>> > existing behaviour as far as the lost-node-removal portion of the proposal
>> > is concerned.
>> >
>> > However, not counting a properly shut down node as part of the watchdog
>> > cluster is not configurable (as per the original proposal). So if we want
>> > to make this part configurable as well, so that we can switch to 100% of
>> > the current behaviour, we can add another config parameter for that, like
>> > consider_shutdown_nodes_part_of_wd_cluster = [on|off]
>>
>> +1 to add the new parameter.
>>
>> The reason is that some users may want to avoid the split-brain problem even
>> if quorum/VIP is lost.  Suppose there are two admins: A for the system (OS)
>> and B for the database. B never wants to allow the possibility of split
>> brain. If A shuts down the system, B may not notice that there are no longer
>> enough nodes to form a consensus, because if
>> consider_shutdown_nodes_part_of_wd_cluster is on, the quorum/VIP will be
>> kept until no node remains.
>>
>> In summary, I think there are use-cases for both
>> consider_shutdown_nodes_part_of_wd_cluster = on and off.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>

