[pgpool-hackers: 4054] Re: [pgpool-general: 7543] VIP with one node

Tatsuo Ishii ishii at sraoss.co.jp
Tue Nov 2 09:58:26 JST 2021


Hi Usama,

I confirmed your patch works as expected. Thank you for your great work!

> Hi Tatsuo,
> 
> On Mon, Nov 1, 2021 at 12:21 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Usama,
>>
>> Thank you for the patch. Unfortunately the patch does not apply to the
>> master branch anymore. Can you please rebase it?
>>
> 
> Please find the rebased patch
> 
> Thanks
> Best regards
> Muhammad Usama
> 
> 
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Hi,
>> >
>> > So I have cooked up a WIP patch that implements the behavior
>> > discussed above.
>> >
>> > The attached patch adds three new configuration parameters:
>> >
>> > #wd_remove_shutdown_nodes = off
>> >                         # When enabled, properly shutdown watchdog
>> >                         # nodes get removed from the cluster and do
>> >                         # not count towards the quorum and consensus
>> >                         # computations.
>> >
>> > #wd_lost_node_removal_timeout = 0s
>> >                         # Time after which LOST watchdog nodes get
>> >                         # removed from the cluster and do not count
>> >                         # towards the quorum and consensus
>> >                         # computations.
>> >                         # Setting it to 0 will never remove the LOST
>> >                         # nodes.
>> >
>> > #wd_initial_node_showup_time = 0s
>> >                         # Time to wait for watchdog nodes to connect
>> >                         # to the cluster. After that time the nodes
>> >                         # are considered not to be part of the
>> >                         # cluster and do not count towards the quorum
>> >                         # and consensus computations.
>> >                         # Setting it to 0 will wait forever.
>> >
>> >
>> > Keeping the default values for these parameters retains the existing
>> > behavior.
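>> >
>> > For illustration, a minimal pgpool.conf sketch that enables the
>> > non-default behavior (the values here are arbitrary examples, not
>> > recommendations):
>> >
>> >     wd_remove_shutdown_nodes = on       # drop shutdown nodes from quorum
>> >     wd_lost_node_removal_timeout = 30s  # drop LOST nodes after 30 seconds
>> >     wd_initial_node_showup_time = 60s   # give up on no-show nodes after 60s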
>> >
>> >
>> > Moreover, the patch also enhances the pcp_watchdog_info utility to
>> > output the current "Quorum State" for each watchdog node, as well as
>> > the "number of nodes required for quorum" and the "valid remote nodes
>> > count" as per the current status of the watchdog cluster. This change
>> > might also require a bump of the pcp lib version.
>> >
>> >
>> >
>> > bin/pcp_watchdog_info -U postgres -v
>> > Watchdog Cluster Information
>> > Total Nodes              : 3
>> > Remote Nodes             : 2
>> > Valid Remote Nodes       : 1
>> > Alive Remote Nodes       : 0
>> > Nodes required for quorum: 2
>> > Quorum state             : QUORUM ABSENT
>> > VIP up on local node     : NO
>> > Leader Node Name         : localhost:9990 Darwin Usama-Macbook-Pro.local
>> > Leader Host Name         : localhost
>> >
>> > Watchdog Node Information
>> > Node Name      : localhost:9990 Darwin Usama-Macbook-Pro.local
>> > ...
>> > Status Name    : LEADER
>> > Quorum State   : ACTIVE
>> >
>> > Node Name      : localhost:9991 Darwin Usama-Macbook-Pro.local
>> > ...
>> > Status         : 10
>> > Status Name    : SHUTDOWN
>> > Quorum State   : ACTIVE
>> >
>> > Node Name      : Not_Set
>> > ...
>> > Status Name    : DEAD
>> > Quorum State   : REMOVED-NO-SHOW
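>> >
>> > For example, a monitoring script could check the cluster-wide quorum
>> > state with something like the following sketch (note the lowercase
>> > "state": the per-node lines print "Quorum State", while the
>> > cluster-wide line prints "Quorum state"):
>> >
>> >     bin/pcp_watchdog_info -U postgres -v -w | grep 'Quorum state'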
>> >
>> > The patch is still in a WIP state, mainly because it lacks the
>> > documentation updates. I am sharing it to get opinions and
>> > suggestions on the behavior and the configuration parameter names.
>> >
>> > Thanks
>> > Best regards
>> > Muhammad Usama
>> >
>> >
>> > On Mon, Aug 23, 2021 at 6:05 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> Hi Usama,
>> >>
>> >> Sorry for late reply.
>> >>
>> >> From: Muhammad Usama <m.usama at gmail.com>
>> >> Subject: Re: [pgpool-hackers: 3898] Re: [pgpool-general: 7543] VIP with
>> >> one node
>> >> Date: Thu, 22 Jul 2021 14:12:59 +0500
>> >> Message-ID: <
>> >> CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA at mail.gmail.com>
>> >>
>> >> > On Tue, Jul 20, 2021 at 4:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >
>> >> >> >> Is it possible to configure watchdog to enable the lost node
>> >> >> >> removal function only when a node is properly shutdown?
>> >> >> >>
>> >> >>
>> >> >> > Yes, if we disable wd_lost_node_to_remove_timeout (by setting
>> >> >> > it to 0), the lost node removal will only happen for properly
>> >> >> > shutdown nodes.
>> >> >>
>> >> >> Oh, I thought setting wd_lost_node_to_remove_timeout to 0 would
>> >> >> keep the existing behavior.
>> >> >>
>> >> >
>> >> > There are two parts to the proposal. The first deals with
>> >> > removing a lost node from the cluster after
>> >> > wd_lost_node_to_remove_timeout amount of time, while the second is
>> >> > about removing properly shutdown nodes from the cluster.
>> >> >
>> >> > Disabling wd_lost_node_to_remove_timeout (setting it to 0) keeps
>> >> > the existing behaviour as far as the lost-node-removal part of the
>> >> > proposal is concerned.
>> >> >
>> >> > Not counting a properly shutdown node as part of the watchdog
>> >> > cluster, however, is not configurable (as per the original
>> >> > proposal). So if we want to make this part configurable as well,
>> >> > so that we can switch back to 100% of the current behaviour, we
>> >> > can add another config parameter for that, like
>> >> > consider_shutdown_nodes_part_of_wd_cluster = [on|off]
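>> >> >
>> >> > For example (parameter names hypothetical, per this proposal), a
>> >> > configuration that preserves today's behaviour entirely would be:
>> >> >
>> >> >     wd_lost_node_to_remove_timeout = 0   # never remove LOST nodes
>> >> >     consider_shutdown_nodes_part_of_wd_cluster = on
>> >> >                                          # keep counting properly
>> >> >                                          # shutdown nodes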
>> >>
>> >> +1 to add the new parameter.
>> >>
>> >> The reason is, some users may want to avoid the split-brain problem
>> >> even if the quorum/VIP is lost. Suppose there are two admins: A for
>> >> the system (OS) and B for the database. B never wants to allow any
>> >> possibility of split brain. If A shuts down the system, B may not
>> >> notice that there are no longer enough nodes to form a consensus,
>> >> because if consider_shutdown_nodes_part_of_wd_cluster is on, the
>> >> quorum/VIP will be kept until no node remains.
>> >>
>> >> In summary, I think there are use cases for both
>> >> consider_shutdown_nodes_part_of_wd_cluster = on and off.
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >>
>>

