[pgpool-hackers: 4055] Re: [pgpool-general: 7543] VIP with one node

Muhammad Usama m.usama at gmail.com
Mon Nov 8 02:13:22 JST 2021


Hi Ishii-San,

On Tue, Nov 2, 2021 at 5:58 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> I confirmed your patch works as expected. Thank you for your great work!
>

Many thanks for the confirmation. I have made a few cosmetic changes and
committed the patch and documentation update.

Best Regards
Muhammad Usama



> > Hi Tatsuo,
> >
> > On Mon, Nov 1, 2021 at 12:21 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Hi Usama,
> >>
> >> Thank you for the patch. Unfortunately the patch does not apply to the
> >> master branch anymore. Can you please rebase it?
> >>
> >
> > Please find the rebased patch
> >
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> >
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
> >> > Hi,
> >> >
> >> > So I have cooked up a WIP patch that implements the behavior discussed
> >> > above.
> >> >
> >> > The attached patch adds three new configuration parameters
> >> >
> >> > #wd_remove_shutdown_nodes = off
> >> >                          # When enabled, properly shutdown watchdog nodes are
> >> >                          # removed from the cluster and do not count towards
> >> >                          # the quorum and consensus computations
> >> >
> >> > #wd_lost_node_removal_timeout = 0s
> >> >                          # Time after which LOST watchdog nodes are removed
> >> >                          # from the cluster and no longer count towards
> >> >                          # the quorum and consensus computations
> >> >                          # Setting it to 0 will never remove the LOST nodes
> >> >
> >> > #wd_initial_node_showup_time = 0s
> >> >                          # Time to wait for watchdog nodes to connect to the
> >> >                          # cluster. After that time the nodes are considered
> >> >                          # not to be part of the cluster and will not count
> >> >                          # towards the quorum and consensus computations
> >> >                          # Setting it to 0 will wait forever
> >> >
> >> >
> >> > Keeping the default values for these parameters retains the existing
> >> > behavior.
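> >> >
> >> > To illustrate (just a sketch, not taken from the patch itself; the values
> >> > below are arbitrary examples), enabling the new behaviour in pgpool.conf
> >> > could look something like:
> >> >
> >> > # illustrative sketch only; values are arbitrary examples
> >> > wd_remove_shutdown_nodes = on        # drop properly shutdown nodes from
> >> >                                      # quorum/consensus computations
> >> > wd_lost_node_removal_timeout = 30s   # drop LOST nodes after 30 seconds
> >> > wd_initial_node_showup_time = 120s   # stop waiting for never-seen nodes
> >> >                                      # after 2 minutes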
> >> >
> >> >
> >> > Moreover, the patch also enhances the pcp_watchdog_info utility to output
> >> > the current "Quorum State" for each watchdog node, as well as the "number
> >> > of nodes required for quorum" and the "valid remote nodes count" as per
> >> > the current status of the watchdog cluster. This change might also require
> >> > a bump of the pcp lib version.
> >> >
> >> >
> >> >
> >> > bin/pcp_watchdog_info -U postgres -v
> >> > Watchdog Cluster Information
> >> > Total Nodes              : 3
> >> > Remote Nodes             : 2
> >> > Valid Remote Nodes       : 1
> >> > Alive Remote Nodes       : 0
> >> > Nodes required for quorum: 2
> >> > Quorum state             : QUORUM ABSENT
> >> > VIP up on local node     : NO
> >> > Leader Node Name         : localhost:9990 Darwin Usama-Macbook-Pro.local
> >> > Leader Host Name         : localhost
> >> >
> >> > Watchdog Node Information
> >> > Node Name      : localhost:9990 Darwin Usama-Macbook-Pro.local
> >> > ...
> >> > Status Name    : LEADER
> >> > Quorum State   : ACTIVE
> >> >
> >> > Node Name      : localhost:9991 Darwin Usama-Macbook-Pro.local
> >> > ...
> >> > Status         : 10
> >> > Status Name    : SHUTDOWN
> >> > Quorum State   : ACTIVE
> >> >
> >> > Node Name      : Not_Set
> >> > ...
> >> > Status Name    : DEAD
> >> > Quorum State   : REMOVED-NO-SHOW
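> >> >
> >> > To quickly check just the new fields, piping the same command through
> >> > grep should work (just an example, not part of the patch):
> >> >
> >> > bin/pcp_watchdog_info -U postgres -v | grep -E 'Quorum|Valid Remote'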
> >> >
> >> > The patch is still in a WIP state, mainly because it lacks the
> >> > documentation updates, and I am sharing it to get opinions and
> >> > suggestions on the behavior and the configuration parameter names.
> >> >
> >> > Thanks
> >> > Best regards
> >> > Muhammad Usama
> >> >
> >> >
> >> > On Mon, Aug 23, 2021 at 6:05 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >
> >> >> Hi Usama,
> >> >>
> >> >> Sorry for late reply.
> >> >>
> >> >> From: Muhammad Usama <m.usama at gmail.com>
> >> >> Subject: Re: [pgpool-hackers: 3898] Re: [pgpool-general: 7543] VIP with one node
> >> >> Date: Thu, 22 Jul 2021 14:12:59 +0500
> >> >> Message-ID: <CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA at mail.gmail.com>
> >> >>
> >> >> > On Tue, Jul 20, 2021 at 4:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >> >
> >> >> >> >> Is it possible to configure watchdog to enable the lost node
> >> >> >> >> removal function only when a node is properly shutdown?
> >> >> >> >>
> >> >> >>
> >> >> >> > Yes, if we disable the wd_lost_node_to_remove_timeout (by setting
> >> >> >> > it to 0), the lost node removal will only happen for properly
> >> >> >> > shutdown nodes.
> >> >> >>
> >> >> >> Oh, I thought setting wd_lost_node_to_remove_timeout to 0 would keep
> >> >> >> the existing behavior.
> >> >> >>
> >> >> >
> >> >> > There are two parts to the proposal. The first one deals with removing
> >> >> > the lost node from the cluster after wd_lost_node_to_remove_timeout
> >> >> > amount of time, while the second part is about removing the properly
> >> >> > shutdown nodes from the cluster.
> >> >> >
> >> >> > Now, disabling wd_lost_node_to_remove_timeout (setting it to 0) will
> >> >> > keep the existing behaviour as far as the lost node removal portion of
> >> >> > the proposal is concerned.
> >> >> >
> >> >> > On the other hand, not counting a properly shutdown node as part of
> >> >> > the watchdog cluster is not configurable (as per the original
> >> >> > proposal). If we want to make this part configurable as well, so that
> >> >> > we can switch back to 100% of the current behaviour, we can add another
> >> >> > config parameter for that, like
> >> >> > consider_shutdown_nodes_part_of_wd_cluster = [on|off].
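> >> >> >
> >> >> > For illustration only (a sketch using the names discussed here, which
> >> >> > may not be the final ones), keeping the fully current behaviour would
> >> >> > then look roughly like:
> >> >> >
> >> >> > consider_shutdown_nodes_part_of_wd_cluster = on  # proposed name, sketch only
> >> >> > wd_lost_node_to_remove_timeout = 0               # 0 = never remove LOST nodes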
> >> >>
> >> >> +1 to add the new parameter.
> >> >>
> >> >> The reason is that some users may want to avoid the split brain problem
> >> >> even if quorum/VIP is lost. Suppose there are two admins: A for the
> >> >> system (OS) and B for the database. B never wants to have any
> >> >> possibility of split brain. If A shuts down the system, B may not notice
> >> >> that there are not enough nodes to form consensus anymore, because if
> >> >> consider_shutdown_nodes_part_of_wd_cluster is on, the quorum/VIP will be
> >> >> kept until no node remains.
> >> >>
> >> >> In summary, I think there are use cases for both
> >> >> consider_shutdown_nodes_part_of_wd_cluster on and off.
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese: http://www.sraoss.co.jp
> >> >>
> >> >>
> >>
>

