[pgpool-hackers: 4058] Re: [pgpool-general: 7543] VIP with one node

Muhammad Usama m.usama at gmail.com
Mon Nov 8 15:43:23 JST 2021


Hi Ishii-San,

On Mon, Nov 8, 2021 at 7:33 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> Thank you for the work. I have added it to the Pgpool-II 4.3 release
> note. Can you please take a look to check whether I have
> misunderstood anything regarding the commit?
>

Overall it looks good. I have tried to rephrase it a tiny bit. Could you take
a look at the attached release_notes.diff and check whether it looks OK to you?

>
> One thing I noticed in the doc you added is that you did not mention
> the risk of split-brain when this feature is enabled. Should we add
> that?
>
>
You are absolutely right. We should definitely mention the risks.

What do you think about the caution message below for the documentation?

--quote--
Using the dynamic cluster membership has an associated risk of causing
split-brain. So it is strongly recommended to carefully consider whether
the setup really requires dynamic cluster membership, and to use
conservative values for the related settings.
--un-quote--

If you agree with the above message, I will put it in the respective
documentation file.

Thanks
Best regards
Muhammad Usama


> > Hi Ishii-San,
> >
> > On Tue, Nov 2, 2021 at 5:58 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Hi Usama,
> >>
> >> I confirmed your patch works as expected. Thank you for your great work!
> >>
> >
> > Many thanks for the confirmation. I have made a few cosmetic changes and
> > committed the patch and documentation update.
> >
> > Best Regards
> > Muhammad Usama
> >
> >
> >
> >> > Hi Tatsuo,
> >> >
> >> > On Mon, Nov 1, 2021 at 12:21 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >
> >> >> Hi Usama,
> >> >>
> >> >> Thank you for the patch. Unfortunately the patch does not apply to the
> >> >> master branch anymore. Can you please rebase it?
> >> >>
> >> >
> >> > Please find the rebased patch
> >> >
> >> > Thanks
> >> > Best regards
> >> > Muhammad Usama
> >> >
> >> >
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese:http://www.sraoss.co.jp
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > So I have cooked up a WIP patch that implements the above discussed
> >> >> > behavior.
> >> >> >
> >> >> > The attached patch adds three new configuration parameters:
> >> >> >
> >> >> > #wd_remove_shutdown_nodes = off
> >> >> >                     # when enabled, properly shutdown watchdog nodes get
> >> >> >                     # removed from the cluster and do not count towards
> >> >> >                     # the quorum and consensus computations
> >> >> >
> >> >> > #wd_lost_node_removal_timeout = 0s
> >> >> >                     # time after which LOST watchdog nodes get removed
> >> >> >                     # from the cluster and do not count towards the
> >> >> >                     # quorum and consensus computations
> >> >> >                     # setting it to 0 will never remove the LOST nodes
> >> >> >
> >> >> > #wd_initial_node_showup_time = 0s
> >> >> >                     # time to wait for watchdog nodes to connect to the
> >> >> >                     # cluster. After that time the nodes are considered
> >> >> >                     # to be not part of the cluster and will not count
> >> >> >                     # towards the quorum and consensus computations
> >> >> >                     # setting it to 0 will wait forever
> >> >> >
> >> >> > Keeping the default values for these parameters retains the existing
> >> >> > behavior.
> >> >> >
> >> >> >
> >> >> > Moreover, the patch also enhances the pcp_watchdog_info utility to output
> >> >> > the current "Quorum State" for each watchdog node, as well as the
> >> >> > "number of nodes required for quorum" and the "valid remote nodes count"
> >> >> > as per the current status of the watchdog cluster. This change might also
> >> >> > require a bump of the pcp lib version.
> >> >> >
> >> >> >
> >> >> > bin/pcp_watchdog_info -U postgres -v
> >> >> > Watchdog Cluster Information
> >> >> > Total Nodes              : 3
> >> >> > Remote Nodes             : 2
> >> >> > Valid Remote Nodes       : 1
> >> >> > Alive Remote Nodes       : 0
> >> >> > Nodes required for quorum: 2
> >> >> > Quorum state             : QUORUM ABSENT
> >> >> > VIP up on local node     : NO
> >> >> > Leader Node Name         : localhost:9990 Darwin Usama-Macbook-Pro.local
> >> >> > Leader Host Name         : localhost
> >> >> >
> >> >> > Watchdog Node Information
> >> >> > Node Name      : localhost:9990 Darwin Usama-Macbook-Pro.local
> >> >> > ...
> >> >> > Status Name    : LEADER
> >> >> > Quorum State   : ACTIVE
> >> >> >
> >> >> > Node Name      : localhost:9991 Darwin Usama-Macbook-Pro.local
> >> >> > ...
> >> >> > Status         : 10
> >> >> > Status Name    : SHUTDOWN
> >> >> > Quorum State   : ACTIVE
> >> >> >
> >> >> > Node Name      : Not_Set
> >> >> > ...
> >> >> > Status Name    : DEAD
> >> >> > Quorum State   : REMOVED-NO-SHOW
> >> >> >
> >> >> > The patch is still in a WIP state, mainly because it lacks the
> >> >> > documentation updates, and I am sharing it to get opinions and
> >> >> > suggestions on the behavior and the configuration parameter names.
> >> >> >
> >> >> > Thanks
> >> >> > Best regards
> >> >> > Muhammad Usama
> >> >> >
> >> >> >
> >> >> > On Mon, Aug 23, 2021 at 6:05 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >> >
> >> >> >> Hi Usama,
> >> >> >>
> >> >> >> Sorry for late reply.
> >> >> >>
> >> >> >> From: Muhammad Usama <m.usama at gmail.com>
> >> >> >> Subject: Re: [pgpool-hackers: 3898] Re: [pgpool-general: 7543] VIP with one node
> >> >> >> Date: Thu, 22 Jul 2021 14:12:59 +0500
> >> >> >> Message-ID: <CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA at mail.gmail.com>
> >> >> >>
> >> >> >> > On Tue, Jul 20, 2021 at 4:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >> >> >
> >> >> >> >> >> Is it possible to configure watchdog to enable the lost node
> >> >> >> >> >> removal function only when a node is properly shutdown?
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >> > Yes if we disable the wd_lost_node_to_remove_timeout (by setting it
> >> >> >> >> > to 0) the lost node removal will only happen for properly shutdown
> >> >> >> >> > nodes.
> >> >> >> >>
> >> >> >> >> Oh, I thought setting wd_lost_node_to_remove_timeout to 0 would keep
> >> >> >> >> the existing behavior.
> >> >> >> >>
> >> >> >> >
> >> >> >> > There are two parts to the proposal. The first one deals with removing
> >> >> >> > the lost node from the cluster after the wd_lost_node_to_remove_timeout
> >> >> >> > amount of time, while the second part is about removing the properly
> >> >> >> > shutdown nodes from the cluster.
> >> >> >> >
> >> >> >> > Now disabling the wd_lost_node_to_remove_timeout (setting it to 0) will
> >> >> >> > keep the existing behaviour as far as the lost-node-removal portion of
> >> >> >> > the proposal is concerned.
> >> >> >> >
> >> >> >> > On the other hand, not counting the properly shutdown nodes as part of
> >> >> >> > the watchdog cluster is not configurable (as per the original proposal).
> >> >> >> > So if we want to make this part configurable as well, so that we can
> >> >> >> > switch to 100% of the current behaviour, then we can add another config
> >> >> >> > parameter for that, like
> >> >> >> > consider_shutdown_nodes_part_of_wd_cluster = [on|off]
> >> >> >>
> >> >> >> +1 to add the new parameter.
> >> >> >>
> >> >> >> The reason is, some users may want to avoid the split-brain problem even
> >> >> >> if quorum/VIP is lost.  Suppose there are two admins: A for the system
> >> >> >> (OS) and B for the database. B never wants to have any split-brain
> >> >> >> possibility. If A shuts down the system, B may not notice that there are
> >> >> >> no longer enough nodes to form a consensus when
> >> >> >> consider_shutdown_nodes_part_of_wd_cluster is on, because the quorum/VIP
> >> >> >> will be kept until no node remains.
> >> >> >>
> >> >> >> In summary, I think there are use cases for both
> >> >> >> consider_shutdown_nodes_part_of_wd_cluster on and off.
> >> >> >> --
> >> >> >> Tatsuo Ishii
> >> >> >> SRA OSS, Inc. Japan
> >> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> >> Japanese:http://www.sraoss.co.jp
> >> >> >>
> >> >> >>
> >> >>
> >>
>
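
As a side note on the question quoted above about enabling lost node removal
only for properly shutdown nodes: with the parameter names from the committed
patch (the wd_lost_node_to_remove_timeout mentioned in the quoted discussion
corresponds to wd_lost_node_removal_timeout in the committed parameter set), a
minimal sketch of that setup could look like this. It is illustrative only and
not taken from the committed documentation:

    wd_remove_shutdown_nodes = on       # cleanly shutdown nodes are removed from
                                        # the quorum and consensus computations
    wd_lost_node_removal_timeout = 0s   # 0 = never remove LOST nodes automatically
    wd_initial_node_showup_time = 0s    # 0 = wait forever for nodes to show up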

