[pgpool-hackers: 4049] Re: [pgpool-general: 7543] VIP with one node

Muhammad Usama m.usama at gmail.com
Fri Oct 29 23:52:17 JST 2021


Hi,

So I have cooked up a WIP patch that implements the behavior discussed
above.

The attached patch adds three new configuration parameters:

#wd_remove_shutdown_nodes = off
                                    # When enabled, properly shut down watchdog
                                    # nodes get removed from the cluster and do
                                    # not count towards the quorum and consensus
                                    # computations.

#wd_lost_node_removal_timeout = 0s
                                    # Time after which LOST watchdog nodes get
                                    # removed from the cluster and no longer
                                    # count towards the quorum and consensus
                                    # computations.
                                    # Setting it to 0 will never remove the
                                    # LOST nodes.

#wd_initial_node_showup_time = 0s
                                    # Time to wait for watchdog nodes to connect
                                    # to the cluster. After that time the nodes
                                    # are considered to be not part of the
                                    # cluster and will not count towards the
                                    # quorum and consensus computations.
                                    # Setting it to 0 will wait forever.


Keeping the default values for these parameters retains the existing
behavior.
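
For instance (the values below are only illustrative, not recommendations
from the patch), an administrator who wants cleanly shutdown nodes and
long-lost nodes excluded from the quorum computation could set something
like:

wd_remove_shutdown_nodes = on         # exclude properly shut down nodes from quorum
wd_lost_node_removal_timeout = 30s    # drop LOST nodes from quorum after 30 seconds
wd_initial_node_showup_time = 60s     # stop waiting for never-connected nodes after 60 seconds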


Moreover, the patch also enhances the pcp_watchdog_info utility to output
the current "Quorum State" for each watchdog node, along with the "number
of nodes required for quorum" and the "valid remote nodes count" as per the
current status of the watchdog cluster. This change might also require a
bump of the pcp lib version.



bin/pcp_watchdog_info -U postgres -v
Watchdog Cluster Information
Total Nodes              : 3
Remote Nodes             : 2
Valid Remote Nodes       : 1
Alive Remote Nodes       : 0
Nodes required for quorum: 2
Quorum state             : QUORUM ABSENT
VIP up on local node     : NO
Leader Node Name         : localhost:9990 Darwin Usama-Macbook-Pro.local
Leader Host Name         : localhost

Watchdog Node Information
Node Name      : localhost:9990 Darwin Usama-Macbook-Pro.local
...
Status Name    : LEADER
Quorum State   : ACTIVE

Node Name      : localhost:9991 Darwin Usama-Macbook-Pro.local
...
Status         : 10
Status Name    : SHUTDOWN
Quorum State   : ACTIVE

Node Name      : Not_Set
...
Status Name    : DEAD
Quorum State   : REMOVED-NO-SHOW
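
Reading the sample output above: the cleanly shut down node is still counted
towards quorum (Quorum State ACTIVE), while the node that never connected has
been excluded (REMOVED-NO-SHOW), which is why only 1 of the 2 remote nodes is
reported as valid. With 2 nodes required for quorum and only the local node
alive (Alive Remote Nodes: 0), the cluster reports QUORUM ABSENT.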

The patch is still in a WIP state, mainly because it lacks the documentation
updates, and I am sharing it to get opinions and suggestions on the behavior
and the configuration parameter names.

Thanks
Best regards
Muhammad Usama


On Mon, Aug 23, 2021 at 6:05 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> Sorry for late reply.
>
> From: Muhammad Usama <m.usama at gmail.com>
> Subject: Re: [pgpool-hackers: 3898] Re: [pgpool-general: 7543] VIP with
> one node
> Date: Thu, 22 Jul 2021 14:12:59 +0500
> Message-ID: <
> CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA at mail.gmail.com>
>
> > On Tue, Jul 20, 2021 at 4:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> >> Is it possible to configure watchdog to enable the lost node removal
> >> >> function only when a node is properly shutdown?
> >> >>
> >>
> >> > Yes, if we disable wd_lost_node_to_remove_timeout (by setting it to 0),
> >> > the lost node removal will only happen for properly shutdown nodes.
> >>
> >> Oh, I thought setting wd_lost_node_to_remove_timeout to 0 will keep
> >> the existing behavior.
> >>
> >
> > There are two parts to the proposal. The first one deals with removing
> > the lost node from the cluster after wd_lost_node_to_remove_timeout
> > amount of time, while the second part is about removing properly
> > shutdown nodes from the cluster.
> >
> > Now, disabling wd_lost_node_to_remove_timeout (setting it to 0) will
> > keep the existing behaviour as far as the lost-node-removal portion of
> > the proposal is concerned.
> >
> > Not counting a properly shutdown node as part of the watchdog cluster,
> > however, is not configurable (as per the original proposal). So if we
> > want to make this part configurable as well, so that we can switch to
> > 100% of the current behaviour, we can add another config parameter for
> > that, like consider_shutdown_nodes_part_of_wd_cluster = [on|off]
>
> +1 to add the new parameter.
>
> The reason is, some users may want to avoid the split brain problem even
> if quorum/VIP is lost.  Suppose there are two admins: A for the system
> (OS), B for the database. B never wants to have the split brain
> possibility. If A shuts down the system, B may not notice that there are
> no longer enough nodes to form consensus if
> consider_shutdown_nodes_part_of_wd_cluster is on, because the
> quorum/VIP will be kept until no node remains.
>
> In summary, I think there are use-cases for both
> consider_shutdown_nodes_part_of_wd_cluster = on and off.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vip_with_single_node.diff
Type: application/octet-stream
Size: 31676 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20211029/fe4d5af9/attachment-0001.obj>

