<div dir="ltr"><div>Hi Ishii-San,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 2, 2021 at 5:58 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Usama,<br>

<br>

I confirmed your patch works as expected. Thank you for your great work!<br></blockquote><div><br></div><div>Many thanks for the confirmation. I have made a few cosmetic changes and committed the patch and the documentation update.</div><div><br></div><div><span style="background-color:transparent">Best Regards</span><br></div><div>Muhammad Usama</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> Hi Tatsuo,<br>

> <br>

> On Mon, Nov 1, 2021 at 12:21 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>

> <br>

>> Hi Usama,<br>

>><br>

>> Thank you for the patch. Unfortunately the patch does not apply to the<br>

>> master branch anymore. Can you please rebase it?<br>

>><br>

> <br>

> Please find the rebased patch<br>

> <br>

> Thanks<br>

> Best regards<br>

> Muhammad Usama<br>

> <br>

> <br>

>> --<br>

>> Tatsuo Ishii<br>

>> SRA OSS, Inc. Japan<br>

>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>><br>

>> > Hi,<br>

>> ><br>

>> > So I have cooked up a WIP patch that implements the above discussed<br>

>> > behavior.<br>

>> ><br>

>> > The attached patch adds three new configuration parameters<br>

>> ><br>

>> > #wd_remove_shutdown_nodes = off<br>

>> >                                     # when enabled, properly shut down watchdog nodes are<br>

>> >                                     # removed from the cluster and do not count towards<br>

>> >                                     # the quorum and consensus computations<br>

>> ><br>

>> > #wd_lost_node_removal_timeout = 0s<br>

>> >                                     # Time after which LOST watchdog nodes are<br>

>> >                                     # removed from the cluster and do not count towards<br>

>> >                                     # the quorum and consensus computations<br>

>> >                                     # setting it to 0 will never remove the LOST nodes<br>

>> ><br>

>> > #wd_initial_node_showup_time = 0s<br>

>> >                                     # Time to wait for watchdog nodes to connect to the cluster.<br>

>> >                                     # After that time the nodes are considered not to be part of<br>

>> >                                     # the cluster and will not count towards<br>

>> >                                     # the quorum and consensus computations<br>

>> >                                     # setting it to 0 will wait forever<br>

>> ><br>

>> ><br>

>> > Keeping the default values for these parameters retains the existing<br>

>> > behavior.<br>

>> ><br>

>> ><br>

>> > Moreover, the patch also enhances the pcp_watchdog_info utility to output<br>

>> > the current "Quorum State" for each watchdog node, as well as the<br>

>> > "number of nodes required for quorum" and the "valid remote nodes count",<br>

>> > as per the current status of the watchdog cluster. This change might also<br>

>> > require a bump of the pcp lib version.<br>

>> ><br>

>> ><br>

>> ><br>

>> > bin/pcp_watchdog_info -U postgres -v<br>

>> > Watchdog Cluster Information<br>

>> > Total Nodes              : 3<br>

>> > Remote Nodes             : 2<br>

>> > *Valid Remote Nodes       : 1*<br>

>> > Alive Remote Nodes       : 0<br>

>> > *Nodes required for quorum: 2*<br>

>> > Quorum state             : QUORUM ABSENT<br>

>> > VIP up on local node     : NO<br>

>> > Leader Node Name         : localhost:9990 Darwin Usama-Macbook-Pro.local<br>

>> > Leader Host Name         : localhost<br>

>> ><br>

>> > Watchdog Node Information<br>

>> > Node Name      : localhost:9990 Darwin Usama-Macbook-Pro.local<br>

>> > ...<br>

>> > Status Name    : LEADER<br>

>> > *Quorum State   : ACTIVE*<br>

>> ><br>

>> > Node Name      : localhost:9991 Darwin Usama-Macbook-Pro.local<br>

>> > ...<br>

>> > Status         : 10<br>

>> > Status Name    : SHUTDOWN<br>

>> > *Quorum State   : ACTIVE*<br>

>> ><br>

>> > Node Name      : Not_Set<br>

>> > ...<br>

>> > Status Name    : DEAD<br>

>> > *Quorum State   : REMOVED-NO-SHOW*<br>

>> ><br>

>> > The patch is still in WIP state, mainly because it lacks the documentation<br>

>> > updates. I am sharing it to get opinions and suggestions on the behavior<br>

>> > and the configuration parameter names.<br>

>> ><br>

>> > Thanks<br>

>> > Best regards<br>

>> > Muhammad Usama<br>

>> ><br>

>> ><br>

>> > On Mon, Aug 23, 2021 at 6:05 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>

>> ><br>

>> >> Hi Usama,<br>

>> >><br>

>> >> Sorry for the late reply.<br>

>> >><br>

>> >> From: Muhammad Usama <<a href="mailto:m.usama@gmail.com" target="_blank">m.usama@gmail.com</a>><br>

>> >> Subject: Re: [pgpool-hackers: 3898] Re: [pgpool-general: 7543] VIP with<br>

>> >> one node<br>

>> >> Date: Thu, 22 Jul 2021 14:12:59 +0500<br>

>> >> Message-ID: <<br>

>> >> <a href="mailto:CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA@mail.gmail.com" target="_blank">CAEJvTzXsKE2B0QMd0AjGBmXK6zocWZZcGU7yzzkSnmff0iAfqA@mail.gmail.com</a>><br>

>> >><br>

>> >> > On Tue, Jul 20, 2021 at 4:40 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>><br>

>> wrote:<br>

>> >> ><br>

>> >> >> >> Is it possible to configure watchdog to enable the lost node removal<br>

>> >> >> >> function only when a node is properly shut down?<br>

>> >> >> >><br>

>> >> >><br>

>> >> >> > Yes, if we disable wd_lost_node_to_remove_timeout (by setting it to 0),<br>

>> >> >> > the lost node removal will only happen for properly shut down nodes.<br>

>> >> >><br>

>> >> >> Oh, I thought setting wd_lost_node_to_remove_timeout to 0 will keep<br>

>> >> >> the existing behavior.<br>

>> >> >><br>

>> >> ><br>

>> >> > There are two parts to the proposal. The first deals with removing the<br>

>> >> > lost node from the cluster after wd_lost_node_to_remove_timeout has<br>

>> >> > elapsed, while the second is about removing properly shut down nodes<br>

>> >> > from the cluster.<br>

>> >> ><br>

>> >> > Disabling wd_lost_node_to_remove_timeout (setting it to 0) will keep the<br>

>> >> > existing behaviour as far as the lost node removal part of the proposal<br>

>> >> > is concerned.<br>

>> >> ><br>

>> >> > However, not counting a properly shut down node as part of the watchdog<br>

>> >> > cluster is not configurable (as per the original proposal). If we want to<br>

>> >> > make this part configurable as well, so that we can switch back to 100%<br>

>> >> > of the current behaviour, we can add another config parameter for that,<br>

>> >> > such as consider_shutdown_nodes_part_of_wd_cluster = [on|off]<br>

>> >><br>

>> >> +1 to add the new parameter.<br>

>> >><br>

>> >> The reason is that some users may want to avoid the split-brain problem even<br>

>> >> if quorum/VIP is lost.  Suppose there are two admins: A for the system<br>

>> >> (OS) and B for the database. B never wants to have the split-brain<br>

>> >> possibility. If A shuts down the system, B may not notice that there are not<br>

>> >> enough nodes to form consensus anymore because, if<br>

>> >> consider_shutdown_nodes_part_of_wd_cluster is on, the<br>

>> >> quorum/VIP will be kept until no node remains.<br>

>> >><br>

>> >> In summary, I think there are use cases for both<br>

>> >> consider_shutdown_nodes_part_of_wd_cluster = on and off.<br>

>> >> --<br>

>> >> Tatsuo Ishii<br>

>> >> SRA OSS, Inc. Japan<br>

>> >> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> >> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>> >><br>

>> >><br>

>><br>

</blockquote></div></div>