<div dir="auto">Sure, thanks for the update <br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr"><span style="font-family:sans-serif">Muhammad. I currently don't have a setup; I will try out the fix once I have one.</span></div><div dir="ltr" class="gmail_attr"><span style="font-family:sans-serif"><br></span></div><div dir="ltr" class="gmail_attr"><span style="font-family:sans-serif">Thanks & Regards,</span></div><div dir="ltr" class="gmail_attr"><span style="font-family:sans-serif"><br></span></div><div dir="ltr" class="gmail_attr"><span style="font-family:sans-serif">     Lakshmi Y M</span></div><div dir="ltr" class="gmail_attr"><span style="font-family:sans-serif"><br></span></div><div dir="ltr" class="gmail_attr">On Wed, Oct 2, 2019, 7:48 PM Muhammad Usama <<a href="mailto:m.usama@gmail.com">m.usama@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Lakshmi,<br><br>Sorry for the delayed response and many thanks for providing the log files.<br><br>I have been looking into a few similar bug reports, and after reviewing the log you sent and the ones<br>shared on  <a href="https://www.pgpool.net/mantisbt/view.php?id=547" target="_blank" rel="noreferrer">https://www.pgpool.net/mantisbt/view.php?id=547</a> I realized that there was confusion<br>in the watchdog code about how to deal with life-check failure scenarios, especially for the cases when the<br>life-check reports a node failure while the watchdog core is still able to communicate with the remote nodes,<br>and also for the case when node A's life-check reports node B as lost while B still thinks A is alive and healthy.<br><br>So I have reviewed the whole watchdog design around the life-check reports and have made some fixes.<div>I am not sure if you have a development setup and can verify the fix, but I am attaching the patch anyway if you</div><div>want to try that out. 
The patch is generated against the current MASTER branch and I will commit it after a little</div><div>more testing and then backport it to all supported branches, and hopefully, your issue will be fixed in the upcoming</div><div>release of Pgpool-II.</div><div><br>Thanks</div><div>Best regards</div><div>Muhammad Usama</div><div><br></div></div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 2, 2019 at 9:31 AM Lakshmi Raghavendra <<a href="mailto:lakshmiym108@gmail.com" target="_blank" rel="noreferrer">lakshmiym108@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Tatsuo,<div><br></div><div>          Please find attached the zip file.</div><div><br></div><div>Thanks And Regards,</div><div><br></div><div>  Lakshmi Y M</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 2, 2019 at 5:13 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank" rel="noreferrer">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Lakshmi,<br>

<br>

Your attached files are too large to be accepted by the mailing list. Can<br>

you compress them and post the message again with the compressed files<br>

attached?<br>

<br>

Best regards,<br>

--<br>

Tatsuo Ishii<br>

SRA OSS, Inc. Japan<br>

English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

<br>

From: Lakshmi Raghavendra <<a href="mailto:lakshmiym108@gmail.com" target="_blank" rel="noreferrer">lakshmiym108@gmail.com</a>><br>

Subject: Fwd: [pgpool-general: 6672] Query<br>

Date: Sun, 1 Sep 2019 23:14:30 +0530<br>

Message-ID: <<a href="mailto:CAHHVJ5sRoVFEEW4EoZLgudCTTm0cqGjXhbbkpnOiimcs4euUSw@mail.gmail.com" target="_blank" rel="noreferrer">CAHHVJ5sRoVFEEW4EoZLgudCTTm0cqGjXhbbkpnOiimcs4euUSw@mail.gmail.com</a>><br>

<br>

> ---------- Forwarded message ---------<br>

> From: Lakshmi Raghavendra <<a href="mailto:lakshmiym108@gmail.com" target="_blank" rel="noreferrer">lakshmiym108@gmail.com</a>><br>

> Date: Sat, Aug 31, 2019 at 10:17 PM<br>

> Subject: Re: [pgpool-general: 6672] Query<br>

> To: Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank" rel="noreferrer">ishii@sraoss.co.jp</a>><br>

> Cc: Muhammad Usama <<a href="mailto:m.usama@gmail.com" target="_blank" rel="noreferrer">m.usama@gmail.com</a>>, <<a href="mailto:pgpool-general@pgpool.net" target="_blank" rel="noreferrer">pgpool-general@pgpool.net</a>><br>

> <br>

> <br>

> Hi Usama / Tatsuo,<br>

> <br>

>          Received the email notification today, sorry for the delayed<br>

> response.<br>

> Please find attached the pgpool-II log for the same.<br>

> <br>

> Below is a short summary of the issue:<br>

> <br>

> <br>

> Node-1: Pgpool Master + Postgres Master<br>

> <br>

> Node-2: Pgpool Standby + Postgres Standby<br>

> <br>

> Node-3: Pgpool Standby + Postgres Standby<br>

> <br>

> <br>

> When a network failure happens and Node-1 drops off the network, the<br>

> status is as follows:<br>

> <br>

> Node-1: Pgpool Lost status + Postgres Standby (down)<br>

> <br>

> Node-2: Pgpool Master + Postgres Master<br>

> <br>

> Node-3: Pgpool Standby + Postgres Standby<br>

> <br>

> <br>

> Now, when Node-1 comes back onto the network, the status is as shown below,<br>

> which leaves the pgpool cluster in an imbalanced state:<br>

> <br>

> <br>

> <br>

> lcm-34-189:~ # psql -h 10.198.34.191 -p 9999 -U pgpool postgres -c "show pool_nodes"<br>

> Password for user pgpool:<br>

>  node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change<br>

> ---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------<br>

>  0       | 10.198.34.188 | 5432 | up     | 0.333333  | primary | 0          | true              | 0                 | 2019-08-31 16:40:26<br>

>  1       | 10.198.34.189 | 5432 | up     | 0.333333  | standby | 0          | false             | 1013552           | 2019-08-31 16:40:26<br>

>  2       | 10.198.34.190 | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | 2019-08-31 16:40:26<br>

> (3 rows)<br>

> <br>

> lcm-34-189:~ # /usr/local/bin/pcp_watchdog_info -p 9898 -h 10.198.34.191 -U pgpool<br>

> Password:<br>

> 3 NO lcm-34-188.dev.lcm.local:9999 Linux lcm-34-188.dev.lcm.local 10.198.34.188<br>

> <br>

> lcm-34-189.dev.lcm.local:9999 Linux lcm-34-189.dev.lcm.local lcm-34-189.dev.lcm.local 9999 9000 7 STANDBY<br>

> lcm-34-188.dev.lcm.local:9999 Linux lcm-34-188.dev.lcm.local 10.198.34.188 9999 9000 4 MASTER<br>

> lcm-34-190.dev.lcm.local:9999 Linux lcm-34-190.dev.lcm.local 10.198.34.190 9999 9000 4 MASTER<br>

> lcm-34-189:~ #<br>

> <br>

> <br>

> <br>

> Thanks And Regards,<br>

> <br>

>    Lakshmi Y M<br>

> <br>

> On Tue, Aug 20, 2019 at 8:55 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank" rel="noreferrer">ishii@sraoss.co.jp</a>> wrote:<br>

> <br>

>> > On Sat, Aug 17, 2019 at 12:28 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank" rel="noreferrer">ishii@sraoss.co.jp</a>><br>

>> wrote:<br>

>> ><br>

>> >> > Hi Pgpool Team,<br>

>> >> ><br>

>> >> >               *We are nearing a production release and are running into the<br>

>> >> > issues below.*<br>

>> >> > A prompt reply would be very helpful and greatly appreciated.<br>

>> >> > Please let us know how to resolve the issues below.<br>

>> >> ><br>

>> >> > We have a 3-node pgpool + postgres cluster: M1, M2, M3. The<br>

>> >> > pgpool.conf is attached.<br>

>> >> ><br>

>> >> > *Case I :  *<br>

>> >> > M1 - Pgpool Master + Postgres Master<br>

>> >> > M2 , M3 - Pgpool slave + Postgres slave<br>

>> >> ><br>

>> >> > - M1 goes out of network. It is marked as LOST in the pgpool cluster.<br>

>> >> > - M2 becomes postgres master.<br>

>> >> > - M3 becomes pgpool master.<br>

>> >> > - When M1 comes back to the network, pgpool is able to resolve the split brain.<br>

>> >> > However, it changes the postgres master back to M1, logging the statement<br>

>> >> > "LOG:  primary node was chenged after the sync from new master". Since<br>

>> >> > M2 was already postgres master (and its trigger file is not touched), it is<br>

>> >> > not able to sync to the new master.<br>

>> >> > *I want to avoid this postgres master change. Please let us know if<br>

>> >> > there is a way to avoid it.*<br>

>> >><br>

>> >> Sorry, but I don't know how to prevent this. Probably, when the former<br>

>> >> watchdog master recovers from a network outage and there's already a<br>

>> >> PostgreSQL primary server, the watchdog master should not sync the<br>

>> >> state. What do you think, Usama?<br>

>> >><br>

>> ><br>

>> > Yes, that's true; no functionality exists in Pgpool-II to<br>

>> > disable the backend node status sync. In fact, it<br>

>> > would be hazardous if we somehow disabled the node status syncing.<br>

>> ><br>

>> > Having said that, in the scenario described, when M1 comes back and<br>

>> > joins the watchdog cluster, Pgpool-II should have<br>

>> > kept M2 as the true master while resolving the split-brain. The<br>

>> > algorithm used to determine the true master considers quite a<br>

>> > few parameters, and for the scenario you explained, M2 should have kept the<br>

>> > master node status while M1 should have resigned<br>

>> > after rejoining the cluster. Effectively, M1 should have been<br>

>> > syncing the status from M2 (keeping the proper primary node),<br>

>> > not the other way around.<br>

>> > Can you please share the Pgpool-II log files so that I can have a look at<br>

>> > what went wrong in this case?<br>

>><br>

>> Usama,<br>

>><br>

>> Ok, the scenario (PostgreSQL primary x 2 in the end) should not have<br>

>> happened. That's good news.<br>

>><br>

>> Lakshmi,<br>

>><br>

>> Can you please provide the Pgpool-II log files as Usama requested?<br>

>><br>

>> Best regards,<br>

>> --<br>

>> Tatsuo Ishii<br>

>> SRA OSS, Inc. Japan<br>

>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>><br>

</blockquote></div>

_______________________________________________<br>

pgpool-general mailing list<br>

<a href="mailto:pgpool-general@pgpool.net" target="_blank" rel="noreferrer">pgpool-general@pgpool.net</a><br>

<a href="http://www.pgpool.net/mailman/listinfo/pgpool-general" rel="noreferrer noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-general<br></a><br>

</blockquote></div>

</blockquote></div></div>