<div dir="ltr">Hi Ishii-San,<div><br></div><div>Thanks for confirmation. I will commit the patch after doing some more testing.</div><div><br></div><div>Best regards</div><div>Muhammad Usama</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 17, 2021 at 6:24 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Usama,<br>

<br>

Thank you for updating the patch. The patch applied cleanly and all<br>

the regression tests including 018.detach_primary passed.<br>

<br>

> Hi Ishii-San<br>

> <br>

> <br>

> <br>

> On Wed, Jun 16, 2021 at 3:15 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>

> <br>

>> Hi Usama,<br>

>><br>

>> Unfortunately the patch did not apply cleanly on current master<br>

>> branch:<br>

>><br>

> <br>

> Sorry, Appearently I had'nt created a patch from current master head.<br>

> Attached is the rebaed version<br>

> <br>

>><br>

>> $ git apply ~/wd_coordinating_follow_and_detach__primary.patch<br>

>> error: patch failed: src/include/pool.h:426<br>

>> error: src/include/pool.h: patch does not apply<br>

>><br>

>> So I have not actually tested the patch but it seems the idea of<br>

>> locking watchdog level is more robust than my idea (executing false<br>

>> primary check only on the coordinator node).<br>

>><br>

> <br>

> Thanks for the confirmation. I have confirmed the regression is fine with<br>

> the patch<br>

> but I think I need some more testing before I can commit it.<br>

> <br>

> Best regards<br>

> Muhammad Usama<br>

> <br>

> <br>

>> Best regards,<br>

>> --<br>

>> Tatsuo Ishii<br>

>> SRA OSS, Inc. Japan<br>

>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>><br>

>> > Hi Ishii-San<br>

>> ><br>

>> > As discussed over the slack. I have cooked up a POC patch for<br>

>> implementing<br>

>> > the<br>

>> > follow_primary locking over the watchdog channel.<br>

>> ><br>

>> > The idea is just before executing the follow_primary during the failover<br>

>> > process<br>

>> > we just direct all standby watchdog nodes to acquire the same lock on<br>

>> their<br>

>> > respective<br>

>> > nodes, so that they stop the false primary detection during the period<br>

>> when<br>

>> > the<br>

>> > follow_primary is being executed on the watchdog coordinator node.<br>

>> ><br>

>> > Moreover to keep the watchdog process blocked on waiting for the lock I<br>

>> > have introduced<br>

>> > the pending remote lock mechanism, so that remote locks can be acquired<br>

>> in<br>

>> > the background<br>

>> > after the completion of the inflight replication checks.<br>

>> ><br>

>> > Finally I have removed the REQ_DETAIL_CONFIRMED flag from<br>

>> > degenerate_backend_set()<br>

>> > request that gets issued to detach the false primary, That means all<br>

>> quorum<br>

>> > and consensus rules<br>

>> > will needed to be satisfied for the detach to happen.<br>

>> ><br>

>> > I haven't done a rigorous testing or regression with the patch and<br>

>> > sharing the initial version with you<br>

>> > to get your consensus on the basic idea and design.<br>

>> ><br>

>> > Can you kindly take a look if you agree with the approach.<br>

>> ><br>

>> > Thanks<br>

>> > Best regards<br>

>> > Muhammad Usama<br>

>> ><br>

>> ><br>

>> > On Fri, May 7, 2021 at 9:47 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>

>> ><br>

>> >> I am going to commit/push the patches to master down to 4.0 stable<br>

>> >> (detach_false_primary was introduced in 4.0) branches if there's no<br>

>> >> objection.<br>

>> >><br>

>> >> Best regards,<br>

>> >> --<br>

>> >> Tatsuo Ishii<br>

>> >> SRA OSS, Inc. Japan<br>

>> >> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> >> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>> >><br>

>> >> From: Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>><br>

>> >> Subject: [pgpool-hackers: 3893] Re: Problem with<br>

>> >> detach_false_primary/follow_primary_command<br>

>> >> Date: Tue, 04 May 2021 13:09:23 +0900 (JST)<br>

>> >> Message-ID: <<a href="mailto:20210504.130923.644768896074013686.t-ishii@gmail.com" target="_blank">20210504.130923.644768896074013686.t-ishii@gmail.com</a>><br>

>> >><br>

>> >> > In the previous mail I have explained the problem and proposed a patch<br>

>> >> > for the issue.<br>

>> >> ><br>

>> >> > However the original reporter also said the problem will occur in more<br>

>> >> > complex way if watchdog is enabled.<br>

>> >> ><br>

>> >> ><br>

>> <a href="https://www.pgpool.net/pipermail/pgpool-general/2021-April/007590.html" rel="noreferrer" target="_blank">https://www.pgpool.net/pipermail/pgpool-general/2021-April/007590.html</a><br>

>> >> ><br>

>> >> > In summary it seems multiple pgpool nodes perform detach_false_primary<br>

>> >> > concurrently and this is the cause of the problem. I think there's no<br>

>> >> > reason to perform detach_false_primary in multiple pgpool nodes<br>

>> >> > concurrently. Rather we should perform detach_false_primary only on<br>

>> >> > the leader node. If this is correct, we also should not perform<br>

>> >> > detach_false_primary if the quorum is absent because there's no leader<br>

>> >> > if the quorum is absent. Attached is the patch to introduce the check<br>

>> >> > in addition to the v2 patch.<br>

>> >> ><br>

>> >> > I would like to hear opinion from other pgpool developers on that<br>

>> >> > whether we should apply the v3 patch to existing branches. I am asking<br>

>> >> > because currently we perform detach_false_primary even if the quorum<br>

>> >> > is absent and the change may be "change of user visible behavior"<br>

>> >> > which we usually avoid on stable branches. However the current<br>

>> >> > detach_false_primary apparently does not work on the environment where<br>

>> >> > watchdog is enabled, I think patching to back branches are<br>

>> exceptionally<br>

>> >> > reasonable choice.<br>

>> >> ><br>

>> >> > Also I have added the regression test patch.<br>

>> >> ><br>

>> >> >> In the posting:<br>

>> >> >><br>

>> >> >> [pgpool-general: 7525] Strange behavior on switchover with<br>

>> >> detach_false_primary enabled<br>

>> >> >><br>

>> >> >> it is reported that detach_false_primary and follow_primary_command<br>

>> >> >> could conflict each other and pgpool goes into unwanted state. We can<br>

>> >> >> reproduce the issue by using pgpool_setup to create 3 node<br>

>> >> >> configuration.<br>

>> >> >><br>

>> >> >> $ pgpool_setup -n 3<br>

>> >> >><br>

>> >> >> echo "detach_false_primary" >> etc/pgpool.conf<br>

>> >> >> echo "sr_check_period = 1" >> etc/pgpool.conf<br>

>> >> >><br>

>> >> >> The latter may not be mandatory but making the streaming replication<br>

>> >> >> check frequently will reliably reproduce the problem because<br>

>> >> >> detach_false_primary is executed in the streaming replication check<br>

>> >> >> process.<br>

>> >> >><br>

>> >> >> The initial state is as follows:<br>

>> >> >><br>

>> >> >> psql -p 11000 -c "show pool_nodes" test<br>

>> >> >>  node_id | hostname | port  | status | pg_status | lb_weight |  role<br>

>> >>  | pg_role | select_cnt | load_balance_node | replication_delay |<br>

>> >> replication_state | replication_sync_state | last_status_change<br>

>> >> >><br>

>> >><br>

>> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------<br>

>> >> >>  0       | /tmp     | 11002 | up     | up        | 0.333333  |<br>

>> primary<br>

>> >> | primary | 0          | true              | 0                 |<br>

>> >>        |                        | 2021-05-04 11:12:01<br>

>> >> >>  1       | /tmp     | 11003 | up     | up        | 0.333333  |<br>

>> standby<br>

>> >> | standby | 0          | false             | 0                 |<br>

>> streaming<br>

>> >>        | async                  | 2021-05-04 11:12:01<br>

>> >> >>  2       | /tmp     | 11004 | up     | up        | 0.333333  |<br>

>> standby<br>

>> >> | standby | 0          | false             | 0                 |<br>

>> streaming<br>

>> >>        | async                  | 2021-05-04 11:12:01<br>

>> >> >> (3 rows)<br>

>> >> >><br>

>> >> >> Execute pcp_detatch_node against node 0.<br>

>> >> >><br>

>> >> >> $ pcp_detach_node -w -p 11001 0<br>

>> >> >><br>

>> >> >> This will let the primary be in down status and this will promote<br>

>> node<br>

>> >> 1.<br>

>> >> >><br>

>> >> >> 2021-05-04 12:12:14: pcp_child pid 31449: LOG:  received degenerate<br>

>> >> backend request for node_id: 0 from pid [31449]<br>

>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  Pgpool-II parent process<br>

>> has<br>

>> >> received failover request<br>

>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  starting degeneration.<br>

>> >> shutdown host /tmp(11002)<br>

>> >> >> 2021-05-04 12:12:14: pcp_main pid 31260: LOG:  PCP process with pid:<br>

>> >> 31449 exit with SUCCESS.<br>

>> >> >> 2021-05-04 12:12:14: pcp_main pid 31260: LOG:  PCP process with pid:<br>

>> >> 31449 exits with status 0<br>

>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  Restart all children<br>

>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  execute command:<br>

>> >> /home/t-ishii/work/Pgpool-II/current/x/etc/failover.sh 0 /tmp 11002<br>

>> >> /home/t-ishii/work/Pgpool-II/current/x/data0 1 0 /tmp 0 11003<br>

>> >> /home/t-ishii/work/Pgpool-II/current/x/data1<br>

>> >> >><br>

>> >> >> However detach_false_primary found that the just promoted node 1 is<br>

>> >> >> not good because it does not have any follower standby node because<br>

>> >> >> follow_primary_command did not completed yet.<br>

>> >> >><br>

>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:<br>

>> >> verify_backend_node_status: primary 1 does not connect to standby 2<br>

>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:<br>

>> >> verify_backend_node_status: primary 1 owns only 0 standbys out of 1<br>

>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:<br>

>> >> pgpool_worker_child: invalid node found 1<br>

>> >> >><br>

>> >> >> And detach_false_primary sent failover request for node 1.<br>

>> >> >><br>

>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  received<br>

>> >> degenerate backend request for node_id: 1 from pid [31261]<br>

>> >> >><br>

>> >> >> Moreover every 1 second detach_false_primary tries to detach node 1.<br>

>> >> >><br>

>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:<br>

>> >> verify_backend_node_status: primary 1 does not connect to standby 2<br>

>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:<br>

>> >> verify_backend_node_status: primary 1 owns only 0 standbys out of 1<br>

>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:<br>

>> >> pgpool_worker_child: invalid node found 1<br>

>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  received<br>

>> >> degenerate backend request for node_id: 1 from pid [31261]<br>

>> >> >><br>

>> >> >> The confuses the whole follow_primary_command and in the end we have:<br>

>> >> >><br>

>> >> >> psql -p 11000 -c "show pool_nodes" test<br>

>> >> >>  node_id | hostname | port  | status | pg_status | lb_weight |  role<br>

>> >>  | pg_role | select_cnt | load_balance_node | replication_delay |<br>

>> >> replication_state | replication_sync_state | last_status_change<br>

>> >> >><br>

>> >><br>

>> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------<br>

>> >> >>  0       | /tmp     | 11002 | down   | down      | 0.333333  |<br>

>> standby<br>

>> >> | unknown | 0          | false             | 0                 |<br>

>> >>        |                        | 2021-05-04 12:12:16<br>

>> >> >>  1       | /tmp     | 11003 | up     | up        | 0.333333  |<br>

>> standby<br>

>> >> | standby | 0          | false             | 0                 |<br>

>> >>        |                        | 2021-05-04 12:22:28<br>

>> >> >>  2       | /tmp     | 11004 | up     | up        | 0.333333  |<br>

>> standby<br>

>> >> | standby | 0          | true              | 0                 |<br>

>> >>        |                        | 2021-05-04 12:22:28<br>

>> >> >> (3 rows)<br>

>> >> >><br>

>> >> >> Of course this is totally unwanted result.<br>

>> >> >><br>

>> >> >> I think the root cause of the problem is, detach_false_primary and<br>

>> >> >> follow_primary_command are allowed to run concurrently. To solve the<br>

>> >> >> problem we need to have a lock so that if detach_false_primary<br>

>> already<br>

>> >> >> runs, follow_primary_command should wait for it's completion or vice<br>

>> >> >> versa.<br>

>> >> >><br>

>> >> >> For this purpose I propose attached patch<br>

>> >> >> detach_false_primary_v2.diff. In the patch new function<br>

>> >> >> pool_acquire_follow_primary_lock(bool block) and<br>

>> >> >> pool_release_follow_primary_lock(void) are introduced. They are<br>

>> >> >> responsible for acquiring or releasing the lock. There are 3 places<br>

>> >> >> where those functions are used:<br>

>> >> >><br>

>> >> >> 1) find_primary_node<br>

>> >> >><br>

>> >> >> This function is called upon startup and failover in the main pgpool<br>

>> >> >> process to find new primary node.<br>

>> >> >><br>

>> >> >> 2) failover<br>

>> >> >><br>

>> >> >> This function is called in the follow_primary_command subprocess<br>

>> >> >> forked off by pgpool main process to execute follow_primary_command<br>

>> >> >> script. The lock should be help until all follow_primary_command are<br>

>> >> >> completed.<br>

>> >> >><br>

>> >> >> 3) streaming replication check<br>

>> >> >><br>

>> >> >> Before starting verify_backend_node, which is the work horse of<br>

>> >> >> detach_false_primary, the lock must be acquired. If it fails, just<br>

>> >> >> skip the streaming replication check cycle.<br>

>> >> >><br>

>> >> >><br>

>> >> >> I and the user who made the initial report confirmed that tha patch<br>

>> >> >> works well.<br>

>> >> >><br>

>> >> >> Unfortunately the story is not the all. However the mail is already<br>

>> >> >> too long. I will continue to the next mail.<br>

>> >> >><br>

>> >> >> Best regards,<br>

>> >> >> --<br>

>> >> >> Tatsuo Ishii<br>

>> >> >> SRA OSS, Inc. Japan<br>

>> >> >> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>

>> >> >> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

>> >> _______________________________________________<br>

>> >> pgpool-hackers mailing list<br>

>> >> <a href="mailto:pgpool-hackers@pgpool.net" target="_blank">pgpool-hackers@pgpool.net</a><br>

>> >> <a href="http://www.pgpool.net/mailman/listinfo/pgpool-hackers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-hackers</a><br>

>> >><br>

>><br>

</blockquote></div>