[pgpool-hackers: 3936] Re: Problem with detach_false_primary/follow_primary_command

Thu Jun 17 22:24:40 JST 2021

Hi Usama,

Thank you for updating the patch. The patch applied cleanly and all
the regression tests including 018.detach_primary passed.

> Hi Ishii-San
> 
> 
> 
> On Wed, Jun 16, 2021 at 3:15 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Usama,
>>
>> Unfortunately the patch did not apply cleanly on current master
>> branch:
>>
> 
> Sorry, apparently I hadn't created the patch from the current master head.
> Attached is the rebased version.
> 
>>
>> $ git apply ~/wd_coordinating_follow_and_detach__primary.patch
>> error: patch failed: src/include/pool.h:426
>> error: src/include/pool.h: patch does not apply
>>
>> So I have not actually tested the patch, but it seems the idea of
>> locking at the watchdog level is more robust than my idea (executing
>> the false primary check only on the coordinator node).
>>
> 
> Thanks for the confirmation. I have confirmed the regression is fine with
> the patch
> but I think I need some more testing before I can commit it.
> 
> Best regards
> Muhammad Usama
> 
> 
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Hi Ishii-San
>> >
>> > As discussed over Slack, I have cooked up a POC patch implementing
>> > follow_primary locking over the watchdog channel.
>> >
>> > The idea is that just before executing follow_primary during the failover
>> > process, we direct all standby watchdog nodes to acquire the same lock on
>> > their respective nodes, so that they stop false primary detection while
>> > follow_primary is being executed on the watchdog coordinator node.
>> >
>> > Moreover, to keep the watchdog process from blocking while it waits for the
>> > lock, I have introduced a pending remote lock mechanism, so that remote
>> > locks can be acquired in the background after the in-flight replication
>> > checks complete.
>> >
>> > Finally, I have removed the REQ_DETAIL_CONFIRMED flag from the
>> > degenerate_backend_set() request that is issued to detach the false
>> > primary. That means all quorum and consensus rules will need to be
>> > satisfied for the detach to happen.
>> >
>> > I haven't done rigorous testing or a regression run with the patch; I am
>> > sharing the initial version with you to get your consensus on the basic
>> > idea and design.
>> >
>> > Can you kindly take a look and tell me whether you agree with the approach?
>> >
>> > Thanks
>> > Best regards
>> > Muhammad Usama
>> >
>> >
>> > On Fri, May 7, 2021 at 9:47 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> I am going to commit/push the patches to master down to 4.0 stable
>> >> (detach_false_primary was introduced in 4.0) branches if there's no
>> >> objection.
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>> >> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>> >> Subject: [pgpool-hackers: 3893] Re: Problem with
>> >> detach_false_primary/follow_primary_command
>> >> Date: Tue, 04 May 2021 13:09:23 +0900 (JST)
>> >> Message-ID: <20210504.130923.644768896074013686.t-ishii at gmail.com>
>> >>
>> >> > In the previous mail I have explained the problem and proposed a patch
>> >> > for the issue.
>> >> >
>> >> > However, the original reporter also said the problem occurs in a more
>> >> > complex way if watchdog is enabled.
>> >> >
>> >> > https://www.pgpool.net/pipermail/pgpool-general/2021-April/007590.html
>> >> >
>> >> > In summary it seems multiple pgpool nodes perform detach_false_primary
>> >> > concurrently and this is the cause of the problem. I think there's no
>> >> > reason to perform detach_false_primary in multiple pgpool nodes
>> >> > concurrently. Rather we should perform detach_false_primary only on
>> >> > the leader node. If this is correct, we also should not perform
>> >> > detach_false_primary if the quorum is absent because there's no leader
>> >> > if the quorum is absent. Attached is the patch to introduce the check
>> >> > in addition to the v2 patch.
>> >> >
>> >> > I would like to hear opinions from other pgpool developers on
>> >> > whether we should apply the v3 patch to existing branches. I am asking
>> >> > because currently we perform detach_false_primary even if the quorum
>> >> > is absent, and the change may be a "change of user visible behavior",
>> >> > which we usually avoid on stable branches. However, since the current
>> >> > detach_false_primary apparently does not work in environments where
>> >> > watchdog is enabled, I think patching the back branches is a
>> >> > reasonable choice as an exception.
>> >> >
>> >> > Also I have added the regression test patch.
>> >> >
>> >> >> In the posting:
>> >> >>
>> >> >> [pgpool-general: 7525] Strange behavior on switchover with detach_false_primary enabled
>> >> >>
>> >> >> it is reported that detach_false_primary and follow_primary_command
>> >> >> could conflict with each other and pgpool goes into an unwanted state.
>> >> >> We can reproduce the issue by using pgpool_setup to create a 3-node
>> >> >> configuration.
>> >> >>
>> >> >> $ pgpool_setup -n 3
>> >> >>
>> >> >> echo "detach_false_primary = on" >> etc/pgpool.conf
>> >> >> echo "sr_check_period = 1" >> etc/pgpool.conf
>> >> >>
>> >> >> The latter may not be mandatory, but running the streaming replication
>> >> >> check frequently reproduces the problem more reliably, because
>> >> >> detach_false_primary is executed in the streaming replication check
>> >> >> process.
>> >> >>
>> >> >> The initial state is as follows:
>> >> >>
>> >> >> psql -p 11000 -c "show pool_nodes" test
>> >> >>  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> >> >> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> >> >>  0       | /tmp     | 11002 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2021-05-04 11:12:01
>> >> >>  1       | /tmp     | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-05-04 11:12:01
>> >> >>  2       | /tmp     | 11004 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-05-04 11:12:01
>> >> >> (3 rows)
>> >> >>
>> >> >> Execute pcp_detach_node against node 0.
>> >> >>
>> >> >> $ pcp_detach_node -w -p 11001 0
>> >> >>
>> >> >> This puts the primary into down status and promotes node 1.
>> >> >>
>> >> >> 2021-05-04 12:12:14: pcp_child pid 31449: LOG:  received degenerate backend request for node_id: 0 from pid [31449]
>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  Pgpool-II parent process has received failover request
>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  starting degeneration. shutdown host /tmp(11002)
>> >> >> 2021-05-04 12:12:14: pcp_main pid 31260: LOG:  PCP process with pid: 31449 exit with SUCCESS.
>> >> >> 2021-05-04 12:12:14: pcp_main pid 31260: LOG:  PCP process with pid: 31449 exits with status 0
>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  Restart all children
>> >> >> 2021-05-04 12:12:14: main pid 31221: LOG:  execute command: /home/t-ishii/work/Pgpool-II/current/x/etc/failover.sh 0 /tmp 11002 /home/t-ishii/work/Pgpool-II/current/x/data0 1 0 /tmp 0 11003 /home/t-ishii/work/Pgpool-II/current/x/data1
>> >> >>
>> >> >> However, detach_false_primary found that the just-promoted node 1 is
>> >> >> not good, because it does not have any follower standby node yet:
>> >> >> follow_primary_command has not completed.
>> >> >>
>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 does not connect to standby 2
>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 owns only 0 standbys out of 1
>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  pgpool_worker_child: invalid node found 1
>> >> >>
>> >> >> And detach_false_primary sent a failover request for node 1.
>> >> >>
>> >> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  received degenerate backend request for node_id: 1 from pid [31261]
>> >> >>
>> >> >> Moreover, detach_false_primary tries to detach node 1 every second.
>> >> >>
>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 does not connect to standby 2
>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 owns only 0 standbys out of 1
>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  pgpool_worker_child: invalid node found 1
>> >> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  received degenerate backend request for node_id: 1 from pid [31261]
>> >> >>
>> >> >> This confuses the whole follow_primary_command process, and in the end we have:
>> >> >>
>> >> >> psql -p 11000 -c "show pool_nodes" test
>> >> >>  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> >> >> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> >> >>  0       | /tmp     | 11002 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2021-05-04 12:12:16
>> >> >>  1       | /tmp     | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 |                   |                        | 2021-05-04 12:22:28
>> >> >>  2       | /tmp     | 11004 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 |                   |                        | 2021-05-04 12:22:28
>> >> >> (3 rows)
>> >> >>
>> >> >> Of course this is a totally unwanted result.
>> >> >>
>> >> >> I think the root cause of the problem is that detach_false_primary and
>> >> >> follow_primary_command are allowed to run concurrently. To solve the
>> >> >> problem we need a lock, so that if detach_false_primary is already
>> >> >> running, follow_primary_command waits for its completion, and vice
>> >> >> versa.
>> >> >>
>> >> >> For this purpose I propose the attached patch,
>> >> >> detach_false_primary_v2.diff. The patch introduces two new functions,
>> >> >> pool_acquire_follow_primary_lock(bool block) and
>> >> >> pool_release_follow_primary_lock(void), which are responsible for
>> >> >> acquiring and releasing the lock. There are 3 places where these
>> >> >> functions are used:
>> >> >>
>> >> >> 1) find_primary_node
>> >> >>
>> >> >> This function is called upon startup and failover in the main pgpool
>> >> >> process to find the new primary node.
>> >> >>
>> >> >> 2) failover
>> >> >>
>> >> >> This function is called in the follow_primary_command subprocess
>> >> >> forked off by the pgpool main process to execute the
>> >> >> follow_primary_command script. The lock should be held until all
>> >> >> follow_primary_command runs have completed.
>> >> >>
>> >> >> 3) streaming replication check
>> >> >>
>> >> >> Before starting verify_backend_node_status, which is the workhorse of
>> >> >> detach_false_primary, the lock must be acquired. If acquiring it fails,
>> >> >> just skip this streaming replication check cycle.
>> >> >>
>> >> >>
>> >> >> The user who made the initial report and I confirmed that the patch
>> >> >> works well.
>> >> >>
>> >> >> Unfortunately this is not the whole story. However, this mail is
>> >> >> already too long, so I will continue in the next mail.
>> >> >>
>> >> >> Best regards,
>> >> >> --
>> >> >> Tatsuo Ishii
>> >> >> SRA OSS, Inc. Japan
>> >> >> English: http://www.sraoss.co.jp/index_en.php
>> >> >> Japanese:http://www.sraoss.co.jp
>> >> _______________________________________________
>> >> pgpool-hackers mailing list
>> >> pgpool-hackers at pgpool.net
>> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>> >>
>>