[pgpool-hackers: 3932] Re: Problem with detach_false_primary/follow_primary_command

Muhammad Usama m.usama at gmail.com
Wed Jun 16 18:00:55 JST 2021


Hi Ishii-San

As discussed over Slack, I have cooked up a POC patch that implements
the follow_primary locking over the watchdog channel.

The idea is that, just before executing follow_primary_command during
the failover process, we direct all standby watchdog nodes to acquire
the same lock on their respective nodes, so that they suspend false
primary detection while follow_primary_command is being executed on
the watchdog coordinator node.

Moreover, to avoid keeping the watchdog process blocked waiting for
the lock, I have introduced a pending remote lock mechanism, so that
remote locks can be acquired in the background after the in-flight
replication checks complete.
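
To make the intent concrete, here is a minimal sketch of the two
pieces. All identifiers below (wd_broadcast_lock_request(),
acquire_or_release_local_lock(), and so on) are hypothetical
stand-ins, not the names used in the patch:

/* Hypothetical helpers, declared only to keep the sketch self-contained. */
typedef enum { FP_LOCK_ACQUIRE, FP_LOCK_RELEASE } FpLockOp;
extern void acquire_or_release_local_lock(FpLockOp op);
extern void wd_broadcast_lock_request(FpLockOp op);
extern bool replication_check_in_flight(void);

static bool pending_remote_lock = false;

/* Coordinator side: called just before follow_primary_command starts
 * (and again, with FP_LOCK_RELEASE, after it finishes). */
static void
coordinate_follow_primary_lock(FpLockOp op)
{
    acquire_or_release_local_lock(op);  /* lock on the coordinator itself */
    wd_broadcast_lock_request(op);      /* direct all standby watchdog nodes
                                         * to take the same lock locally */
}

/* Standby side: handler for the lock message from the coordinator.
 * If a replication check is in flight we must not block the watchdog
 * process, so the acquire request is remembered as "pending". */
static void
on_remote_lock_request(FpLockOp op)
{
    if (op == FP_LOCK_ACQUIRE && replication_check_in_flight())
    {
        pending_remote_lock = true;     /* acquire in the background later */
        return;
    }
    pending_remote_lock = false;        /* a release cancels a pending acquire */
    acquire_or_release_local_lock(op);
}

/* Called once the in-flight replication check completes. */
static void
process_pending_remote_lock(void)
{
    if (pending_remote_lock)
    {
        acquire_or_release_local_lock(FP_LOCK_ACQUIRE);
        pending_remote_lock = false;
    }
}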

Finally, I have removed the REQ_DETAIL_CONFIRMED flag from the
degenerate_backend_set() request that gets issued to detach the false
primary. That means all quorum and consensus rules will need to be
satisfied for the detach to happen.
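
For illustration, the change at the request site amounts to something
like the following (a sketch only: other_flags stands for whatever
flags the call already passes, and I am assuming the usual 4.x
degenerate_backend_set(node_id_set, count, flags) shape here):

/* Before: REQ_DETAIL_CONFIRMED let the request bypass the watchdog
 * quorum/consensus check, so a single node could detach the "false"
 * primary on its own. */
degenerate_backend_set(&node_id, 1, other_flags | REQ_DETAIL_CONFIRMED);

/* After: without REQ_DETAIL_CONFIRMED the request has to win the
 * quorum/consensus vote before the node is actually detached. */
degenerate_backend_set(&node_id, 1, other_flags);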

I haven't done rigorous testing or a regression run with the patch; I
am sharing this initial version with you to get your consensus on the
basic idea and design.

Can you kindly take a look and let me know if you agree with the approach?

Thanks
Best regards
Muhammad Usama


On Fri, May 7, 2021 at 9:47 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> I am going to commit/push the patches to master down to 4.0 stable
> (detach_false_primary was introduced in 4.0) branches if there's no
> objection.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Subject: [pgpool-hackers: 3893] Re: Problem with detach_false_primary/follow_primary_command
> Date: Tue, 04 May 2021 13:09:23 +0900 (JST)
> Message-ID: <20210504.130923.644768896074013686.t-ishii at gmail.com>
>
> > In the previous mail I explained the problem and proposed a patch
> > for the issue.
> >
> > However, the original reporter also said the problem will occur in a
> > more complex way if the watchdog is enabled.
> >
> > https://www.pgpool.net/pipermail/pgpool-general/2021-April/007590.html
> >
> > In summary, it seems multiple pgpool nodes perform detach_false_primary
> > concurrently, and this is the cause of the problem. I think there's no
> > reason to perform detach_false_primary on multiple pgpool nodes
> > concurrently. Rather, we should perform detach_false_primary only on
> > the leader node. If this is correct, we also should not perform
> > detach_false_primary if the quorum is absent, because there's no leader
> > if the quorum is absent. Attached is the patch to introduce the check
> > in addition to the v2 patch.
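
A minimal sketch of the guard this implies; wd_quorum_exists() and
wd_i_am_leader() are hypothetical names standing in for the actual
watchdog state checks:

/* Run detach_false_primary only on the watchdog leader, and only
 * while the quorum holds (no quorum means there is no leader). */
extern bool wd_quorum_exists(void);   /* hypothetical helper */
extern bool wd_i_am_leader(void);     /* hypothetical helper */

static bool
false_primary_check_allowed(void)
{
    if (!pool_config->use_watchdog)
        return true;                  /* standalone pgpool: always allowed */
    if (!wd_quorum_exists())
        return false;                 /* no quorum => no leader => skip */
    return wd_i_am_leader();          /* only the leader may detach */
}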
> >
> > I would like to hear opinions from other pgpool developers on whether
> > we should apply the v3 patch to existing branches. I am asking
> > because currently we perform detach_false_primary even if the quorum
> > is absent, and the change may be a "change of user visible behavior",
> > which we usually avoid on stable branches. However, since the current
> > detach_false_primary apparently does not work in an environment where
> > the watchdog is enabled, I think patching the back branches is an
> > exceptionally reasonable choice.
> >
> > Also I have added the regression test patch.
> >
> >> In the posting:
> >>
> >> [pgpool-general: 7525] Strange behavior on switchover with detach_false_primary enabled
> >>
> >> it is reported that detach_false_primary and follow_primary_command
> >> could conflict with each other and pgpool can go into an unwanted
> >> state. We can reproduce the issue by using pgpool_setup to create a
> >> 3-node configuration.
> >>
> >> $ pgpool_setup -n 3
> >>
> >> echo "detach_false_primary" >> etc/pgpool.conf
> >> echo "sr_check_period = 1" >> etc/pgpool.conf
> >>
> >> The latter may not be mandatory, but running the streaming
> >> replication check frequently reliably reproduces the problem, because
> >> detach_false_primary is executed in the streaming replication check
> >> process.
> >>
> >> The initial state is as follows:
> >>
> >> psql -p 11000 -c "show pool_nodes" test
> >>  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> >> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>  0       | /tmp     | 11002 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2021-05-04 11:12:01
> >>  1       | /tmp     | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-05-04 11:12:01
> >>  2       | /tmp     | 11004 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2021-05-04 11:12:01
> >> (3 rows)
> >>
> >> Execute pcp_detach_node against node 0.
> >>
> >> $ pcp_detach_node -w -p 11001 0
> >>
> >> This puts the primary into down status and promotes node 1.
> >>
> >> 2021-05-04 12:12:14: pcp_child pid 31449: LOG:  received degenerate backend request for node_id: 0 from pid [31449]
> >> 2021-05-04 12:12:14: main pid 31221: LOG:  Pgpool-II parent process has received failover request
> >> 2021-05-04 12:12:14: main pid 31221: LOG:  starting degeneration. shutdown host /tmp(11002)
> >> 2021-05-04 12:12:14: pcp_main pid 31260: LOG:  PCP process with pid: 31449 exit with SUCCESS.
> >> 2021-05-04 12:12:14: pcp_main pid 31260: LOG:  PCP process with pid: 31449 exits with status 0
> >> 2021-05-04 12:12:14: main pid 31221: LOG:  Restart all children
> >> 2021-05-04 12:12:14: main pid 31221: LOG:  execute command: /home/t-ishii/work/Pgpool-II/current/x/etc/failover.sh 0 /tmp 11002 /home/t-ishii/work/Pgpool-II/current/x/data0 1 0 /tmp 0 11003 /home/t-ishii/work/Pgpool-II/current/x/data1
> >>
> >> However, detach_false_primary found that the just-promoted node 1 is
> >> not acceptable, because it does not have any follower standby node
> >> yet: follow_primary_command had not completed.
> >>
> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 does not connect to standby 2
> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 owns only 0 standbys out of 1
> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  pgpool_worker_child: invalid node found 1
> >>
> >> And detach_false_primary sent a failover request for node 1.
> >>
> >> 2021-05-04 12:12:14: sr_check_worker pid 31261: LOG:  received degenerate backend request for node_id: 1 from pid [31261]
> >>
> >> Moreover, every second detach_false_primary tries to detach node 1 again.
> >>
> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 does not connect to standby 2
> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  verify_backend_node_status: primary 1 owns only 0 standbys out of 1
> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  pgpool_worker_child: invalid node found 1
> >> 2021-05-04 12:12:15: sr_check_worker pid 31261: LOG:  received degenerate backend request for node_id: 1 from pid [31261]
> >>
> >> This confuses the whole follow_primary_command sequence, and in the end we have:
> >>
> >> psql -p 11000 -c "show pool_nodes" test
> >>  node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> >> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>  0       | /tmp     | 11002 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2021-05-04 12:12:16
> >>  1       | /tmp     | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 |                   |                        | 2021-05-04 12:22:28
> >>  2       | /tmp     | 11004 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 |                   |                        | 2021-05-04 12:22:28
> >> (3 rows)
> >>
> >> Of course this is a totally unwanted result.
> >>
> >> I think the root cause of the problem is that detach_false_primary
> >> and follow_primary_command are allowed to run concurrently. To solve
> >> the problem we need a lock, so that if detach_false_primary is
> >> already running, follow_primary_command waits for its completion, and
> >> vice versa.
> >>
> >> For this purpose I propose the attached patch,
> >> detach_false_primary_v2.diff. In the patch, new functions
> >> pool_acquire_follow_primary_lock(bool block) and
> >> pool_release_follow_primary_lock(void) are introduced; they are
> >> responsible for acquiring and releasing the lock. There are 3 places
> >> where those functions are used:
> >>
> >> 1) find_primary_node
> >>
> >> This function is called upon startup and failover in the main pgpool
> >> process to find the new primary node.
> >>
> >> 2) failover
> >>
> >> This function is called in the follow_primary_command subprocess
> >> forked off by the pgpool main process to execute the
> >> follow_primary_command script. The lock should be held until all
> >> follow_primary_command executions are completed.
> >>
> >> 3) streaming replication check
> >>
> >> Before starting verify_backend_node_status, which is the workhorse of
> >> detach_false_primary, the lock must be acquired. If that fails, just
> >> skip the streaming replication check cycle.
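
A minimal sketch of call site 3, following the description above
(assuming pool_acquire_follow_primary_lock() reports success through
its return value, which is my reading of the patch, and with error
handling elided):

/* Streaming replication check cycle (sr_check worker). */
if (!pool_acquire_follow_primary_lock(false))   /* non-blocking attempt */
    return;                 /* follow_primary in progress: skip this cycle */

verify_backend_node_status(/* ... */);  /* the detach_false_primary workhorse */

pool_release_follow_primary_lock();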
> >>
> >>
> >> Both I and the user who made the initial report confirmed that the
> >> patch works well.
> >>
> >> Unfortunately, that is not the whole story, but this mail is already
> >> too long; I will continue in the next mail.
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>
Attachment: wd_coordinating_follow_and_detach__primary.patch (17908 bytes)
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20210616/4386a420/attachment-0001.obj>

