[pgpool-hackers: 4243] Re: Issue with failover_require_consensus

Muhammad Usama muhammad.usama at percona.com
Fri Dec 16 20:12:39 JST 2022


On Tue, Nov 29, 2022 at 3:27 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> >> Hi Ishii-San
> >>
> >> Sorry for the delayed response.
> >
> > No problem.
> >
> >> With the attached fix, I guess the failover objects will linger on
> >> forever in case of a false alarm by a health check or a small glitch.
> >
> > That's not good.
> >
> >> One way to get around the issue could be to compute
> >> FAILOVER_COMMAND_FINISH_TIMEOUT based on the maximum value
> >> of health_check_period across the cluster.
> >> Something like: failover_command_finish_timeout =
> >> max(health_check_period) * 2 = 60
>
> After thinking about it more, I think we need to take
> health_check_max_retries and health_check_retry_delay into account as
> well, i.e. instead of max(health_check_period), something like:
> max(health_check_period + (health_check_retry_delay *
> health_check_max_retries)).
>
> What do you think?
>
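
For reference, the formula quoted above can be sketched roughly as follows.
This is only an illustration in Python, not the attached patch: the
HealthCheckParams class, the function name and the example values are
hypothetical, and I am assuming all time values are in seconds.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckParams:
    """Health check settings of one node (hypothetical container)."""
    health_check_period: int       # seconds between health checks
    health_check_max_retries: int  # retries before a check is declared failed
    health_check_retry_delay: int  # seconds between retries


def failover_command_finish_timeout(nodes):
    """Return max(period + retry_delay * max_retries) over all nodes."""
    return max(
        n.health_check_period
        + n.health_check_retry_delay * n.health_check_max_retries
        for n in nodes
    )


# Made-up example values, not taken from a real cluster.
nodes = [
    HealthCheckParams(health_check_period=30,
                      health_check_max_retries=3,
                      health_check_retry_delay=1),
    HealthCheckParams(health_check_period=10,
                      health_check_max_retries=0,
                      health_check_retry_delay=1),
]
timeout = failover_command_finish_timeout(nodes)
print(timeout)  # 33 = 30 + 1 * 3
```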

Thanks for the valuable suggestions.
Can you try out the attached patch to see if it solves the issue?
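
To make the numbers from the original report (quoted further down) concrete,
here is a minimal sketch of the timing. It models only the single window
described there: a request raised at time t, and the other node's request at
t + 20. The function name is hypothetical, and treating a request that lands
exactly on the deadline as "in time" is my assumption.

```python
def vote_arrives_in_time(peer_offset, finish_timeout):
    """True if the peer's request lands before the pending request expires.

    peer_offset: seconds between the first request (time t) and the
    peer's request (time t + peer_offset).
    finish_timeout: seconds a pending failover request is kept.
    """
    return peer_offset <= finish_timeout


health_check_period = 30
node1_start_delay = 50
# Node 1's checks trail node 0's by 50 - 30 = 20 seconds, as in the report.
peer_offset = node1_start_delay - health_check_period

# Current hard-coded timeout of 15 seconds.
current = vote_arrives_in_time(peer_offset, finish_timeout=15)
# Earlier proposal: max(health_check_period) * 2 = 60 seconds.
proposed = vote_arrives_in_time(peer_offset,
                                finish_timeout=2 * health_check_period)
print(current, proposed)  # False True
```

With the 15 second timeout the request is discarded before the second
request arrives; with a timeout derived from health_check_period it is not.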

Best regards
Muhammad Usama


> > This is much better than my previous proposal.
> >
> >> If you agree with the proposal I can cook up the patch and share it with
> >> you.
> >
> > I agree with you. Please go ahead.
> >
> >> Thanks
> >> Best regards
> >> Muhammad Usama
> >>
> >> On Mon, Nov 21, 2022 at 3:38 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >>
> >>> > Hi Usama,
> >>> >
> >>> > I think I found an issue with failover_require_consensus. When this
> >>> > parameter is enabled, the watchdog asks the other watchdogs to
> >>> > confirm the failover event. If the other watchdogs reply and the
> >>> > originating watchdog can reach a consensus, the failover process
> >>> > begins. However, if no replies arrive before
> >>> > FAILOVER_COMMAND_FINISH_TIMEOUT expires, the failover request is
> >>> > discarded and failover does not begin. If this happened only once or
> >>> > twice, we could expect a subsequent health check to trigger
> >>> > failover. But this can actually repeat forever (which means failover
> >>> > never happens) if health_check_period is larger than
> >>> > FAILOVER_COMMAND_FINISH_TIMEOUT (currently 15 seconds). For example,
> >>> > if health_check_period = 30 seconds, and the other watchdog node 1
> >>> > starts 50 seconds after watchdog node 0 (suppose this is the leader
> >>> > node), then every time a failover consensus request is made (suppose
> >>> > the time is t), it will be canceled at t + 15, because failover on
> >>> > watchdog node 1 will only happen at time t + 20 ( = 50 - 30).
> >>> >
> >>> > Since we allow other watchdog nodes to join a watchdog cluster at
> >>> > any time, I do not think this is the behavior we expect.
> >>> >
> >>> > Can we make FAILOVER_COMMAND_FINISH_TIMEOUT longer or disable the
> >>> > expiring when failover_require_consensus is on?
> >>>
> >>> Attached is the patch for this.
> >>>
> >>> > disable the expiring when failover_require_consensus is on?
> >>>
> >>> It seems the patch solves the issue and passes all of the regression
> >>> tests. But I wonder whether the patch will have unwanted side effects.
> >>> What do you think?
> >>>
> >>> Best regards,
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS LLC
> >>> English: http://www.sraoss.co.jp/index_en/
> >>> Japanese: http://www.sraoss.co.jp
> >>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_failover_require_consensus.diff
Type: application/octet-stream
Size: 3187 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20221216/4eb4efe4/attachment-0001.obj>

