[pgpool-hackers: 4231] Re: Issue with failover_require_consensus

Muhammad Usama muhammad.usama at percona.com
Mon Nov 28 16:31:09 JST 2022


Hi Ishii-San

Sorry for the delayed response.
With the attached fix, I guess the failover objects will linger forever
in case of a false alarm from a health check or a small glitch.
One way to get around the issue could be to compute
FAILOVER_COMMAND_FINISH_TIMEOUT from the maximum value
of health_check_period across the cluster.
Something like: failover_command_finish_timeout = max(health_check_period)
* 2, which gives 60 seconds for health_check_period = 30 as in your example.
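
To make the arithmetic concrete, here is a rough Python sketch of the
proposal applied to the numbers from your example. It is only an
illustration, not pgpool code: the function names are made up, both nodes
are assumed to use the same health_check_period, and treating a vote that
lands exactly on the deadline as "in time" is my assumption.

```python
def proposed_finish_timeout(health_check_periods, factor=2):
    """Proposed timeout: factor * the largest health_check_period (seconds)."""
    return factor * max(health_check_periods)


def peer_detection_offset(start_skew, health_check_period):
    """Seconds between the leader's health check and the peer's next one,
    assuming both nodes check with the same period (hypothetical helper)."""
    return start_skew % health_check_period


def vote_arrives_in_time(vote_offset, finish_timeout):
    """True if the peer's request lands before the failover request expires.
    The inclusive comparison is an assumption, not taken from pgpool."""
    return vote_offset <= finish_timeout


periods = [30, 30]   # health_check_period on node 0 and node 1
start_skew = 50      # node 1 starts 50 seconds after node 0

offset = peer_detection_offset(start_skew, periods[0])
current_timeout = 15                                  # current fixed value
proposed_timeout = proposed_finish_timeout(periods)

print("peer offset:", offset)
print("in time with current timeout:", vote_arrives_in_time(offset, current_timeout))
print("in time with proposed timeout:", vote_arrives_in_time(offset, proposed_timeout))
```

With the current 15 second timeout the peer's request at t + 20 always
comes too late, while a timeout of twice the largest health_check_period
covers any offset smaller than one period.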

If you agree with the proposal I can cook up the patch and share it with
you.

Thanks
Best regards
Muhammad Usama

On Mon, Nov 21, 2022 at 3:38 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Usama,
> >
> > I think I found an issue with failover_require_consensus. When this
> > parameter is enabled, the watchdog asks the other watchdogs to confirm
> > the failover event. If the other watchdogs reply and the originating
> > watchdog can reach a consensus, the failover process begins. However,
> > if no replies arrive before FAILOVER_COMMAND_FINISH_TIMEOUT expires,
> > the failover request is discarded and failover will not begin. If this
> > only happened once or twice, we could expect a subsequent health
> > check to trigger failover. But actually this could repeat forever
> > (which means failover never happens) if health_check_period is larger
> > than FAILOVER_COMMAND_FINISH_TIMEOUT (currently 15 seconds). For
> > example, if health_check_period = 30 seconds, and the other watchdog
> > node 1 starts 50 seconds after watchdog node 0 (suppose this is the
> > leader node), then every time a failover consensus request is made
> > (suppose the time is t), it will be canceled at t + 15, because the
> > failover request on watchdog node 1 will not happen until time t + 20
> > ( = 50 - 30).
> >
> > Since we allow other watchdog nodes to join a watchdog cluster at any
> > time, I think this is not the behavior we expect.
> >
> > Can we make FAILOVER_COMMAND_FINISH_TIMEOUT longer or disable the
> > expiring when failover_require_consensus is on?
>
> Attached is the patch for this.
>
> > disable the
> > expiring when failover_require_consensus is on?
>
> It seems the patch solves the issue and passes all of the regression
> tests. But I wonder if the patch will have unwanted side effects. What
> do you think?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>