[pgpool-hackers: 4244] Re: Issue with failover_require_consensus

Tatsuo Ishii ishii at sraoss.co.jp
Fri Dec 16 22:43:40 JST 2022


> On Tue, Nov 29, 2022 at 3:27 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> >> Hi Ishii-San
>> >>
>> >> Sorry for the delayed response.
>> >
>> > No problem.
>> >
>> >> With the attached fix, I guess the failover objects will linger
>> >> forever in case of a false alarm by a health check or a small glitch.
>> >
>> > That's not good.
>> >
>> >> One way to get around the issue could be to compute
>> >> FAILOVER_COMMAND_FINISH_TIMEOUT based on the maximum value
>> >> of health_check_period across the cluster,
>> >> something like: failover_command_finish_timeout =
>> >> max(health_check_period) * 2 = 60
>>
>> After thinking more, I think we need to take
>> health_check_max_retries and health_check_retry_delay into account
>> as well, i.e. instead of max(health_check_period), something like:
>> max(health_check_period + (health_check_retry_delay *
>> health_check_max_retries)).
>>
>> What do you think?
>>
> 
> Thanks for the valuable suggestions.
> Can you try out the attached patch to see if it solves the issue?

Unfortunately, the patch did not pass my test case.

- 3 watchdog nodes and 2 PostgreSQL servers in a streaming replication
  cluster (created by watchdog_setup). pgpool0 is the watchdog leader.

- health_check_period = 300, health_check_max_retries = 0

- pgpool1 starts 120 seconds after pgpool0 starts

- pgpool2 does not start

- after the watchdog cluster becomes ready, shut down PostgreSQL node 1 (standby).

- wait 600 seconds, expecting a failover.

However, the failover did not happen.

Attached are the test script and the pgpool0 log.

To run the test:

- unpack test.tar.gz

- run prepare.sh
  $ sh prepare.sh
  This should create a "testdir" directory containing a cluster of 3 watchdog nodes and 2 PostgreSQL nodes.

- cd testdir and run the test
  $ sh ../start.sh -o 120
  This will start the test; "-o" specifies how many seconds to wait before starting pgpool1.
  
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.tar.gz
Type: application/octet-stream
Size: 5462 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20221216/99de745a/attachment.obj>

