[pgpool-hackers: 3443] Re: [pgpool-committers: 6195] pgpool: Overhaul health check debug facility.

Wed Sep 25 21:24:20 JST 2019

The buildfarm result of 2019/9/25 showed that the
013.watchdog_failover_require_consensus test on the 4.0 branch has
succeeded. So I have pushed the commit to the 3.7 branch as well.

> This is an attempt to fix the 013.watchdog_failover_require_consensus
> failure on the 4.0 branch. If this succeeds, I will apply it to the
> 3.7 branch as well.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Subject: [pgpool-committers: 6195] pgpool: Overhaul health check debug facility.
> Date: Tue, 24 Sep 2019 05:58:25 +0000
> Message-ID: <E1iCdqD-0000tu-DT at gothos.postgresql.org>
> 
>> Overhaul health check debug facility.
>> 
>> check_backend_down_request() in health_check.c is intended to simulate
>> a communication failure between the health check process and a
>> PostgreSQL backend node. It is triggered by creating a file containing
>> lines like:
>> 
>> 1       down
>> 
>> where the first field is the node id (starting from 0), followed by a
>> tab and "down". When the health check process finds the file, it makes
>> the health check fail on node 1.
>> 
>> After the health check brings the node into down status,
>> check_backend_down_request() changes "down" to "already_down" to
>> prevent repeating the node failure.
>> 
>> However, the question is whether this is necessary at all. I think
>> check_backend_down_request() should keep on reporting the down status
>> and it should be called inside establish_persistent_connection(),
>> because the failing situation can be simulated better in this way. For
>> example, currently the health check retry is not simulated, but the
>> new way can do it.
>> 
>> Moreover, in the current watchdog implementation, bringing a node into
>> quarantine state requires *two* detections of a node communication
>> error. Since check_backend_down_request() only allows the node down
>> event to be raised *once* (after that the down state is changed to the
>> already_down state), it's impossible to test the watchdog quarantine
>> using check_backend_down_request(). I changed
>> check_backend_down_request() so that it continues to raise the "down"
>> event as long as the down request file exists.
>> 
>> This commit enhances check_backend_down_request() as described above.
>> 
>> 1) The caller of check_backend_down_request() is now
>>    establish_persistent_connection(), rather than
>>    do_health_check_child().
>> 
>> 2) check_backend_down_request() does not change "down" to
>>    "already_down" anymore. This means that the second argument of
>>    check_backend_down_request() is not useful anymore. Probably I
>>    should remove the argument later on.
>> 
>> Branch
>> ------
>> V4_0_STABLE
>> 
>> Details
>> -------
>> https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=2d2718929c5648ec6303ebd149e3eadeff1b4f19
>> 
>> Modified Files
>> --------------
>> src/main/health_check.c | 19 ++++++++++++-------
>> 1 file changed, 12 insertions(+), 7 deletions(-)
>> 
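
For readers who want to picture the new behaviour, here is a minimal
sketch in C. It is not the code in health_check.c: the helper names
down_requested() and failed_attempts(), and the idea of passing the file
path as an argument, are assumptions made for illustration only. It just
shows a read-only lookup of a "<node id><tab>down" line, and how a retry
loop keeps failing as long as that line stays in the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not pgpool's actual code: returns 1 if the request
 * file at `path` contains a line "<node_id>\tdown", 0 otherwise
 * (including when the file does not exist). The file is never modified,
 * so the result stays 1 for as long as the line remains in the file. */
static int down_requested(const char *path, int node_id)
{
    FILE *fp = fopen(path, "r");
    char line[256];
    int found = 0;

    if (fp == NULL)
        return 0;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        int id;
        char status[64];

        /* the space in the format matches the tab separator */
        if (sscanf(line, "%d %63s", &id, status) == 2 &&
            id == node_id && strcmp(status, "down") == 0)
        {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}

/* Hypothetical retry loop: each connection attempt consults the request
 * file, so every retry fails while the entry is present. Returns the
 * number of failed attempts (0 means the first attempt succeeded). */
static int failed_attempts(const char *path, int node_id, int max_retries)
{
    int attempt;
    int failures = 0;

    for (attempt = 0; attempt <= max_retries; attempt++)
    {
        if (!down_requested(path, node_id))
            break;
        failures++;
    }
    return failures;
}
```

Because the lookup never rewrites "down" to "already_down", calling it
twice for the same node reports the failure twice, which is the property
the watchdog quarantine test relies on.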