[pgpool-hackers: 3442] Re: [pgpool-committers: 6195] pgpool: Overhaul health check debug facility.

Tatsuo Ishii ishii at sraoss.co.jp
Tue Sep 24 15:01:19 JST 2019


This is an attempt to fix the 013.watchdog_failover_require_consensus
failure on the 4.0 branch. If it succeeds, I will apply it to the 3.7
branch as well.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Tatsuo Ishii <ishii at sraoss.co.jp>
Subject: [pgpool-committers: 6195] pgpool: Overhaul health check debug facility.
Date: Tue, 24 Sep 2019 05:58:25 +0000
Message-ID: <E1iCdqD-0000tu-DT at gothos.postgresql.org>

> Overhaul health check debug facility.
> 
> check_backend_down_request() in health_check.c is intended to simulate
> a communication failure between the health check and a PostgreSQL
> backend node. It is triggered by creating a file containing lines like:
> 
> 1       down
> 
> where the first field is the node id (starting from 0), followed by a
> tab and "down". When the health check process finds the file, it makes
> the health check fail on node 1.
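
To make the file format above concrete, here is a minimal sketch of how
such a file could be scanned. It is not the code in health_check.c:
the helper name down_requested() and the buffer sizes are assumptions
made for illustration only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not pgpool's actual code: return 1 if the request
 * file lists node_id with the status "down", 0 otherwise.
 * Each line is expected to look like "<node id><TAB><status>".
 */
static int
down_requested(FILE *fp, int node_id)
{
	char	line[256];

	rewind(fp);
	while (fgets(line, sizeof(line), fp) != NULL)
	{
		int		id;
		char	status[64];

		/* whitespace in the format matches the tab separator */
		if (sscanf(line, "%d\t%63s", &id, status) != 2)
			continue;			/* ignore malformed lines */
		if (id == node_id && strcmp(status, "down") == 0)
			return 1;
	}
	return 0;
}
```

With a file containing the single line shown above, this sketch reports
a down request for node 1 only; a line whose status is "already_down"
is not treated as a request.
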
> 
> After the health check brings the node into down status,
> check_backend_down_request() changes "down" to "already_down" to
> prevent the node failure from being repeated.
> 
> However, the question is whether this is necessary at all. I think
> check_backend_down_request() should keep on reporting the down status,
> and it should be called inside establish_persistent_connection(),
> because the failing situation can be simulated better this way. For
> example, the health check retry is currently not simulated, but the
> new way can do it.
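
The point about retries can be illustrated with a small sketch. It
assumes a connection routine that fails for as long as a down request
is present; try_connect(), health_check_with_retry() and the two
variables are hypothetical names, not pgpool functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "a down request exists for this node". */
static bool down_request_present = true;

/* Counts how often the connection routine was exercised. */
static int connect_attempts = 0;

/* Sketch of a connection routine that fails while a down request exists. */
static bool
try_connect(void)
{
	connect_attempts++;
	return !down_request_present;
}

/* One initial attempt plus up to max_retries retries. */
static bool
health_check_with_retry(int max_retries)
{
	for (int i = 0; i <= max_retries; i++)
	{
		if (try_connect())
			return true;
	}
	return false;
}
```

Because the simulated failure is raised inside the connection attempt,
every retry goes through it, which is what makes the retry path
testable in this model.
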
> 
> Moreover, in the current watchdog implementation, bringing a node into
> quarantine state requires detecting a node communication error *twice*.
> Since check_backend_down_request() only allowed the node down event to
> be raised *once* (after that the down state is changed to the
> already_down state), it was impossible to test the watchdog quarantine
> using check_backend_down_request(). I changed
> check_backend_down_request() so that it continues to raise the "down"
> event as long as the down request file exists.
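
The difference between the old and the new behaviour can be modelled in
a few lines. This is only a sketch under assumptions: the request is
kept in memory instead of a file, the threshold of two detections is
taken from the description above, and all names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define FAILURES_FOR_QUARANTINE 2	/* two detections, per the text above */

/* In-memory stand-in for one line of the down request file. */
struct down_request
{
	char	status[16];			/* "down" or "already_down" */
};

/* Old behaviour: report the failure once, then mark it "already_down". */
static bool
check_once(struct down_request *req)
{
	if (strcmp(req->status, "down") != 0)
		return false;
	strcpy(req->status, "already_down");
	return true;
}

/* New behaviour: keep reporting the failure while the request says "down". */
static bool
check_persistent(const struct down_request *req)
{
	return strcmp(req->status, "down") == 0;
}

/* Count failures over a number of health checks and compare to the threshold. */
static bool
reaches_quarantine(bool persistent, int rounds)
{
	struct down_request req = {"down"};
	int		failures = 0;

	for (int i = 0; i < rounds; i++)
	{
		if (persistent ? check_persistent(&req) : check_once(&req))
			failures++;
	}
	return failures >= FAILURES_FOR_QUARANTINE;
}
```

In this model the old check never reaches the threshold, no matter how
many rounds are run, while the persistent check reaches it on the
second round.
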
> 
> This commit enhances check_backend_down_request() as described above.
> 
> 1) The caller of check_backend_down_request() is now
>    establish_persistent_connection(), rather than
>    do_health_check_child().
> 
> 2) check_backend_down_request() does not change "down" to
>    "already_down" anymore. This means that the second argument of
>    check_backend_down_request() is not useful anymore. Probably I
>    should remove the argument later on.
> 
> Branch
> ------
> V4_0_STABLE
> 
> Details
> -------
> https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=2d2718929c5648ec6303ebd149e3eadeff1b4f19
> 
> Modified Files
> --------------
> src/main/health_check.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
> 

