[pgpool-general: 6408] Re: health_check_max_retries is not honored

Tue Feb 12 09:59:36 JST 2019

Hi,

On Mon, 11 Feb 2019 15:32:55 -0600
Alexander Dorogensky <amazinglifetime at gmail.com> wrote:

> I'm running 4 app (pgpool) nodes (3.6.10) and 2 db (postgres) nodes (9.6.9)
> primary/standby configuration with streaming replication. All 6 nodes are
> separate machines.
> 
> A client has had too many failovers caused by the flaky network and in an
> effort to remedy the issue I set the following parameters
> 
> health_check_max_retries = 7
> health_check_retry_delay = 15
> 
> Now, I have the client's environment and a lab environment to reproduce the
> issue. Pgpool configuration and the version are identical.
> 
> To simulate a flaky network, I use iptables to deny postgres connections to
> one of the db nodes and see that pgpool on all app nodes is trying to
> reconnect according to the configured number of retries and retry delay,
> 
> i.e.
> > 2019-02-11 14:22:51: pid 7825: LOG:  failed to connect to PostgreSQL server on "10.0.10.133:5433", getsockopt() detected error "No route to host"
> > ...
> > 2019-02-11 14:23:23: pid 6458: LOG:  health checking retry count 1
> > ...
> > 2019-02-11 14:23:38: pid 6458: LOG:  health checking retry count 2
> > ...
> > 2019-02-11 14:42:45: pid 6458: LOG:  health checking retry count 3
> > ...
> > 2019-02-11 14:43:00: pid 6458: LOG:  health checking retry count 4
> > ...
> > 2019-02-11 14:43:15: pid 6458: LOG:  health checking retry count 5
> > ...
> > 2019-02-11 14:43:30: pid 6458: LOG:  health checking retry count 6
> > ...
> > 2019-02-11 14:43:30: pid 6460: LOG:  failover request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "172.20.20.173:5432"
> > 2019-02-11 14:43:30: pid 4565: LOG:  watchdog received the failover command from remote pgpool-II node "172.20.20.172:5432"
> > ...
> > 2019-02-11 14:43:30: pid 4563: LOG:  execute command: /etc/pgpool-II/failover.sh 0 10.0.10.133 5433 /opt/redsky/db/data 1 0 10.0.10.134 1 5433 /opt/redsky/db/data
>
> However, in the client's environment failover gets initiated before the
> configured number of retries is reached, i.e.
> 
> > 2019-02-09 05:17:47: pid 19402: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
> > 2019-02-09 05:17:47: pid 19402: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> > 2019-02-09 05:17:47: pid 19402: LOG:  forwarding the failover request [DEGENERATE_BACKEND_REQUEST] to all alive nodes
> > 2019-02-09 05:17:47: pid 19402: DETAIL:  watchdog cluster currently has 3 connected remote nodes
> > 2019-02-09 05:17:47: pid 19276: ERROR:  unable to read data from DB node 1
> > 2019-02-09 05:17:47: pid 19276: DETAIL:  socket read failed with an error "Success"
> > 2019-02-09 05:17:47: pid 19400: LOG:  Pgpool-II parent process has received failover request
> > 2019-02-09 05:17:47: pid 19402: LOG:  new IPC connection received
> > 2019-02-09 05:17:47: pid 19402: LOG:  received the failover command lock request from local pgpool-II on IPC interface
> > 2019-02-09 05:17:47: pid 19402: LOG:  local pgpool-II node "10.15.35.35:5432" is requesting to become a lock holder for failover ID: 19880
> > 2019-02-09 05:17:47: pid 19402: LOG:  local pgpool-II node "10.15.35.35:5432" is the lock holder
> > 2019-02-09 05:17:47: pid 19400: LOG:  starting degeneration. shutdown host 10.38.135.137(5433)
> > 2019-02-09 05:17:47: pid 19400: LOG:  Restart all children
> > 2019-02-09 05:17:47: pid 19400: LOG:  execute command: /etc/pgpool-II/failover.sh 1 10.38.135.137 5433 /opt/redsky/db/data 0 0 10.15.35.39 1 5433 /opt/redsky/db/data
>
> I ran the following command on all app nodes
> 
> psql -c 'pgpool show health_check_max_retries'
> health_check_max_retries
> --------------------------
> 16
> (1 row)
> 
> and the number is different from what I have in the configuration file.
> It is greater than 1, though, so I expect it to be honored.

I could not reproduce this issue by using pgpool_setup.
Could you share the whole pgpool.conf?
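
Since the value reported by "pgpool show" (16) differs from the one in the file (7), it may be worth checking whether the parameter is assigned more than once, or whether a different pgpool.conf is being read. Below is a minimal sketch of such a check, not a definitive tool. It assumes plain "name = value" lines with "#" comments; the helper names collect_settings and report are made up for illustration, and it makes no claim about which duplicate pgpool-II actually uses. fail_over_on_backend_error is included on the assumption that it may also matter here, because the client log shows a socket read error on DB node 1 at the moment of failover.

```python
import re

# Parameters to look for. The names are pgpool-II 3.6 settings; the selection
# itself is an assumption about what is relevant to this report.
WATCHED = (
    "health_check_period",
    "health_check_timeout",
    "health_check_max_retries",
    "health_check_retry_delay",
    "fail_over_on_backend_error",
)

# Matches "name = value" after comments have been removed.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def collect_settings(conf_text, watched=WATCHED):
    """Return {name: [(line_no, value), ...]} for every watched assignment."""
    found = {}
    for line_no, raw in enumerate(conf_text.splitlines(), start=1):
        # Naive comment stripping: does not handle '#' inside quoted values.
        line = raw.split("#", 1)[0]
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name = match.group(1)
        value = match.group(2).strip("'\"")
        if name in watched:
            found.setdefault(name, []).append((line_no, value))
    return found


def report(found):
    """Format the collected assignments and flag parameters set more than once."""
    lines = []
    for name in sorted(found):
        entries = found[name]
        for line_no, value in entries:
            lines.append(f"{name} = {value} (line {line_no})")
        if len(entries) > 1:
            lines.append(f"  note: {name} is set {len(entries)} times")
    return lines
```

To use it, read the pgpool.conf that the running pgpool was started with and print "\n".join(report(collect_settings(text))). Comparing that output with the result of "pgpool show" on each app node should show whether the file and the running configuration agree.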

> Can you guys help me out? I'm out of ideas.
> 
> pgpool-II-pg96-3.6.10-1pgdg.rhel6.x86_64

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan