[pgpool-general: 6481] Re: pgpool-II does nothing when slave node goes down

Pierre Timmermans ptim007 at yahoo.com
Thu Mar 28 00:09:18 JST 2019


Dimitri, 

it should really work the way you think: when a standby fails, pgpool detaches it from the pool and its status becomes down (visible with show pool_nodes). So my explanation was not clear: what I meant is that when a standby fails the script failover.sh is executed, but in that case there is nothing for the script to do; pgpool itself takes care of detaching the failing node from the pool.

Here is an extract from my pgpool log file from when I stopped the standby database called pg02. I have the following parameters:

health_check_period=5
health_check_max_retries=3
health_check_retry_delay=1
failover_on_backend_error='off'

As you can see, the health check retries 3 times on DB node 1 (the standby node), then executes the failover command. In my failover script I check whether the failing node is the primary; if it is not, I do nothing. After that, show pool_nodes reports node pg02 as down.
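The logic described above can be sketched roughly like this. This is a minimal illustration, not my actual script: the argument positions assume a failover_command of the common form '/scripts/failover.sh %d %h %p %D %m %H %M %P %r %R', and the promotion command is only a placeholder comment; adjust both to your own setup.

```shell
#!/bin/bash
# Sketch of a failover.sh that only acts when the primary fails.
# Assumed argument order (from the failover_command template above):
#   $1 failing node id, $2 failing host, $3 failing port, $4 failing cluster path,
#   $5 new master id, $6 new master host, $7 old master id, $8 old primary id,
#   $9 new master port, $10 new master cluster path
failover_logic() {
  local falling_node="$1"
  local falling_host="$2"
  local new_host="$6"
  local old_primary_id="$8"

  if [ "$falling_node" = "$old_primary_id" ]; then
    # The primary failed: promote the standby here.
    # Promotion is site-specific, e.g. something like:
    #   ssh postgres@"$new_host" pg_ctl promote -D /data
    echo "primary $falling_host failed, promote $new_host"
  else
    # A standby failed: pgpool detaches it on its own, nothing to do.
    echo "old primary id is $old_primary_id and falling node is $falling_node"
  fi
}

# Same arguments as in the log below: node 1 (standby pg02) went down.
failover_logic 1 pg02 5432 /data 0 pg01 0 0 5432 /data
```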

If it is not working for you, there must be a misconfiguration.
Pierre



pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: LOG:  failed to connect to PostgreSQL server on "pg02:5432", getsockopt() detected error "Connection refused"
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: LOCATION:  pool_connection_pool.c:680
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: ERROR:  failed to make persistent db connection
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: DETAIL:  connection to host:"pg02:5432" failed
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: LOCATION:  child.c:1328
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: LOG:  health check retrying on DB node: 1 (round:1)
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:42: pid 188: LOCATION:  health_check.c:298
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: LOG:  failed to connect to PostgreSQL server on "pg02:5432", getsockopt() detected error "Connection refused"
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: LOCATION:  pool_connection_pool.c:680
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: ERROR:  failed to make persistent db connection
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: DETAIL:  connection to host:"pg02:5432" failed
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: LOCATION:  child.c:1328
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: LOG:  health check retrying on DB node: 1 (round:2)
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:43: pid 188: LOCATION:  health_check.c:298
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: LOG:  failed to connect to PostgreSQL server on "pg02:5432", getsockopt() detected error "Connection refused"
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: LOCATION:  pool_connection_pool.c:680
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: ERROR:  failed to make persistent db connection
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: DETAIL:  connection to host:"pg02:5432" failed
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: LOCATION:  child.c:1328
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: LOG:  health check retrying on DB node: 1 (round:3)
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:44: pid 188: LOCATION:  health_check.c:298
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOG:  failed to connect to PostgreSQL server on "pg02:5432", getsockopt() detected error "Connection refused"
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOCATION:  pool_connection_pool.c:680
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: ERROR:  failed to make persistent db connection
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: DETAIL:  connection to host:"pg02:5432" failed
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOCATION:  child.c:1328
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOG:  health check failed on node 1 (timeout:0)
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOCATION:  health_check.c:201
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOG:  received degenerate backend request for node_id: 1 from pid [188]
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 188: LOCATION:  pgpool_main.c:1125
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOG:  Pgpool-II parent process has received failover request
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:1588
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOG:  starting degeneration. shutdown host pg02(5432)
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:1867
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOG:  Do not restart children because we are switching over node id 1 host: pg02 port: 5432 and we are in streaming replication mode
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:1974
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOG:  execute command: /scripts/failover.sh 1 pg02 5432 /data 0 pg01 0 0 5432 /data
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:3064
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 1 pg02 5432 /data 0 pg01 0 0 5432 /data
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | Wed Mar 27 14:44:45 UTC 2019
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | FALLING_NODE: 1
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | FALLING_HOST: pg02
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | FALLING_PORT_NUMBER: 5432
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | FALLING_CLUSTER_PATH: /data
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | NEW_MASTER_ID: 0
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | NEW_HOST: pg01
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | OLD_MASTER_ID: 0
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | OLD_PRIMARY_ID: 0
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | NEW_PORT: 5432
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | NEW_CLUSTER_PATH: 10
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | + '[' 1 = 0 ']'
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | + echo old primary id is 0 and falling node is 1
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | old primary id is 0 and falling node is 1
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | + exit 0
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOG:  failover: set new primary node: 0
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:2187
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOG:  failover: set new master node: 0
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:2194
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | failover done. shutdown host pg02(5432)2019-03-27 14:44:45: pid 1: LOG:  failover done. shutdown host pg02(5432)
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 1: LOCATION:  pgpool_main.c:2317
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 186: LOG:  worker process received restart request
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:45: pid 186: LOCATION:  pool_worker_child.c:153
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 185: LOG:  restart request received in pcp child process
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 185: LOCATION:  pcp_child.c:155
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOG:  PCP child 185 exits with status 0 in failover()
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOCATION:  pgpool_main.c:2359
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOG:  fork a new PCP child pid 235 in failover()
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOCATION:  pgpool_main.c:2363
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOG:  worker child process with pid: 186 exits with status 256
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOCATION:  pgpool_main.c:2620
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOG:  fork a new worker child process with pid: 236
pgcluster_pgpool.1.yg0sbf4dywfe at romero    | 2019-03-27 14:44:46: pid 1: LOCATION:  pgpool_main.c:2729




    On Wednesday, March 27, 2019, 3:26:30 PM GMT+1, Dmitry Medvedev <dm.dm.medvedev at gmail.com> wrote:  
 
 Thanks a lot, a very useful and detailed manual. My config (at least at this moment) is simpler: I have only one instance of pgpool, 1 master, and 1 standby.
You wrote "pgpool executes the failover_command in case of a failure of the master or a failure of one of the standbys (at least when health checks are used). In the failover script you will usually want to check (based on the arguments) what to do: if the failing host is the current primary then you would promote the standby, otherwise the script does nothing".
Imagine that the standby node has failed. If pgpool does nothing, then we have lost redundancy: pgpool thinks we still have a standby node in reserve, but the standby node is down and there is no reserve.
So maybe the best way would be to use a tool such as Zabbix for monitoring, and use pgpool for only 2 things:
1) connection pooling
2) switching to standby on master failure

