View Issue Details

ID: 0000622
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2020-06-23 23:51
Reporter: eldad
Assigned To: pengbo
Priority: high
Severity: major
Reproducibility: always
Status: resolved
Resolution: open
Platform: AWS
OS: Linux
OS Version: CentOS 7
Product Version: 4.1.2
Target Version:
Fixed in Version:
Summary: 0000622: wd_escalation_command script not called when master pgpool shutdown
Description: Hi,

I'm using the latest Pgpool-II 4.1.2 with PostgreSQL 12.3 on a two-node cluster with streaming replication.
I'm testing HA after the cluster setup, and since I'm on AWS I'm using only
wd_escalation_command, which moves a CNAME between the nodes when the master pgpool fails.
The problem is that the script is not called on the slave pgpool when I shut down the master pgpool,
although the rest of the failover process works fine.
It worked without issues on older pgpool versions.

Regards,
Eldad
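
For reference, the relevant watchdog part of my configuration looks roughly like this (the escalation script path is just an example; it calls the AWS CLI to repoint the CNAME):
-----------------------
use_watchdog = on
delegate_IP = ''                                         # no VIP on AWS, the CNAME is moved instead
wd_escalation_command = '/etc/pgpool-II/cname_switch.sh' # example script path
-----------------------
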
Steps To Reproduce: Install a two-node cluster and stop the pgpool service on the master node.
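
For example (the systemd unit name may differ depending on how pgpool was installed):
-----------------------
systemctl stop pgpool.service
-----------------------
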
Additional Information: Before stopping pgpool on node 2 (master):
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : NO
Master Node Name : lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab
Master Host Name : lreorgstgpdb02

Watchdog Node Information
Node Name : lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab
Host Name : lreorgstgpdb01
Delegate IP : Not_Set
Pgpool port : 5400
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

Node Name : lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab
Host Name : lreorgstgpdb02
Delegate IP : Not_Set
Pgpool port : 5400
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

=======================================================

After stopping the pgpool service on the master:
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM ABSENT
Alive Remote Nodes : 0
VIP up on local node : NO
Master Node Name : lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab
Master Host Name : lreorgstgpdb01

Watchdog Node Information
Node Name : lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab
Host Name : lreorgstgpdb01
Delegate IP : Not_Set
Pgpool port : 5400
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

Node Name : lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab
Host Name : lreorgstgpdb02
Delegate IP : Not_Set
Pgpool port : 5400
Watchdog port : 9000
Node priority : 1
Status : 10
Status Name : SHUTDOWN
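
The status blocks above are the verbose output of pcp_watchdog_info, roughly as follows (PCP host/port adjusted to your setup):
-----------------------
pcp_watchdog_info -h localhost -p 9898 -v
-----------------------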

Tags: watchdog

Activities

eldad

2020-06-22 14:32

reporter   ~0003410

Attaching pgpool.conf.

pgpool.conf (43,428 bytes)

eldad

2020-06-22 14:42

reporter   ~0003411

Attaching a fixed one.

pgpool-2.conf (43,431 bytes)

eldad

2020-06-22 14:52

reporter   ~0003412

Example with logs:
Both pgpool instances are stopped; the DB is up on both nodes.
pgpool on node 2 is started, but the script is not executed.
I wait a few minutes.
Only once I also start pgpool on node 1 does the escalation process start and the script get executed:
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: escalation process started with PID:23880
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23511: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23880: LOG: watchdog: escalation started
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: new IPC connection received
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23511: LOG: watchdog cluster now holds the quorum
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23511: DETAIL: updating the state of quarantine backend nodes
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: new IPC connection received
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23880: LOG: watchdog escalation successful
Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: watchdog escalation process with pid: 23880 exit with SUCCESS.
Jun 21 22:44:16 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:44:16: pid 23525: LOG: watchdog: lifecheck started

After waiting a few minutes, I now stop pgpool on node 2 (the master):
Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: remote node "lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab" is shutting down
Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: watchdog cluster has lost the coordinator node
Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: removing the remote node "lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab" from watchdog cluster master
Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: We have lost the cluster master node "lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab"
Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: watchdog node state changed from [STANDBY] to [JOINING]
Jun 21 22:48:16 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:16: pid 28908: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: LOG: I am the only alive node in the watchdog cluster
Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: HINT: skipping stand for coordinator state
Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: LOG: I am announcing my self as master/coordinator watchdog node
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: I am the cluster leader node
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: DETAIL: our declare coordinator message is accepted by all nodes
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: setting the local node "lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab" as watchdog cluster master
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: I am the cluster leader node but we do not have enough nodes in cluster
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: DETAIL: waiting for the quorum to start escalation process
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28906: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: new IPC connection received
Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: new IPC connection received

Node 1 becomes master, but there is no escalation and the script is not executed; note the message:
"DETAIL: waiting for the quorum to start escalation process"

pengbo

2020-06-22 17:26

developer   ~0003414

If you use 2 pgpool nodes, the single surviving node holds only half of the votes and cannot reach quorum by default, so you need to enable the "enable_consensus_with_half_votes" parameter.

https://www.pgpool.net/docs/latest/en/html/runtime-watchdog-config.html#GUC-ENABLE-CONSENSUS-WITH-HALF-VOTES
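
In pgpool.conf on both nodes, for example (this parameter is read at pgpool startup, so a restart is needed):
-----------------------
enable_consensus_with_half_votes = on
-----------------------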

eldad

2020-06-22 18:25

reporter   ~0003415

Thanks, this indeed solved the problem.

eldad

2020-06-22 18:45

reporter   ~0003416

Is there a way to spool the output to a log file, as in older versions on CentOS 6?
Currently all the logging goes to the journal, and I would like to have it in /etc/pgpool/pgpool.log.

pengbo

2020-06-23 00:51

developer   ~0003417

You can set log_destination to "syslog".
-----------------------
log_destination = 'syslog'
-----------------------

See example:
https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-LOG
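
For example, with a syslog facility set in pgpool.conf, rsyslog can route the messages to a dedicated file (the facility and file path below are only examples):
-----------------------
# pgpool.conf
log_destination = 'syslog'
syslog_facility = 'LOCAL1'

# /etc/rsyslog.d/pgpool.conf
local1.*    /var/log/pgpool/pgpool.log
-----------------------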

eldad

2020-06-23 22:46

reporter   ~0003420

Many thanks!

pengbo

2020-06-23 23:50

developer   ~0003422

I am going to mark this issue as "resolved".

Issue History

Date Modified Username Field Change
2020-06-21 23:29 eldad New Issue
2020-06-21 23:29 eldad Tag Attached: watchdog
2020-06-22 14:32 eldad File Added: pgpool.conf
2020-06-22 14:32 eldad Note Added: 0003410
2020-06-22 14:42 eldad File Added: pgpool-2.conf
2020-06-22 14:42 eldad Note Added: 0003411
2020-06-22 14:52 eldad Note Added: 0003412
2020-06-22 17:26 pengbo Note Added: 0003414
2020-06-22 17:26 pengbo Assigned To => pengbo
2020-06-22 17:26 pengbo Status new => feedback
2020-06-22 18:25 eldad Note Added: 0003415
2020-06-22 18:25 eldad Status feedback => assigned
2020-06-22 18:45 eldad Note Added: 0003416
2020-06-23 00:51 pengbo Note Added: 0003417
2020-06-23 00:52 pengbo Status assigned => feedback
2020-06-23 22:46 eldad Note Added: 0003420
2020-06-23 22:46 eldad Status feedback => assigned
2020-06-23 23:50 pengbo Note Added: 0003422
2020-06-23 23:51 pengbo Status assigned => resolved