View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000622 | Pgpool-II | Bug | public | 2020-06-21 23:29 | 2020-09-01 09:42 |
| Reporter | eldad | Assigned To | pengbo | | |
| Priority | high | Severity | major | Reproducibility | always |
| Status | closed | Resolution | open | | |
| Platform | AWS | OS | Linux | OS Version | CentOS 7 |
| Product Version | 4.1.2 | | | | |
Summary: 0000622: wd_escalation_command script not called when master pgpool shutdown

Description:

Hi, I'm using the latest Pgpool-II 4.1.2 with PostgreSQL 12.3 on a 2-node cluster with streaming replication. I'm testing HA after the cluster setup, and since I'm on AWS I'm using only wd_escalation_command, which moves a CNAME between the nodes when the master pgpool fails. The problem is that the script is not called on the slave pgpool when I shut down the master pgpool, even though the rest of the failover process works fine. It worked without issues on older pgpool versions. Regards, Eldad
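The escalation script itself is not attached to this report. A minimal sketch of such a CNAME-moving wd_escalation_command, assuming Amazon Route 53 and using placeholder values (ZONE_ID and pgpool.example.com are illustrative, not taken from this report), could look like this:

-----------------------
#!/bin/bash
# Hypothetical wd_escalation_command: repoint a Route 53 CNAME at the
# local node when this pgpool is promoted to master/coordinator.
ZONE_ID="Z0000000000000"        # placeholder hosted zone ID
RECORD="pgpool.example.com"     # placeholder CNAME clients connect to
TARGET="$(hostname -f)"         # this node's FQDN

aws route53 change-resource-record-sets \
  --hosted-zone-id "$ZONE_ID" \
  --change-batch "{
    \"Changes\": [{
      \"Action\": \"UPSERT\",
      \"ResourceRecordSet\": {
        \"Name\": \"$RECORD\",
        \"Type\": \"CNAME\",
        \"TTL\": 60,
        \"ResourceRecords\": [{\"Value\": \"$TARGET\"}]
      }
    }]
  }"
-----------------------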
Steps To Reproduce:

Install a 2-node cluster and stop the pgpool service on the master node.
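On CentOS 7 the stop step would typically be done via systemd; a sketch, assuming the unit is installed as pgpool.service (the unit name is not confirmed in this report):

-----------------------
# on the node currently holding the watchdog master role
systemctl stop pgpool
-----------------------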
Additional Information:

Before stopping pgpool on node2 (master):

    Watchdog Cluster Information
    Total Nodes          : 2
    Remote Nodes         : 1
    Quorum state         : QUORUM EXIST
    Alive Remote Nodes   : 1
    VIP up on local node : NO
    Master Node Name     : lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab
    Master Host Name     : lreorgstgpdb02

    Watchdog Node Information
    Node Name     : lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab
    Host Name     : lreorgstgpdb01
    Delegate IP   : Not_Set
    Pgpool port   : 5400
    Watchdog port : 9000
    Node priority : 1
    Status        : 7
    Status Name   : STANDBY

    Node Name     : lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab
    Host Name     : lreorgstgpdb02
    Delegate IP   : Not_Set
    Pgpool port   : 5400
    Watchdog port : 9000
    Node priority : 1
    Status        : 4
    Status Name   : MASTER

After stopping the pgpool service on the master:

    Watchdog Cluster Information
    Total Nodes          : 2
    Remote Nodes         : 1
    Quorum state         : QUORUM ABSENT
    Alive Remote Nodes   : 0
    VIP up on local node : NO
    Master Node Name     : lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab
    Master Host Name     : lreorgstgpdb01

    Watchdog Node Information
    Node Name     : lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab
    Host Name     : lreorgstgpdb01
    Delegate IP   : Not_Set
    Pgpool port   : 5400
    Watchdog port : 9000
    Node priority : 1
    Status        : 4
    Status Name   : MASTER

    Node Name     : lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab
    Host Name     : lreorgstgpdb02
    Delegate IP   : Not_Set
    Pgpool port   : 5400
    Watchdog port : 9000
    Node priority : 1
    Status        : 10
    Status Name   : SHUTDOWN
Tags: watchdog
|
|
Notes

eldad (0003410) 2020-06-22 14:32

Attaching pgpool.conf.
|
|
eldad (0003411) 2020-06-22 14:42

Attaching a fixed one (pgpool-2.conf).
|
|
eldad (0003412) 2020-06-22 14:52

Example with logs. Both pgpool instances are stopped, and the DB is up on both nodes. pgpool on node 2 is started, but the script is not executed. I wait a few minutes; only once I start pgpool on node 1 as well does the escalation process start and the script run.

    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: escalation process started with PID:23880
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23511: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23880: LOG: watchdog: escalation started
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: new IPC connection received
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23511: LOG: watchdog cluster now holds the quorum
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23511: DETAIL: updating the state of quarantine backend nodes
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: new IPC connection received
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23880: LOG: watchdog escalation successful
    Jun 21 22:43:33 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:43:33: pid 23513: LOG: watchdog escalation process with pid: 23880 exit with SUCCESS.
    Jun 21 22:44:16 lreorgstgpdb02.ast.lab pgpool[23511]: 2020-06-21 22:44:16: pid 23525: LOG: watchdog: lifecheck started

After waiting a few minutes, I now stop pgpool on node 2 (the master):

    Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: remote node "lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab" is shutting down
    Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: watchdog cluster has lost the coordinator node
    Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: removing the remote node "lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab" from watchdog cluster master
    Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: We have lost the cluster master node "lreorgstgpdb02:5400 Linux lreorgstgpdb02.ast.lab"
    Jun 21 22:48:12 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:12: pid 28908: LOG: watchdog node state changed from [STANDBY] to [JOINING]
    Jun 21 22:48:16 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:16: pid 28908: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
    Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: LOG: I am the only alive node in the watchdog cluster
    Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: HINT: skipping stand for coordinator state
    Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
    Jun 21 22:48:17 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:17: pid 28908: LOG: I am announcing my self as master/coordinator watchdog node
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: I am the cluster leader node
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: DETAIL: our declare coordinator message is accepted by all nodes
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: setting the local node "lreorgstgpdb01:5400 Linux lreorgstgpdb01.ast.lab" as watchdog cluster master
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: I am the cluster leader node but we do not have enough nodes in cluster
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: DETAIL: waiting for the quorum to start escalation process
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28906: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: new IPC connection received
    Jun 21 22:48:21 lreorgstgpdb01.ast.lab pgpool[28906]: 2020-06-21 22:48:21: pid 28908: LOG: new IPC connection received

Node 1 becomes master, but there is no escalation and the script is not executed: "DETAIL: waiting for the quorum to start escalation process".
|
|
pengbo (0003414) 2020-06-22 17:26

If you use 2 pgpool nodes, you need to enable the "enable_consensus_with_half_votes" parameter. By default, exactly half of the votes (1 node out of 2) is not counted as a quorum, so the surviving node keeps waiting for the quorum instead of escalating. https://www.pgpool.net/docs/latest/en/html/runtime-watchdog-config.html#GUC-ENABLE-CONSENSUS-WITH-HALF-VOTES
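A minimal pgpool.conf sketch of that change (my reading of the linked docs: the parameter defaults to off and is read at server start, so pgpool would need a restart on both nodes):

-----------------------
# pgpool.conf (watchdog section, both nodes)
# With exactly 2 watchdog nodes, count half of the votes (1 of 2) as a
# quorum/consensus, so the surviving node can escalate when its peer
# shuts down.
enable_consensus_with_half_votes = on
-----------------------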
|
|
eldad (0003415) 2020-06-22 18:25

Thanks, this indeed solved the problem.
|
|
eldad (0003416) 2020-06-22 18:45

Is there a way to spool the output to a log file, as it was in older versions on CentOS 6? Currently all the log output goes to the journal, and I would like to have it in /etc/pgpool/pgpool.log.
|
|
pengbo (0003417) 2020-06-23 00:51

You can set log_destination to "syslog":

-----------------------
log_destination = 'syslog'
-----------------------

See the example: https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-LOG
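To get the syslog output into a dedicated file rather than the journal, one common approach (an assumption on my part, not something stated in this thread) is to give pgpool its own syslog facility and route that facility in rsyslog; the facility and file path below are examples:

-----------------------
# pgpool.conf
log_destination = 'syslog'
syslog_facility = 'LOCAL1'

# /etc/rsyslog.d/pgpool.conf (then restart rsyslog)
local1.*    /var/log/pgpool.log
-----------------------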
|
|
eldad (0003420) 2020-06-23 22:46

Many thanks!
|
|
pengbo (0003422) 2020-06-23 23:50

I am going to mark this issue as "resolved".
Issue History

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2020-06-21 23:29 | eldad | New Issue | |
| 2020-06-21 23:29 | eldad | Tag Attached: watchdog | |
| 2020-06-22 14:32 | eldad | File Added: pgpool.conf | |
| 2020-06-22 14:32 | eldad | Note Added: 0003410 | |
| 2020-06-22 14:42 | eldad | File Added: pgpool-2.conf | |
| 2020-06-22 14:42 | eldad | Note Added: 0003411 | |
| 2020-06-22 14:52 | eldad | Note Added: 0003412 | |
| 2020-06-22 17:26 | pengbo | Note Added: 0003414 | |
| 2020-06-22 17:26 | pengbo | Assigned To | => pengbo |
| 2020-06-22 17:26 | pengbo | Status | new => feedback |
| 2020-06-22 18:25 | eldad | Note Added: 0003415 | |
| 2020-06-22 18:25 | eldad | Status | feedback => assigned |
| 2020-06-22 18:45 | eldad | Note Added: 0003416 | |
| 2020-06-23 00:51 | pengbo | Note Added: 0003417 | |
| 2020-06-23 00:52 | pengbo | Status | assigned => feedback |
| 2020-06-23 22:46 | eldad | Note Added: 0003420 | |
| 2020-06-23 22:46 | eldad | Status | feedback => assigned |
| 2020-06-23 23:50 | pengbo | Note Added: 0003422 | |
| 2020-06-23 23:51 | pengbo | Status | assigned => resolved |
| 2020-09-01 09:42 | administrator | Status | resolved => closed |