View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000574 | Pgpool-II | Bug | public | 2020-01-12 18:32 | 2020-01-13 08:22 |
| Reporter | raj.pandey1982@gmail.com | Assigned To | t-ishii | ||
| Priority | high | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Platform | Linux | OS | centos | OS Version | x86_64 x86_64 x8 |
| Product Version | 4.1.0 | ||||
| Summary | 0000574: waiting for the quorum to start escalation process | ||||
| Description | DB: PostgreSQL 11.5, Pgpool-II 4.1.0.
I have 2 PostgreSQL master-slave nodes, each also running Pgpool-II 4.1.0 (configured as master and standby), and 1 virtual IP (VIP). When Pgpool-II on node 1 or node 2 is brought down, the VIP is released but is not acquired by the other node.
When node 1 is brought down and the VIP is released, the node 2 log says:
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process
Only when I start Pgpool-II on node 1 again does node 2 acquire the VIP. So the main issue is that the remaining node does not hold the quorum until I start the previously stopped node again as standby.
I tried the old suggestion (# systemctl stop firewalld), but it did not help. I also have another instance running 4.0.1 with the same kind of 2-node setup, where VIP release and acquisition work fine.
Below is the full log of node 2:
2020-01-12 11:43:09: pid 16565:LOG: waiting for watchdog to initialize
2020-01-12 11:43:09: pid 16567:LOG: setting the local watchdog node name to "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:09: pid 16567:LOG: watchdog cluster is configured with 1 remote nodes
2020-01-12 11:43:09: pid 16567:LOG: watchdog remote node:0 on pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: interface monitoring is disabled in watchdog
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [DEAD] to [LOADING]
2020-01-12 11:43:09: pid 16567:LOG: new outbound connection to pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: setting the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" as watchdog cluster master
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2020-01-12 11:43:09: pid 16567:LOG: new watchdog node connection is received from "10.70.184.28:25794"
2020-01-12 11:43:09: pid 16567:LOG: new node joined the cluster hostname:"pgpool-poc02.novalocal" port:9000 pgpool_port:5433
2020-01-12 11:43:09: pid 16567:DETAIL: Pgpool-II version:"4.1.0" watchdog messaging version: 1.1
2020-01-12 11:43:10: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2020-01-12 11:43:10: pid 16567:LOG: successfully joined the watchdog cluster as standby node
2020-01-12 11:43:10: pid 16567:DETAIL: our join coordinator request is accepted by cluster leader node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16565:LOG: watchdog process is initialized
2020-01-12 11:43:10: pid 16565:DETAIL: watchdog messaging data version: 1.1
2020-01-12 11:43:10: pid 16565:LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16565:LOG: watchdog cluster now holds the quorum
2020-01-12 11:43:10: pid 16565:DETAIL: updating the state of quarantine backend nodes
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16569:LOG: 2 watchdog nodes are configured for lifecheck
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:0 Name:"pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc01.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:1 Name:"pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc02.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16565:LOG: we have joined the watchdog cluster as STANDBY node
2020-01-12 11:43:10: pid 16565:DETAIL: syncing the backend states from the MASTER watchdog node
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: received the get data request from local pgpool-II on IPC interface
2020-01-12 11:43:10: pid 16567:LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16567:DETAIL: waiting for the reply...
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohvcasdb01.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohcasdevdb.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16565:LOG: master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" returned status for 2 backend nodes
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for 0.0.0.0:5433
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for :::5433
2020-01-12 11:43:10: pid 16604:LOG: PCP process: 16604 started
2020-01-12 11:43:10: pid 16565:LOG: pgpool-II successfully started. version 4.1.0 (karasukiboshi)
2020-01-12 11:43:10: pid 16565:LOG: node status[0]: 0
2020-01-12 11:43:10: pid 16565:LOG: node status[1]: 0
2020-01-12 11:43:11: pid 16570:LOG: createing watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: bind receive socket to device: "eth0"
2020-01-12 11:43:11: pid 16570:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16570:LOG: creating watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: set SO_REUSEPORT
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: bind send socket to device: eth0
2020-01-12 11:43:11: pid 16571:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: set SO_REUSEPORT
2020-01-12 11:55:14: pid 16604:LOG: forked new pcp worker, pid=17365 socket=7
2020-01-12 11:55:14: pid 16567:LOG: new IPC connection received
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exit with SUCCESS.
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exits with status 0
2020-01-12 11:56:43: pid 16567:LOG: remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" is shutting down
2020-01-12 11:56:43: pid 16567:LOG: watchdog cluster has lost the coordinator node
2020-01-12 11:56:43: pid 16567:LOG: removing the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" from watchdog cluster master
2020-01-12 11:56:43: pid 16567:LOG: We have lost the cluster master node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:56:43: pid 16567:LOG: watchdog node state changed from [STANDBY] to [JOINING]
2020-01-12 11:56:47: pid 16567:LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2020-01-12 11:56:48: pid 16567:LOG: I am the only alive node in the watchdog cluster
2020-01-12 11:56:48: pid 16567:HINT: skipping stand for coordinator state
2020-01-12 11:56:48: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2020-01-12 11:56:48: pid 16567:LOG: I am announcing my self as master/coordinator watchdog node
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node
2020-01-12 11:56:52: pid 16567:DETAIL: our declare coordinator message is accepted by all nodes
2020-01-12 11:56:52: pid 16567:LOG: setting the local node "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal" as watchdog cluster master
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process
2020-01-12 11:56:52: pid 16567:LOG: new IPC connection received | ||||
| Tags | No tags attached. | ||||
t-ishii (note 0003059, 2020-01-12 20:09):

If you have only 2 watchdog nodes and want consensus on failover to require only half of the total number of votes (= 1 node), you need to turn on enable_consensus_with_half_votes in Pgpool-II 4.1 or later. (We recommend having 3 or more nodes, in an odd number.)
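To make the vote arithmetic in the note above concrete, here is a minimal Python sketch. It is an illustration only, not Pgpool-II's actual implementation: the function name `has_quorum` and its parameters are hypothetical, and it assumes the rule as described in this thread (a strict majority of votes by default, exactly half accepted when the parameter is on). In pgpool.conf the setting itself would be written as `enable_consensus_with_half_votes = on`.

```python
def has_quorum(total_nodes: int, alive_nodes: int, half_votes: bool) -> bool:
    """Return True if the alive watchdog nodes are enough to hold the quorum.

    Hypothetical helper for illustration; it mirrors the rule described in
    the note, not Pgpool-II source code.
    """
    if total_nodes <= 0 or not 0 <= alive_nodes <= total_nodes:
        raise ValueError("need 0 <= alive_nodes <= total_nodes and total_nodes > 0")
    if half_votes:
        # Assumed behaviour with enable_consensus_with_half_votes = on:
        # exactly half of the configured votes is sufficient.
        return 2 * alive_nodes >= total_nodes
    # Assumed default behaviour: strictly more than half of the votes.
    return 2 * alive_nodes > total_nodes


if __name__ == "__main__":
    # The two-node cluster from this report, with one Pgpool-II stopped.
    print(has_quorum(2, 1, half_votes=False))
    print(has_quorum(2, 1, half_votes=True))
```

Under these assumptions the flag only changes the outcome for an even number of nodes; with 3 nodes, 2 alive nodes are needed either way.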
raj.pandey1982@gmail.com (note 0003060, 2020-01-12 20:48):

Turning on the said parameter, enable_consensus_with_half_votes, worked! Pgpool-II failover now happens in the two-node cluster. Thanks a lot for the quick resolution. You rock!
t-ishii (note 0003061, 2020-01-13 08:22):

Glad to hear that. I am going to mark this issue as "resolved".

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2020-01-12 18:32 | raj.pandey1982@gmail.com | New Issue | |
| 2020-01-12 20:09 | t-ishii | Note Added: 0003059 | |
| 2020-01-12 20:10 | t-ishii | Assigned To | => t-ishii |
| 2020-01-12 20:10 | t-ishii | Status | new => feedback |
| 2020-01-12 20:48 | raj.pandey1982@gmail.com | Note Added: 0003060 | |
| 2020-01-12 20:48 | raj.pandey1982@gmail.com | Status | feedback => assigned |
| 2020-01-13 08:22 | t-ishii | Note Added: 0003061 | |
| 2020-01-13 08:22 | t-ishii | Status | assigned => resolved |