0000574: waiting for the quorum to start escalation process - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000574	Pgpool-II	Bug	public	2020-01-12 18:32	2020-01-13 08:22

Reporter	raj.pandey1982@gmail.com	Assigned To	t-ishii
Priority	high	Severity	major	Reproducibility	always
Status	resolved	Resolution	open
Platform	Linux	OS	centos	OS Version	x86_64 x86_64 x8
Product Version	4.1.0

Summary	0000574: waiting for the quorum to start escalation process
Description	DB: PostgreSQL 11.5, Pgpool-II 4.1.0

I have 2 PostgreSQL master-slave nodes, each with Pgpool-II 4.1.0 configured (one as master, one as standby), and 1 virtual IP (VIP). When Pgpool-II on node 1 or node 2 is brought down, the VIP is released but is not acquired by the other node.

If node 1 is brought down and the VIP is released, the node 2 log says:

2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process

Only when I start Pgpool-II on node 1 again does node 2 acquire the VIP. So the main issue is that the remaining node does not hold the quorum until I start the previously stopped node again as standby.

I tried the old suggestion (#systemctl stop firewalld) but it did not help. I also have another instance running 4.0.1 with the same kind of 2-node setup, and there VIP release and acquisition work fine.

Below is the full log of node2:

2020-01-12 11:43:09: pid 16565:LOG: waiting for watchdog to initialize
2020-01-12 11:43:09: pid 16567:LOG: setting the local watchdog node name to "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:09: pid 16567:LOG: watchdog cluster is configured with 1 remote nodes
2020-01-12 11:43:09: pid 16567:LOG: watchdog remote node:0 on pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: interface monitoring is disabled in watchdog
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [DEAD] to [LOADING]
2020-01-12 11:43:09: pid 16567:LOG: new outbound connection to pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: setting the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" as watchdog cluster master
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2020-01-12 11:43:09: pid 16567:LOG: new watchdog node connection is received from "10.70.184.28:25794"
2020-01-12 11:43:09: pid 16567:LOG: new node joined the cluster hostname:"pgpool-poc02.novalocal" port:9000 pgpool_port:5433
2020-01-12 11:43:09: pid 16567:DETAIL: Pgpool-II version:"4.1.0" watchdog messaging version: 1.1
2020-01-12 11:43:10: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2020-01-12 11:43:10: pid 16567:LOG: successfully joined the watchdog cluster as standby node
2020-01-12 11:43:10: pid 16567:DETAIL: our join coordinator request is accepted by cluster leader node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16565:LOG: watchdog process is initialized
2020-01-12 11:43:10: pid 16565:DETAIL: watchdog messaging data version: 1.1
2020-01-12 11:43:10: pid 16565:LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16565:LOG: watchdog cluster now holds the quorum
2020-01-12 11:43:10: pid 16565:DETAIL: updating the state of quarantine backend nodes
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16569:LOG: 2 watchdog nodes are configured for lifecheck
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:0 Name:"pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc01.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:1 Name:"pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc02.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16565:LOG: we have joined the watchdog cluster as STANDBY node
2020-01-12 11:43:10: pid 16565:DETAIL: syncing the backend states from the MASTER watchdog node
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: received the get data request from local pgpool-II on IPC interface
2020-01-12 11:43:10: pid 16567:LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16567:DETAIL: waiting for the reply...
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohvcasdb01.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohcasdevdb.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16565:LOG: master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" returned status for 2 backend nodes
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for 0.0.0.0:5433
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for :::5433
2020-01-12 11:43:10: pid 16604:LOG: PCP process: 16604 started
2020-01-12 11:43:10: pid 16565:LOG: pgpool-II successfully started. version 4.1.0 (karasukiboshi)
2020-01-12 11:43:10: pid 16565:LOG: node status[0]: 0
2020-01-12 11:43:10: pid 16565:LOG: node status[1]: 0
2020-01-12 11:43:11: pid 16570:LOG: createing watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: bind receive socket to device: "eth0"
2020-01-12 11:43:11: pid 16570:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16570:LOG: creating watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: set SO_REUSEPORT
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: bind send socket to device: eth0
2020-01-12 11:43:11: pid 16571:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: set SO_REUSEPORT
2020-01-12 11:55:14: pid 16604:LOG: forked new pcp worker, pid=17365 socket=7
2020-01-12 11:55:14: pid 16567:LOG: new IPC connection received
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exit with SUCCESS.
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exits with status 0
2020-01-12 11:56:43: pid 16567:LOG: remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" is shutting down
2020-01-12 11:56:43: pid 16567:LOG: watchdog cluster has lost the coordinator node
2020-01-12 11:56:43: pid 16567:LOG: removing the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" from watchdog cluster master
2020-01-12 11:56:43: pid 16567:LOG: We have lost the cluster master node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:56:43: pid 16567:LOG: watchdog node state changed from [STANDBY] to [JOINING]
2020-01-12 11:56:47: pid 16567:LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2020-01-12 11:56:48: pid 16567:LOG: I am the only alive node in the watchdog cluster
2020-01-12 11:56:48: pid 16567:HINT: skipping stand for coordinator state
2020-01-12 11:56:48: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2020-01-12 11:56:48: pid 16567:LOG: I am announcing my self as master/coordinator watchdog node
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node
2020-01-12 11:56:52: pid 16567:DETAIL: our declare coordinator message is accepted by all nodes
2020-01-12 11:56:52: pid 16567:LOG: setting the local node "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal" as watchdog cluster master
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process
2020-01-12 11:56:52: pid 16567:LOG: new IPC connection received
Tags	No tags attached.

t-ishii 2020-01-12 20:09 developer ~0003059	In Pgpool-II 4.1 or later, if you have only 2 watchdog nodes and want consensus on failover to require only half of the total number of votes (= 1 node), you need to turn on enable_consensus_with_half_votes. (We recommend having an odd number of watchdog nodes, 3 or more.)
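
To make the vote arithmetic behind this advice concrete, here is a minimal Python sketch of a majority rule with and without a "half votes" relaxation. It is an illustration only, not Pgpool-II's actual code: the function name has_quorum and its parameters are hypothetical, and it assumes each alive watchdog node contributes exactly one vote. In pgpool.conf the corresponding setting would be enable_consensus_with_half_votes = on.

```python
def has_quorum(total_nodes: int, alive_nodes: int, half_votes: bool = False) -> bool:
    """Illustrative majority rule (hypothetical helper, not Pgpool-II code).

    total_nodes: number of watchdog nodes configured in the cluster
    alive_nodes: number of nodes currently alive, one vote each (assumption)
    half_votes:  models enable_consensus_with_half_votes being on
    """
    if half_votes:
        # Relaxed rule: half of the configured votes is enough.
        return alive_nodes * 2 >= total_nodes
    # Strict rule: more than half of the configured votes is required.
    return alive_nodes * 2 > total_nodes


# Two-node cluster with one node down, as in this report.
print(has_quorum(2, 1))                   # False: 1 of 2 is not a strict majority
print(has_quorum(2, 1, half_votes=True))  # True: 1 of 2 is exactly half

# Three-node cluster: the option makes no difference for an odd node count.
print(has_quorum(3, 2))                   # True
print(has_quorum(3, 1, half_votes=True))  # False
```

Under these assumptions, a two-node cluster with the strict rule loses the quorum as soon as either node stops, which matches the "waiting for the quorum to start escalation process" message in the report.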

raj.pandey1982@gmail.com 2020-01-12 20:48 reporter ~0003060	Turning on the said parameter, enable_consensus_with_half_votes, worked! Pgpool-II fail-over now happens in the two-node cluster. Thanks a lot for the quick resolution. You rock!

t-ishii 2020-01-13 08:22 developer ~0003061	Glad to hear that. I am going to mark this issue as "resolved".

Date Modified	Username	Field	Change
2020-01-12 18:32	raj.pandey1982@gmail.com	New Issue
2020-01-12 20:09	t-ishii	Note Added: 0003059
2020-01-12 20:10	t-ishii	Assigned To	=> t-ishii
2020-01-12 20:10	t-ishii	Status	new => feedback
2020-01-12 20:48	raj.pandey1982@gmail.com	Note Added: 0003060
2020-01-12 20:48	raj.pandey1982@gmail.com	Status	feedback => assigned
2020-01-13 08:22	t-ishii	Note Added: 0003061
2020-01-13 08:22	t-ishii	Status	assigned => resolved