
ID: 0000574
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2020-01-13 08:22
Reporter: raj.pandey1982@gmail.com
Assigned To: t-ishii
Priority: high
Severity: major
Reproducibility: always
Status: resolved
Resolution: open
Platform: Linux
OS: centos
OS Version: x86_64 x86_64 x8
Product Version: 4.1.0
Target Version:
Fixed in Version:
Summary: 0000574: waiting for the quorum to start escalation process
Description:
DB: PostgreSQL 11.5
Pgpool-II 4.1.0

I have two PostgreSQL master-slave nodes, each with Pgpool-II 4.1.0 configured (one as master, one as standby), and one virtual IP.
When Pgpool-II on node 1 or node 2 is brought down, the VIP is released but is not acquired by the other node.

If node 1 is brought down and the VIP is released, the node 2 log says:

2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process

Only when I start Pgpool-II on node 1 again does node 2 acquire the VIP. So the main issue is that the surviving node does not hold the quorum unless I restart the previously stopped node as standby.

I tried the old suggestion (# systemctl stop firewalld), but it did not help.

I also have another instance of 4.0.1 with the same kind of two-node setup, and there the VIP release and acquisition work fine.
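[Editor's note: the "waiting for the quorum" behavior follows from majority-vote arithmetic. The sketch below is not Pgpool-II source code; it only illustrates, under the assumption that quorum means a strict majority of configured watchdog nodes unless half-votes are allowed, why a 2-node cluster can never keep quorum after losing a node.]

```python
def has_quorum(alive: int, total: int, half_votes_ok: bool = False) -> bool:
    """Return True if the alive nodes can form a quorum.

    By default a strict majority (> total/2) of configured watchdog
    nodes is required. With the half-votes option, exactly half of the
    votes is also accepted (relevant for even-sized clusters).
    """
    if half_votes_ok:
        return 2 * alive >= total  # half of the votes suffices
    return 2 * alive > total       # strict majority required

# 2-node cluster, one node down: 1 of 2 votes is not a majority.
print(has_quorum(1, 2))                      # -> "waiting for the quorum"
print(has_quorum(1, 2, half_votes_ok=True))  # escalation can proceed
# 3-node cluster, one node down: 2 of 3 votes is a strict majority.
print(has_quorum(2, 3))
```

This is why an odd number of nodes (3 or more) avoids the problem entirely, while a 2-node cluster needs the half-votes relaxation.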

Below is the full log of node 2:

2020-01-12 11:43:09: pid 16565:LOG: waiting for watchdog to initialize
2020-01-12 11:43:09: pid 16567:LOG: setting the local watchdog node name to "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:09: pid 16567:LOG: watchdog cluster is configured with 1 remote nodes
2020-01-12 11:43:09: pid 16567:LOG: watchdog remote node:0 on pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: interface monitoring is disabled in watchdog
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [DEAD] to [LOADING]
2020-01-12 11:43:09: pid 16567:LOG: new outbound connection to pgpool-poc02.novalocal:9000
2020-01-12 11:43:09: pid 16567:LOG: setting the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" as watchdog cluster master
2020-01-12 11:43:09: pid 16567:LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2020-01-12 11:43:09: pid 16567:LOG: new watchdog node connection is received from "10.70.184.28:25794"
2020-01-12 11:43:09: pid 16567:LOG: new node joined the cluster hostname:"pgpool-poc02.novalocal" port:9000 pgpool_port:5433
2020-01-12 11:43:09: pid 16567:DETAIL: Pgpool-II version:"4.1.0" watchdog messaging version: 1.1
2020-01-12 11:43:10: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2020-01-12 11:43:10: pid 16567:LOG: successfully joined the watchdog cluster as standby node
2020-01-12 11:43:10: pid 16567:DETAIL: our join coordinator request is accepted by cluster leader node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16565:LOG: watchdog process is initialized
2020-01-12 11:43:10: pid 16565:DETAIL: watchdog messaging data version: 1.1
2020-01-12 11:43:10: pid 16565:LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16565:LOG: watchdog cluster now holds the quorum
2020-01-12 11:43:10: pid 16565:DETAIL: updating the state of quarantine backend nodes
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16569:LOG: 2 watchdog nodes are configured for lifecheck
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:0 Name:"pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc01.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16569:LOG: watchdog nodes ID:1 Name:"pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16569:DETAIL: Host:"pgpool-poc02.novalocal" WD Port:9000 pgpool-II port:5433
2020-01-12 11:43:10: pid 16565:LOG: we have joined the watchdog cluster as STANDBY node
2020-01-12 11:43:10: pid 16565:DETAIL: syncing the backend states from the MASTER watchdog node
2020-01-12 11:43:10: pid 16567:LOG: new IPC connection received
2020-01-12 11:43:10: pid 16567:LOG: received the get data request from local pgpool-II on IPC interface
2020-01-12 11:43:10: pid 16567:LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:43:10: pid 16567:DETAIL: waiting for the reply...
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohvcasdb01.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16569:LOG: watchdog lifecheck trusted server "mohcasdevdb.novalocal" added for the availability check
2020-01-12 11:43:10: pid 16565:LOG: master watchdog node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" returned status for 2 backend nodes
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for 0.0.0.0:5433
2020-01-12 11:43:10: pid 16565:LOG: Setting up socket for :::5433
2020-01-12 11:43:10: pid 16604:LOG: PCP process: 16604 started
2020-01-12 11:43:10: pid 16565:LOG: pgpool-II successfully started. version 4.1.0 (karasukiboshi)
2020-01-12 11:43:10: pid 16565:LOG: node status[0]: 0
2020-01-12 11:43:10: pid 16565:LOG: node status[1]: 0
2020-01-12 11:43:11: pid 16570:LOG: createing watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: bind receive socket to device: "eth0"
2020-01-12 11:43:11: pid 16570:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16570:LOG: creating watchdog heartbeat receive socket.
2020-01-12 11:43:11: pid 16570:DETAIL: set SO_REUSEPORT
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: bind send socket to device: eth0
2020-01-12 11:43:11: pid 16571:LOG: set SO_REUSEPORT option to the socket
2020-01-12 11:43:11: pid 16571:LOG: creating socket for sending heartbeat
2020-01-12 11:43:11: pid 16571:DETAIL: set SO_REUSEPORT
2020-01-12 11:55:14: pid 16604:LOG: forked new pcp worker, pid=17365 socket=7
2020-01-12 11:55:14: pid 16567:LOG: new IPC connection received
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exit with SUCCESS.
2020-01-12 11:55:14: pid 16604:LOG: PCP process with pid: 17365 exits with status 0
2020-01-12 11:56:43: pid 16567:LOG: remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" is shutting down
2020-01-12 11:56:43: pid 16567:LOG: watchdog cluster has lost the coordinator node
2020-01-12 11:56:43: pid 16567:LOG: removing the remote node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal" from watchdog cluster master
2020-01-12 11:56:43: pid 16567:LOG: We have lost the cluster master node "pgpool-poc02.novalocal:5433 Linux pgpool-poc02.novalocal"
2020-01-12 11:56:43: pid 16567:LOG: watchdog node state changed from [STANDBY] to [JOINING]
2020-01-12 11:56:47: pid 16567:LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2020-01-12 11:56:48: pid 16567:LOG: I am the only alive node in the watchdog cluster
2020-01-12 11:56:48: pid 16567:HINT: skipping stand for coordinator state
2020-01-12 11:56:48: pid 16567:LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2020-01-12 11:56:48: pid 16567:LOG: I am announcing my self as master/coordinator watchdog node
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node
2020-01-12 11:56:52: pid 16567:DETAIL: our declare coordinator message is accepted by all nodes
2020-01-12 11:56:52: pid 16567:LOG: setting the local node "pgpool-poc01.novalocal:5433 Linux pgpool-poc01.novalocal" as watchdog cluster master
2020-01-12 11:56:52: pid 16567:LOG: I am the cluster leader node but we do not have enough nodes in cluster
2020-01-12 11:56:52: pid 16567:DETAIL: waiting for the quorum to start escalation process
2020-01-12 11:56:52: pid 16567:LOG: new IPC connection received
Tags: No tags attached.

Activities

t-ishii

2020-01-12 20:09

developer   ~0003059

In Pgpool-II 4.1 or later, if you have only 2 watchdog nodes and want consensus on failover to require only half of the total number of votes (= 1 node), you need to turn on enable_consensus_with_half_votes. (We recommend having 3 or more nodes, in an odd number.)
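[Editor's note: a minimal sketch of the pgpool.conf change the note above refers to. The parameter name comes from the note; the assumption that it must be set on both nodes and needs a Pgpool-II restart to take effect is the editor's, not stated in this thread.]

```
# pgpool.conf (set on both watchdog nodes) -- Pgpool-II 4.1+
# Allow failover consensus with exactly half of the configured
# watchdog votes, so the surviving node of a 2-node cluster can
# still hold quorum and escalate (acquire the VIP).
enable_consensus_with_half_votes = on
```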

raj.pandey1982@gmail.com

2020-01-12 20:48

reporter   ~0003060

Turning on the said parameter, enable_consensus_with_half_votes, worked, and two-node Pgpool-II cluster failover is now happening. Thanks a lot for the quick resolution. You rock!

t-ishii

2020-01-13 08:22

developer   ~0003061

Glad to hear that. I am going to mark this issue as "resolved".

Issue History

Date Modified Username Field Change
2020-01-12 18:32 raj.pandey1982@gmail.com New Issue
2020-01-12 20:09 t-ishii Note Added: 0003059
2020-01-12 20:10 t-ishii Assigned To => t-ishii
2020-01-12 20:10 t-ishii Status new => feedback
2020-01-12 20:48 raj.pandey1982@gmail.com Note Added: 0003060
2020-01-12 20:48 raj.pandey1982@gmail.com Status feedback => assigned
2020-01-13 08:22 t-ishii Note Added: 0003061
2020-01-13 08:22 t-ishii Status assigned => resolved