View Issue Details

ID: 0000541
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-09-06 07:18
Reporter: Carlos Mendez
Assigned To: t-ishii
Priority: high
Severity: feature
Reproducibility: always
Status: resolved
Resolution: open
Product Version: 3.7.1
Target Version:
Fixed in Version:
Summary: 0000541: Unable to start Virtual IP for Pgpool

Description:
Environment Details:

pgpool_01 IP: 10.241.166.21
VIP: 10.241.166.24

PRIMARY DB IP: 10.241.166.32
STANDBY DB IP: 10.241.166.33

***********************************************************************************
After starting the pgpool services, the Virtual IP is not available.

# /usr/share/pgpool-II/3.7.1/bin/pgpool -D -n > /var/log/pgpool/pgpool.log 2>&1 &

***********************************************************************************
In the log file, the last messages are:


2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of if_up_cmd
2019-08-30 12:16:16: pid 32201: DETAIL: ifup[/usr/sbin/ip] doesn't have setuid bit
2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of if_down_cmd
2019-08-30 12:16:16: pid 32201: DETAIL: ifdown[/usr/sbin/ip] doesn't have setuid bit
2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of arping command
2019-08-30 12:16:16: pid 32201: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
2019-08-30 12:16:16: pid 32201: LOG: Backend status file /var/log/pgpool/pgpool_status discarded
2019-08-30 12:16:16: pid 32201: LOG: waiting for watchdog to initialize
2019-08-30 12:16:16: pid 32202: LOG: setting the local watchdog node name to "10.241.166.21:9999 Linux pgpool_01"
2019-08-30 12:16:16: pid 32202: LOG: watchdog cluster is configured with 2 remote nodes
2019-08-30 12:16:16: pid 32202: LOG: watchdog remote node:0 on pgpool_02:9000
2019-08-30 12:16:16: pid 32202: LOG: watchdog remote node:1 on pgpool_03:9000
2019-08-30 12:16:16: pid 32202: LOG: interface monitoring is disabled in watchdog
2019-08-30 12:16:16: pid 32202: LOG: watchdog node state changed from [DEAD] to [LOADING]
2019-08-30 12:16:21: pid 32202: LOG: watchdog node state changed from [LOADING] to [JOINING]
2019-08-30 12:16:25: pid 32202: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2019-08-30 12:16:26: pid 32202: LOG: I am the only alive node in the watchdog cluster
2019-08-30 12:16:26: pid 32202: HINT: skiping stand for coordinator state
2019-08-30 12:16:26: pid 32202: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2019-08-30 12:16:26: pid 32202: LOG: I am announcing my self as master/coordinator watchdog node
2019-08-30 12:16:30: pid 32202: LOG: I am the cluster leader node
2019-08-30 12:16:30: pid 32202: DETAIL: our declare coordinator message is accepted by all nodes
2019-08-30 12:16:30: pid 32202: LOG: setting the local node "10.241.166.21:9999 Linux pgpool_01" as watchdog cluster master
2019-08-30 12:16:30: pid 32202: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2019-08-30 12:16:30: pid 32202: DETAIL: waiting for the quorum to start escalation process
2019-08-30 12:16:30: pid 32201: LOG: watchdog process is initialized
2019-08-30 12:16:30: pid 32202: LOG: new IPC connection received
2019-08-30 12:16:30: pid 32202: LOG: new IPC connection received
2019-08-30 12:16:30: pid 32217: LOG: 3 watchdog nodes are configured for lifecheck
2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:0 Name:"10.241.166.21:9999 Linux pgpool_01"
2019-08-30 12:16:30: pid 32217: DETAIL: Host:"10.241.166.21" WD Port:9000 pgpool-II port:9999
2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:1 Name:"Not_Set"
2019-08-30 12:16:30: pid 32217: DETAIL: Host:"pgpool_02" WD Port:9000 pgpool-II port:9999
2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:2 Name:"Not_Set"
2019-08-30 12:16:30: pid 32217: DETAIL: Host:"pgpool_03" WD Port:9000 pgpool-II port:9999
2019-08-30 12:16:30: pid 32201: LOG: Setting up socket for 0.0.0.0:9999
2019-08-30 12:16:30: pid 32201: LOG: Setting up socket for :::9999
2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.21" added for the availability check
2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.22" added for the availability check
2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.23" added for the availability check
2019-08-30 12:16:30: pid 32201: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2019-08-30 12:16:30: pid 32201: LOG: find_primary_node: checking backend no 0
2019-08-30 12:16:30: pid 32201: LOG: find_primary_node: primary node id is 0
2019-08-30 12:16:30: pid 32201: LOG: pgpool-II successfully started. version 3.7.1 (amefuriboshi)
2019-08-30 12:16:31: pid 32218: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32218: LOG: creating watchdog heartbeat receive socket.
2019-08-30 12:16:31: pid 32218: DETAIL: set SO_REUSEPORT
2019-08-30 12:16:31: pid 32219: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32219: LOG: creating socket for sending heartbeat
2019-08-30 12:16:31: pid 32219: DETAIL: set SO_REUSEPORT
2019-08-30 12:16:31: pid 32221: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32221: LOG: creating watchdog heartbeat receive socket.
2019-08-30 12:16:31: pid 32221: DETAIL: set SO_REUSEPORT
2019-08-30 12:16:31: pid 32223: LOG: set SO_REUSEPORT option to the socket
2019-08-30 12:16:31: pid 32223: LOG: creating socket for sending heartbeat
2019-08-30 12:16:31: pid 32223: DETAIL: set SO_REUSEPORT

******************************************************************************************

The following commands for assigning the VIP work without any issues when executed directly:

if_up_cmd = 'ip addr add 10.241.166.24/25 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del 10.241.166.24/25 dev eth0'
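For reference, these commands come from the watchdog VIP settings in pgpool.conf. A minimal sketch of the related block, assuming the eth0 device and the addresses above (parameter names as in pgpool-II 3.7; $_IP_$ is substituted with delegate_IP):

use_watchdog = on
delegate_IP = '10.241.166.24'
if_cmd_path = '/usr/sbin'
if_up_cmd = 'ip addr add $_IP_$/25 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del $_IP_$/25 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'

Whether the VIP was actually assigned can then be checked on the node with, for example, 'ip addr show dev eth0'.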


********************************************************************************************

According to the log, I think the issue could be related to SO_REUSEPORT, but I am not sure about this.
Please let me know if you have faced this kind of error.
Steps To Reproduce: starting pgpool services

# /usr/share/pgpool-II/3.7.1/bin/pgpool -D -n > /var/log/pgpool/pgpool.log 2>&1 &
Tags: pgpool 3.7.1, settings

Activities

t-ishii

2019-09-02 09:28

developer   ~0002807

I noticed two things:

1) Pgpool-II 3.7.1 is quite old. I strongly recommend upgrading to the newest one (3.7.11 at this point).

2) Probably the VIP was not set because there is only 1 live watchdog node. I am not sure whether this is expected behavior of the watchdog, though.

Carlos Mendez

2019-09-02 12:39

reporter   ~0002808

Hi T-Ishii,
Thanks for your comments.
1. I will check with the operations team about moving to the newest version.

2. Regarding the second point, according to your comments, is it then necessary to have more pgpools running?

Our configuration includes 3 pgpools, but at this moment we are only trying to start one of them.

Tomorrow I will try to start all 3 pgpools and check this. Anyway, if you have any other ideas or comments about the issue, please let me know.


Regards

t-ishii

2019-09-02 13:07

developer   ~0002809

> 2. Regarding the second point, according to your comments, is it then necessary to have more pgpools running?
>
> Our configuration includes 3 pgpools, but at this moment we are only trying to start one of them.
Yes, if you have 3 pgpools, then you should start at least two of them so that a "quorum" exists.
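With 3 watchdog nodes configured, the quorum here is a majority, i.e. floor(3/2) + 1 = 2 live nodes: only once the leader sees the quorum does it start the escalation process and bring up the delegate IP, which matches the "waiting for the quorum to start escalation process" message in the log above.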

Carlos Mendez

2019-09-03 01:17

reporter   ~0002811

T-Ishii

According to your last update, after starting the services for the second pgpool, the VIP was assigned to the first server. This is something I did not know; thanks for your support.

In the log for pgpool 1 I was able to see the following messages:


2019-09-02 06:01:40: pid 32202: LOG: new watchdog node connection is received from "10.241.166.22:60088"
2019-09-02 06:01:40: pid 32202: LOG: new node joined the cluster hostname:"pgpool_02" port:9000 pgpool_port:9999
2019-09-02 06:01:41: pid 32202: LOG: adding watchdog node "pgpool_02:9999 Linux pgpool_02" to the standby list
2019-09-02 06:01:41: pid 32202: LOG: quorum found
2019-09-02 06:01:41: pid 32202: DETAIL: starting escalation process
2019-09-02 06:01:41: pid 32202: LOG: escalation process started with PID:6889
2019-09-02 06:01:41: pid 6889: LOG: watchdog: escalation started
2019-09-02 06:01:41: pid 32201: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2019-09-02 06:01:41: pid 32202: LOG: new IPC connection received
2019-09-02 06:01:41: pid 32201: LOG: watchdog cluster now holds the quorum
2019-09-02 06:01:41: pid 32201: DETAIL: updating the state of quarantine backend nodes
2019-09-02 06:01:41: pid 32202: LOG: new IPC connection received
2019-09-02 06:01:45: pid 6889: LOG: successfully acquired the delegate IP:"10.241.166.24"
2019-09-02 06:01:45: pid 6889: DETAIL: 'if_up_cmd' returned with success
2019-09-02 06:01:45: pid 32202: LOG: watchdog escalation process with pid: 6889 exit with SUCCESS.
2019-09-02 06:04:18: pid 32286: LOG: new connection received
2019-09-02 06:04:18: pid 32286: DETAIL: connecting host=10.241.156.104 port=64471
2019-09-02 06:04:23: pid 32308: LOG: new connection received
2019-09-02 06:04:23: pid 32308: DETAIL: connecting host=10.241.156.104 port=64474
2019-09-02 06:11:58: pid 32202: LOG: new watchdog node connection is received from "10.241.166.23:38023"
2019-09-02 06:11:58: pid 32202: LOG: new node joined the cluster hostname:"pgpool_03" port:9000 pgpool_port:9999
2019-09-02 06:11:59: pid 32202: LOG: adding watchdog node "pgpool_03:9999 Linux pgpool_03" to the standby list
2019-09-02 06:13:10: pid 32217: LOG: watchdog: lifecheck started


Regards
CAR

t-ishii

2019-09-03 09:30

developer   ~0002812

Thanks for the report. Now I wonder: if you shut down pgpool1 (i.e. pgpool0 is the only live node), is the VIP released or not?
I know that if you start the whole cluster with only 1 node the VIP is not assigned, but this is another scenario.

Can you please test it out?
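One way to run this test (a sketch, assuming pgpool was started from the install path shown above, and that "pgpool0"/"pgpool1" here correspond to pgpool_01/pgpool_02):

# on pgpool_02 (10.241.166.22), stop pgpool so that pgpool_01 is the only live node
/usr/share/pgpool-II/3.7.1/bin/pgpool -m fast stop

# then on pgpool_01 (10.241.166.21), check whether the delegate IP is still assigned
ip addr show dev eth0 | grep 10.241.166.24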

Carlos Mendez

2019-09-04 02:11

reporter   ~0002814

Hi T-ishii

At this moment we were able to start the VIP 10.241.166.24. This is our configuration now:
pgpool_01 10.241.166.21
pgpool_02 10.241.166.22
pgpool_03 10.241.166.23
Primary DB 10.241.166.32
Standby DB 10.241.166.33

If I start PGPOOL1 and one other PGPOOL (PGPOOL2 or PGPOOL3), the VIP is started on PGPOOL1.
But if I stop the services on pgpool1, the VIP is released but is not started on the remaining pgpools (pgpool2 or pgpool3),
and the service to the Primary DB is affected.

In the logs we can see this:

STARTING SERVICES

PGPOOL1 LOG

2019-09-03 12:04:50: pid 16453: LOG: set SO_REUSEPORT option to the socket
2019-09-03 12:04:50: pid 16453: LOG: creating watchdog heartbeat receive socket.
2019-09-03 12:04:50: pid 16453: DETAIL: set SO_REUSEPORT
2019-09-03 12:04:56: pid 16438: LOG: new watchdog node connection is received from "10.241.166.22:28397"
2019-09-03 12:04:56: pid 16438: LOG: new node joined the cluster hostname:"pgpool_02" port:9000 pgpool_port:9999
2019-09-03 12:04:57: pid 16438: LOG: adding watchdog node "pgpool_02:9999 Linux pgpool_02" to the standby list
2019-09-03 12:04:57: pid 16438: LOG: quorum found
2019-09-03 12:04:57: pid 16438: DETAIL: starting escalation process
2019-09-03 12:04:57: pid 16438: LOG: escalation process started with PID:16568
2019-09-03 12:04:57: pid 16568: LOG: watchdog: escalation started
2019-09-03 12:04:57: pid 16437: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2019-09-03 12:04:57: pid 16438: LOG: new IPC connection received
2019-09-03 12:04:57: pid 16437: LOG: watchdog cluster now holds the quorum
2019-09-03 12:04:57: pid 16437: DETAIL: updating the state of quarantine backend nodes
2019-09-03 12:04:57: pid 16438: LOG: new IPC connection received
2019-09-03 12:05:01: pid 16568: LOG: successfully acquired the delegate IP:"10.241.166.24"
2019-09-03 12:05:01: pid 16568: DETAIL: 'if_up_cmd' returned with success
2019-09-03 12:05:01: pid 16438: LOG: watchdog escalation process with pid: 16568 exit with SUCCESS.
2019-09-03 12:05:56: pid 16438: LOG: new watchdog node connection is received from "10.241.166.23:51383"
2019-09-03 12:05:56: pid 16438: LOG: new node joined the cluster hostname:"pgpool_03" port:9000 pgpool_port:9999
2019-09-03 12:05:57: pid 16438: LOG: adding watchdog node "pgpool_03:9999 Linux pgpool_03" to the standby list
2019-09-03 12:06:29: pid 16452: LOG: watchdog: lifecheck started


*************************************************************************************************************
STOPPING SERVICES PGPOOL1

PGPOOL1 LOG:

2019-09-03 12:07:37: pid 16438: LOG: Watchdog is shutting down
2019-09-03 12:07:37: pid 16720: LOG: watchdog: de-escalation started
2019-09-03 12:07:37: pid 16720: LOG: successfully released the delegate IP:"10.241.166.24"
2019-09-03 12:07:37: pid 16720: DETAIL: 'if_down_cmd' returned with success

PGPOOL 2 LOG

2019-09-03 12:07:37: pid 20356: LOG: remote node "pgpool_01:9999 Linux pgpool_01" is shutting down
2019-09-03 12:07:37: pid 20356: LOG: watchdog cluster has lost the coordinator node
2019-09-03 12:07:37: pid 20356: LOG: unassigning the remote node "pgpool_01:9999 Linux pgpool_01" from watchdog cluster master
2019-09-03 12:07:37: pid 20356: LOG: We have lost the cluster master node "pgpool_01:9999 Linux pgpool_01"
2019-09-03 12:07:37: pid 20356: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2019-09-03 12:07:41: pid 20356: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2019-09-03 12:07:42: pid 20356: LOG: I am the only alive node in the watchdog cluster
2019-09-03 12:07:42: pid 20356: HINT: skiping stand for coordinator state
2019-09-03 12:07:42: pid 20356: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2019-09-03 12:07:42: pid 20356: LOG: I am announcing my self as master/coordinator watchdog node
2019-09-03 12:07:46: pid 20356: LOG: I am the cluster leader node
2019-09-03 12:07:46: pid 20356: DETAIL: our declare coordinator message is accepted by all nodes
2019-09-03 12:07:46: pid 20356: LOG: setting the local node "pgpool_02:9999 Linux pgpool_02" as watchdog cluster master
2019-09-03 12:07:46: pid 20356: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2019-09-03 12:07:46: pid 20356: DETAIL: waiting for the quorum to start escalation process
2019-09-03 12:07:46: pid 20355: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2019-09-03 12:07:46: pid 20356: LOG: new IPC connection received
2019-09-03 12:07:46: pid 20356: LOG: new IPC connection received


PGPOOL3 LOG

2019-09-03 12:07:37: pid 20870: LOG: remote node "pgpool_01:9999 Linux pgpool_01" is shutting down
2019-09-03 12:07:37: pid 20870: LOG: watchdog cluster has lost the coordinator node
2019-09-03 12:07:37: pid 20870: LOG: unassigning the remote node "pgpool_01:9999 Linux pgpool_01" from watchdog cluster master
2019-09-03 12:07:37: pid 20870: LOG: We have lost the cluster master node "pgpool_01:9999 Linux pgpool_01"
2019-09-03 12:07:37: pid 20870: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2019-09-03 12:07:41: pid 20870: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2019-09-03 12:07:42: pid 20870: LOG: I am the only alive node in the watchdog cluster
2019-09-03 12:07:42: pid 20870: HINT: skiping stand for coordinator state
2019-09-03 12:07:42: pid 20870: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2019-09-03 12:07:42: pid 20870: LOG: I am announcing my self as master/coordinator watchdog node
2019-09-03 12:07:46: pid 20870: LOG: I am the cluster leader node
2019-09-03 12:07:46: pid 20870: DETAIL: our declare coordinator message is accepted by all nodes
2019-09-03 12:07:46: pid 20870: LOG: setting the local node "pgpool_03:9999 Linux pgpool_03" as watchdog cluster master
2019-09-03 12:07:46: pid 20870: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2019-09-03 12:07:46: pid 20870: DETAIL: waiting for the quorum to start escalation process
2019-09-03 12:07:46: pid 20870: LOG: new IPC connection received



As we can see, the VIP is released by pgpool1 but is not started on any other PGPOOL.
Any comments would help me.

Regards

t-ishii

2019-09-04 07:28

developer   ~0002815

I think this is expected behavior. If there is only one node out of 3, then there is no quorum, so pgpool won't bring up the VIP. You need to keep at least 2 nodes alive to bring up the VIP.

t-ishii

2019-09-04 10:41

developer   ~0002816

Last edited: 2019-09-04 11:57


Wait. If pgpool2 and pgpool3 are alive, the VIP should be brought up.
Can you show the result of the following commands:

pcp_watchdog_info -v -p pcp_port -h 10.241.166.22
pcp_watchdog_info -v -p pcp_port -h 10.241.166.23

where "pcp_port" is your setting in pgpool.conf.

Carlos Mendez

2019-09-05 01:42

reporter   ~0002817

Hi Buddy
Yes, that was the situation: pgpool1 was down and pgpool2 and pgpool3 were standby; the VIP was released by pgpool1 but could not be started on any other pgpool (2 or 3).

Anyway, it looks like we have fixed this situation, and it looks like the issue was with a PORT: once we stopped the firewall on the other pgpool servers (pgpool2 and pgpool3), the VIP was able to start when we stopped pgpool1.

#systemctl stop firewalld
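Rather than leaving firewalld stopped, the specific pgpool ports can be opened instead. A sketch, assuming the default port numbers; adjust to the port, pcp_port, wd_port and wd_heartbeat_port values in your pgpool.conf:

# firewall-cmd --permanent --add-port=9999/tcp    # pgpool listen port
# firewall-cmd --permanent --add-port=9898/tcp    # pcp_port
# firewall-cmd --permanent --add-port=9000/tcp    # wd_port (watchdog)
# firewall-cmd --permanent --add-port=9694/udp    # wd_heartbeat_port
# firewall-cmd --reload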


Thanks

t-ishii

2019-09-05 09:14

developer   ~0002818

Thanks for the feedback. Can we close this issue?

Carlos Mendez

2019-09-05 23:57

reporter   ~0002822

Hi Buddy,
Yes, the issue can be closed. Thanks for your support.

Regards

Issue History

Date Modified Username Field Change
2019-08-31 08:03 Carlos Mendez New Issue
2019-08-31 08:03 Carlos Mendez Tag Attached: settings
2019-08-31 08:03 Carlos Mendez Tag Attached: pgpool 3.7.1
2019-09-02 09:24 t-ishii Assigned To => t-ishii
2019-09-02 09:24 t-ishii Status new => assigned
2019-09-02 09:28 t-ishii Note Added: 0002807
2019-09-02 12:39 Carlos Mendez Note Added: 0002808
2019-09-02 13:07 t-ishii Note Added: 0002809
2019-09-02 13:11 t-ishii Status assigned => feedback
2019-09-03 01:17 Carlos Mendez Note Added: 0002811
2019-09-03 01:17 Carlos Mendez Status feedback => assigned
2019-09-03 09:30 t-ishii Note Added: 0002812
2019-09-03 09:30 t-ishii Status assigned => feedback
2019-09-04 02:11 Carlos Mendez Note Added: 0002814
2019-09-04 02:11 Carlos Mendez Status feedback => assigned
2019-09-04 07:28 t-ishii Note Added: 0002815
2019-09-04 07:28 t-ishii Status assigned => feedback
2019-09-04 10:41 t-ishii Note Added: 0002816
2019-09-04 11:57 t-ishii Note Edited: 0002816
2019-09-05 01:42 Carlos Mendez Note Added: 0002817
2019-09-05 01:42 Carlos Mendez Status feedback => assigned
2019-09-05 09:14 t-ishii Note Added: 0002818
2019-09-05 09:14 t-ishii Status assigned => feedback
2019-09-05 23:57 Carlos Mendez Note Added: 0002822
2019-09-05 23:57 Carlos Mendez Status feedback => assigned
2019-09-06 07:18 t-ishii Status assigned => resolved