View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000541 | Pgpool-II | Bug | public | 2019-08-31 08:03 | 2019-09-06 07:18 |
| Reporter | Carlos Mendez | Assigned To | t-ishii | ||
| Priority | high | Severity | feature | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Product Version | 3.7.1 | ||||
| Summary | 0000541: Unable to start Virtual IP for Pgpool | ||||
**Description**

Environment details:

    pgpool_01 IP:   10.241.166.21
    VIP:            10.241.166.24
    PRIMARY DB IP:  10.241.166.32
    STANDBY DB IP:  10.241.166.33

After starting the pgpool service, the virtual IP is not available:

    # /usr/share/pgpool-II/3.7.1/bin/pgpool -D -n > /var/log/pgpool/pgpool.log 2>&1 &

The last messages in the log file are:

    2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of if_up_cmd
    2019-08-30 12:16:16: pid 32201: DETAIL: ifup[/usr/sbin/ip] doesn't have setuid bit
    2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of if_down_cmd
    2019-08-30 12:16:16: pid 32201: DETAIL: ifdown[/usr/sbin/ip] doesn't have setuid bit
    2019-08-30 12:16:16: pid 32201: WARNING: checking setuid bit of arping command
    2019-08-30 12:16:16: pid 32201: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
    2019-08-30 12:16:16: pid 32201: LOG: Backend status file /var/log/pgpool/pgpool_status discarded
    2019-08-30 12:16:16: pid 32201: LOG: waiting for watchdog to initialize
    2019-08-30 12:16:16: pid 32202: LOG: setting the local watchdog node name to "10.241.166.21:9999 Linux pgpool_01"
    2019-08-30 12:16:16: pid 32202: LOG: watchdog cluster is configured with 2 remote nodes
    2019-08-30 12:16:16: pid 32202: LOG: watchdog remote node:0 on pgpool_02:9000
    2019-08-30 12:16:16: pid 32202: LOG: watchdog remote node:1 on pgpool_03:9000
    2019-08-30 12:16:16: pid 32202: LOG: interface monitoring is disabled in watchdog
    2019-08-30 12:16:16: pid 32202: LOG: watchdog node state changed from [DEAD] to [LOADING]
    2019-08-30 12:16:21: pid 32202: LOG: watchdog node state changed from [LOADING] to [JOINING]
    2019-08-30 12:16:25: pid 32202: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
    2019-08-30 12:16:26: pid 32202: LOG: I am the only alive node in the watchdog cluster
    2019-08-30 12:16:26: pid 32202: HINT: skiping stand for coordinator state
    2019-08-30 12:16:26: pid 32202: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
    2019-08-30 12:16:26: pid 32202: LOG: I am announcing my self as master/coordinator watchdog node
    2019-08-30 12:16:30: pid 32202: LOG: I am the cluster leader node
    2019-08-30 12:16:30: pid 32202: DETAIL: our declare coordinator message is accepted by all nodes
    2019-08-30 12:16:30: pid 32202: LOG: setting the local node "10.241.166.21:9999 Linux pgpool_01" as watchdog cluster master
    2019-08-30 12:16:30: pid 32202: LOG: I am the cluster leader node but we do not have enough nodes in cluster
    2019-08-30 12:16:30: pid 32202: DETAIL: waiting for the quorum to start escalation process
    2019-08-30 12:16:30: pid 32201: LOG: watchdog process is initialized
    2019-08-30 12:16:30: pid 32202: LOG: new IPC connection received
    2019-08-30 12:16:30: pid 32202: LOG: new IPC connection received
    2019-08-30 12:16:30: pid 32217: LOG: 3 watchdog nodes are configured for lifecheck
    2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:0 Name:"10.241.166.21:9999 Linux pgpool_01"
    2019-08-30 12:16:30: pid 32217: DETAIL: Host:"10.241.166.21" WD Port:9000 pgpool-II port:9999
    2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:1 Name:"Not_Set"
    2019-08-30 12:16:30: pid 32217: DETAIL: Host:"pgpool_02" WD Port:9000 pgpool-II port:9999
    2019-08-30 12:16:30: pid 32217: LOG: watchdog nodes ID:2 Name:"Not_Set"
    2019-08-30 12:16:30: pid 32217: DETAIL: Host:"pgpool_03" WD Port:9000 pgpool-II port:9999
    2019-08-30 12:16:30: pid 32201: LOG: Setting up socket for 0.0.0.0:9999
    2019-08-30 12:16:30: pid 32201: LOG: Setting up socket for :::9999
    2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.21" added for the availability check
    2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.22" added for the availability check
    2019-08-30 12:16:30: pid 32217: LOG: watchdog lifecheck trusted server "10.241.166.23" added for the availability check
    2019-08-30 12:16:30: pid 32201: LOG: find_primary_node_repeatedly: waiting for finding a primary node
    2019-08-30 12:16:30: pid 32201: LOG: find_primary_node: checking backend no 0
    2019-08-30 12:16:30: pid 32201: LOG: find_primary_node: primary node id is 0
    2019-08-30 12:16:30: pid 32201: LOG: pgpool-II successfully started. version 3.7.1 (amefuriboshi)
    2019-08-30 12:16:31: pid 32218: LOG: set SO_REUSEPORT option to the socket
    2019-08-30 12:16:31: pid 32218: LOG: creating watchdog heartbeat receive socket.
    2019-08-30 12:16:31: pid 32218: DETAIL: set SO_REUSEPORT
    2019-08-30 12:16:31: pid 32219: LOG: set SO_REUSEPORT option to the socket
    2019-08-30 12:16:31: pid 32219: LOG: creating socket for sending heartbeat
    2019-08-30 12:16:31: pid 32219: DETAIL: set SO_REUSEPORT
    2019-08-30 12:16:31: pid 32221: LOG: set SO_REUSEPORT option to the socket
    2019-08-30 12:16:31: pid 32221: LOG: creating watchdog heartbeat receive socket.
    2019-08-30 12:16:31: pid 32221: DETAIL: set SO_REUSEPORT
    2019-08-30 12:16:31: pid 32223: LOG: set SO_REUSEPORT option to the socket
    2019-08-30 12:16:31: pid 32223: LOG: creating socket for sending heartbeat
    2019-08-30 12:16:31: pid 32223: DETAIL: set SO_REUSEPORT

The following commands, which assign and release the VIP, work without any issue when executed directly:

    if_up_cmd = 'ip addr add 10.241.166.24/25 dev eth0 label eth0:0'
    if_down_cmd = 'ip addr del 10.241.166.24/25 dev eth0'

According to the log, I think the issue may be related to the SO_REUSEPORT messages, but I am not sure about this. If you have faced this kind of error, please let me know.
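The setuid warnings at the top of the log come from pgpool's startup check of the commands named in `if_up_cmd`, `if_down_cmd` and `arping_cmd`: when pgpool runs as a non-root user, those binaries need the setuid bit (or pgpool itself must run as root) for VIP escalation to work. A minimal sketch of the same check, with a commonly used remedy shown only as comments. The paths are the ones from this report; changing permissions on system binaries has security implications, so treat this as an illustration, not a recommendation.

```shell
# Mirror pgpool's startup check: does each network command carry the setuid bit?
# Paths below are taken from the log in this report; adjust for your system.
for cmd in /usr/sbin/ip /usr/sbin/arping; do
  if [ -u "$cmd" ]; then
    echo "$cmd: setuid bit set"
  else
    echo "$cmd: setuid bit NOT set (pgpool warns; VIP commands may fail as non-root)"
  fi
done

# One common remedy when pgpool runs unprivileged (requires root; security trade-off):
#   chown root /usr/sbin/ip /usr/sbin/arping
#   chmod u+s  /usr/sbin/ip /usr/sbin/arping
```

As the rest of this thread shows, these warnings were not the actual cause here, but they are worth clearing up whenever pgpool does not run as root.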
**Steps To Reproduce**

Start the pgpool service:

    # /usr/share/pgpool-II/3.7.1/bin/pgpool -D -n > /var/log/pgpool/pgpool.log 2>&1 &
**Tags**: pgpool 3.7.1, settings
**Notes**

**(0002807) t-ishii, 2019-09-02 09:28**

I noticed two things:

1) Pgpool-II 3.7.1 is quite old. I strongly recommend upgrading to the newest release (3.7.11 at this point).
2) Probably the VIP was not set because there is only 1 live watchdog node. I am not sure whether this is expected behavior of watchdog, though.
**(0002808) Carlos Mendez, 2019-09-02 12:39**

Hi t-ishii, thanks for your comments.

1. I will look into upgrading to the newest version.
2. On the second point: according to your comments, is it then necessary to have more pgpool nodes running? Our configuration includes 3 pgpool nodes, but at the moment we are only trying to start one. Tomorrow I will try to start all 3 pgpool nodes and check. If you have any other ideas or comments about the issue, please let me know.

Regards
**(0002809) t-ishii, 2019-09-02 13:07**

> 2. For the second point, according to your comments, is it then necessary to have more pgpools?
>
> Our configuration includes 3 pgpools but at this moment we are only trying to start just one.

Yes, if you have 3 pgpool nodes, then you should start at least two of them so that a "quorum" exists.
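The "at least two of them" rule follows from majority quorum: with 3 configured watchdog nodes, more than half must be alive before pgpool escalates the VIP. A small sketch of that arithmetic; the floor(n/2)+1 formula is the usual majority rule, assumed here to match watchdog's behavior.

```shell
# Majority quorum sketch: with `nodes` configured watchdog nodes,
# at least floor(nodes/2)+1 must be alive before the VIP is brought up.
nodes=3
quorum=$(( nodes / 2 + 1 ))
for alive in 1 2 3; do
  if [ "$alive" -ge "$quorum" ]; then
    echo "alive=$alive of $nodes: quorum held, VIP can be escalated"
  else
    echo "alive=$alive of $nodes: no quorum (need $quorum), VIP stays down"
  fi
done
```

For this report's 3-node cluster the threshold is 2, which is why a single running pgpool never acquired the VIP.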
**(0002811) Carlos Mendez, 2019-09-03 01:17**

t-ishii, following your last update: after starting the services on the second pgpool, the VIP was assigned to the first server. This was something I did not know; thanks for your support. In the log for pgpool 1 I can see the following messages:

    2019-09-02 06:01:40: pid 32202: LOG: new watchdog node connection is received from "10.241.166.22:60088"
    2019-09-02 06:01:40: pid 32202: LOG: new node joined the cluster hostname:"pgpool_02" port:9000 pgpool_port:9999
    2019-09-02 06:01:41: pid 32202: LOG: adding watchdog node "pgpool_02:9999 Linux pgpool_02" to the standby list
    2019-09-02 06:01:41: pid 32202: LOG: quorum found
    2019-09-02 06:01:41: pid 32202: DETAIL: starting escalation process
    2019-09-02 06:01:41: pid 32202: LOG: escalation process started with PID:6889
    2019-09-02 06:01:41: pid 6889: LOG: watchdog: escalation started
    2019-09-02 06:01:41: pid 32201: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
    2019-09-02 06:01:41: pid 32202: LOG: new IPC connection received
    2019-09-02 06:01:41: pid 32201: LOG: watchdog cluster now holds the quorum
    2019-09-02 06:01:41: pid 32201: DETAIL: updating the state of quarantine backend nodes
    2019-09-02 06:01:41: pid 32202: LOG: new IPC connection received
    2019-09-02 06:01:45: pid 6889: LOG: successfully acquired the delegate IP:"10.241.166.24"
    2019-09-02 06:01:45: pid 6889: DETAIL: 'if_up_cmd' returned with success
    2019-09-02 06:01:45: pid 32202: LOG: watchdog escalation process with pid: 6889 exit with SUCCESS.
    2019-09-02 06:04:18: pid 32286: LOG: new connection received
    2019-09-02 06:04:18: pid 32286: DETAIL: connecting host=10.241.156.104 port=64471
    2019-09-02 06:04:23: pid 32308: LOG: new connection received
    2019-09-02 06:04:23: pid 32308: DETAIL: connecting host=10.241.156.104 port=64474
    2019-09-02 06:11:58: pid 32202: LOG: new watchdog node connection is received from "10.241.166.23:38023"
    2019-09-02 06:11:58: pid 32202: LOG: new node joined the cluster hostname:"pgpool_03" port:9000 pgpool_port:9999
    2019-09-02 06:11:59: pid 32202: LOG: adding watchdog node "pgpool_03:9999 Linux pgpool_03" to the standby list
    2019-09-02 06:13:10: pid 32217: LOG: watchdog: lifecheck started

Regards, CAR
**(0002812) t-ishii, 2019-09-03 09:30**

Thanks for the report. Now I wonder: if you shut down pgpool1 (i.e. pgpool0 is the only live node), is the VIP released or not? I know that if you start the whole cluster with only 1 node the VIP is not assigned, but this is a different scenario. Can you please test it out?
**(0002814) Carlos Mendez, 2019-09-04 02:11**

Hi t-ishii, at this moment we are able to start the VIP 10.241.166.24. This is our configuration now:

    pgpool_01   10.241.166.21
    pgpool_02   10.241.166.22
    pgpool_03   10.241.166.23
    Primary DB  10.241.166.32
    Standby DB  10.241.166.33

If I start pgpool1 and any other pgpool (pgpool2 or pgpool3), the VIP is started on pgpool1. But if I stop the services on pgpool1, the VIP is released and is not started on the remaining pgpools (pgpool2 or pgpool3), and service to the primary DB is affected. In the logs we can see this:

Starting services, pgpool1 log:

    2019-09-03 12:04:50: pid 16453: LOG: set SO_REUSEPORT option to the socket
    2019-09-03 12:04:50: pid 16453: LOG: creating watchdog heartbeat receive socket.
    2019-09-03 12:04:50: pid 16453: DETAIL: set SO_REUSEPORT
    2019-09-03 12:04:56: pid 16438: LOG: new watchdog node connection is received from "10.241.166.22:28397"
    2019-09-03 12:04:56: pid 16438: LOG: new node joined the cluster hostname:"pgpool_02" port:9000 pgpool_port:9999
    2019-09-03 12:04:57: pid 16438: LOG: adding watchdog node "pgpool_02:9999 Linux pgpool_02" to the standby list
    2019-09-03 12:04:57: pid 16438: LOG: quorum found
    2019-09-03 12:04:57: pid 16438: DETAIL: starting escalation process
    2019-09-03 12:04:57: pid 16438: LOG: escalation process started with PID:16568
    2019-09-03 12:04:57: pid 16568: LOG: watchdog: escalation started
    2019-09-03 12:04:57: pid 16437: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
    2019-09-03 12:04:57: pid 16438: LOG: new IPC connection received
    2019-09-03 12:04:57: pid 16437: LOG: watchdog cluster now holds the quorum
    2019-09-03 12:04:57: pid 16437: DETAIL: updating the state of quarantine backend nodes
    2019-09-03 12:04:57: pid 16438: LOG: new IPC connection received
    2019-09-03 12:05:01: pid 16568: LOG: successfully acquired the delegate IP:"10.241.166.24"
    2019-09-03 12:05:01: pid 16568: DETAIL: 'if_up_cmd' returned with success
    2019-09-03 12:05:01: pid 16438: LOG: watchdog escalation process with pid: 16568 exit with SUCCESS.
    2019-09-03 12:05:56: pid 16438: LOG: new watchdog node connection is received from "10.241.166.23:51383"
    2019-09-03 12:05:56: pid 16438: LOG: new node joined the cluster hostname:"pgpool_03" port:9000 pgpool_port:9999
    2019-09-03 12:05:57: pid 16438: LOG: adding watchdog node "pgpool_03:9999 Linux pgpool_03" to the standby list
    2019-09-03 12:06:29: pid 16452: LOG: watchdog: lifecheck started

Stopping services on pgpool1, pgpool1 log:

    2019-09-03 12:07:37: pid 16438: LOG: Watchdog is shutting down
    2019-09-03 12:07:37: pid 16720: LOG: watchdog: de-escalation started
    2019-09-03 12:07:37: pid 16720: LOG: successfully released the delegate IP:"10.241.166.24"
    2019-09-03 12:07:37: pid 16720: DETAIL: 'if_down_cmd' returned with success

pgpool2 log:

    2019-09-03 12:07:37: pid 20356: LOG: remote node "pgpool_01:9999 Linux pgpool_01" is shutting down
    2019-09-03 12:07:37: pid 20356: LOG: watchdog cluster has lost the coordinator node
    2019-09-03 12:07:37: pid 20356: LOG: unassigning the remote node "pgpool_01:9999 Linux pgpool_01" from watchdog cluster master
    2019-09-03 12:07:37: pid 20356: LOG: We have lost the cluster master node "pgpool_01:9999 Linux pgpool_01"
    2019-09-03 12:07:37: pid 20356: LOG: watchdog node state changed from [STANDBY] to [JOINING]
    2019-09-03 12:07:41: pid 20356: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
    2019-09-03 12:07:42: pid 20356: LOG: I am the only alive node in the watchdog cluster
    2019-09-03 12:07:42: pid 20356: HINT: skiping stand for coordinator state
    2019-09-03 12:07:42: pid 20356: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
    2019-09-03 12:07:42: pid 20356: LOG: I am announcing my self as master/coordinator watchdog node
    2019-09-03 12:07:46: pid 20356: LOG: I am the cluster leader node
    2019-09-03 12:07:46: pid 20356: DETAIL: our declare coordinator message is accepted by all nodes
    2019-09-03 12:07:46: pid 20356: LOG: setting the local node "pgpool_02:9999 Linux pgpool_02" as watchdog cluster master
    2019-09-03 12:07:46: pid 20356: LOG: I am the cluster leader node but we do not have enough nodes in cluster
    2019-09-03 12:07:46: pid 20356: DETAIL: waiting for the quorum to start escalation process
    2019-09-03 12:07:46: pid 20355: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
    2019-09-03 12:07:46: pid 20356: LOG: new IPC connection received
    2019-09-03 12:07:46: pid 20356: LOG: new IPC connection received

pgpool3 log:

    2019-09-03 12:07:37: pid 20870: LOG: remote node "pgpool_01:9999 Linux pgpool_01" is shutting down
    2019-09-03 12:07:37: pid 20870: LOG: watchdog cluster has lost the coordinator node
    2019-09-03 12:07:37: pid 20870: LOG: unassigning the remote node "pgpool_01:9999 Linux pgpool_01" from watchdog cluster master
    2019-09-03 12:07:37: pid 20870: LOG: We have lost the cluster master node "pgpool_01:9999 Linux pgpool_01"
    2019-09-03 12:07:37: pid 20870: LOG: watchdog node state changed from [STANDBY] to [JOINING]
    2019-09-03 12:07:41: pid 20870: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
    2019-09-03 12:07:42: pid 20870: LOG: I am the only alive node in the watchdog cluster
    2019-09-03 12:07:42: pid 20870: HINT: skiping stand for coordinator state
    2019-09-03 12:07:42: pid 20870: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
    2019-09-03 12:07:42: pid 20870: LOG: I am announcing my self as master/coordinator watchdog node
    2019-09-03 12:07:46: pid 20870: LOG: I am the cluster leader node
    2019-09-03 12:07:46: pid 20870: DETAIL: our declare coordinator message is accepted by all nodes
    2019-09-03 12:07:46: pid 20870: LOG: setting the local node "pgpool_03:9999 Linux pgpool_03" as watchdog cluster master
    2019-09-03 12:07:46: pid 20870: LOG: I am the cluster leader node but we do not have enough nodes in cluster
    2019-09-03 12:07:46: pid 20870: DETAIL: waiting for the quorum to start escalation process
    2019-09-03 12:07:46: pid 20870: LOG: new IPC connection received

As we can see, the VIP is released by pgpool1 but is not started on any other pgpool. Any comments would help me.

Regards
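When testing failover scenarios like this, it helps to confirm on each node whether the delegate IP is actually present on the interface, independently of what the pgpool log claims. A quick check, using the VIP and interface named in this report (10.241.166.24 on eth0); `pcp_watchdog_info` gives the watchdog's own view as a complement.

```shell
# Check whether this node currently holds the delegate IP.
# VIP and interface are the ones used in this report; adjust as needed.
VIP=10.241.166.24
DEV=eth0
if ip addr show dev "$DEV" 2>/dev/null | grep -q "inet ${VIP}/"; then
  echo "this node holds the VIP"
else
  echo "VIP not present on this node"
fi
```

Running this on pgpool2 and pgpool3 right after stopping pgpool1 would show directly that neither standby escalated.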
**(0002815) t-ishii, 2019-09-04 07:28**

I think it's expected behavior. If there is only one live node out of 3, then there is no quorum, so pgpool won't bring up the VIP. You need to keep at least 2 nodes alive to bring up the VIP.
**(0002816) t-ishii, 2019-09-04 10:41**

Wait. If pgpool2 and pgpool3 are alive, the VIP should be brought up. Can you show the result of the following commands:

    pcp_watchdog_info -v -p pcp_port -h 10.241.166.22
    pcp_watchdog_info -v -p pcp_port -h 10.241.166.23

where "pcp_port" is your setting in pgpool.conf.
**(0002817) Carlos Mendez, 2019-09-05 01:42**

Hi, yes, that was the situation: pgpool1 was down and pgpool2 and pgpool3 were standbys; the VIP was released by pgpool1 but could not be started on any other pgpool (2 or 3).

Anyway, it looks like we have fixed this situation, and the issue was with a port: once we stopped the firewall on the other pgpool servers (pgpool2 and pgpool3), the VIP was able to start when we stopped pgpool1.

    # systemctl stop firewalld

Thanks
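Stopping firewalld confirms the diagnosis, but leaving the firewall off is usually undesirable. A sketch of the narrower fix: open just the ports pgpool uses between the nodes. The port numbers are pgpool's documented defaults (9999 listen, 9898 pcp, 9000 watchdog `wd_port`, 9694 heartbeat) and are assumptions here; match them to your pgpool.conf. The block only prints the commands; drop the `run` wrapper to execute them as root.

```shell
# Open pgpool's inter-node ports instead of stopping firewalld entirely.
# Ports below are pgpool defaults (verify against pgpool.conf): 9999 listen,
# 9898 pcp, 9000 watchdog (wd_port), 9694 heartbeat (UDP).
run() { echo "+ $*"; }   # print-only sketch; remove this wrapper to apply
for port in 9999 9898 9000; do
  run firewall-cmd --permanent --add-port="${port}/tcp"
done
run firewall-cmd --permanent --add-port=9694/udp
run firewall-cmd --reload
```

With the watchdog and heartbeat ports reachable between all three nodes, the standbys can detect the coordinator's shutdown and escalate the VIP without disabling the firewall.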
**(0002818) t-ishii, 2019-09-05 09:14**

Thanks for the feedback. Can we close this issue?
**(0002822) Carlos Mendez, 2019-09-05 23:57**

Hi, yes, the issue can be closed. Thanks for your support.

Regards
**Issue History**

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2019-08-31 08:03 | Carlos Mendez | New Issue | |
| 2019-08-31 08:03 | Carlos Mendez | Tag Attached: settings | |
| 2019-08-31 08:03 | Carlos Mendez | Tag Attached: pgpool 3.7.1 | |
| 2019-09-02 09:24 | t-ishii | Assigned To | => t-ishii |
| 2019-09-02 09:24 | t-ishii | Status | new => assigned |
| 2019-09-02 09:28 | t-ishii | Note Added: 0002807 | |
| 2019-09-02 12:39 | Carlos Mendez | Note Added: 0002808 | |
| 2019-09-02 13:07 | t-ishii | Note Added: 0002809 | |
| 2019-09-02 13:11 | t-ishii | Status | assigned => feedback |
| 2019-09-03 01:17 | Carlos Mendez | Note Added: 0002811 | |
| 2019-09-03 01:17 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-03 09:30 | t-ishii | Note Added: 0002812 | |
| 2019-09-03 09:30 | t-ishii | Status | assigned => feedback |
| 2019-09-04 02:11 | Carlos Mendez | Note Added: 0002814 | |
| 2019-09-04 02:11 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-04 07:28 | t-ishii | Note Added: 0002815 | |
| 2019-09-04 07:28 | t-ishii | Status | assigned => feedback |
| 2019-09-04 10:41 | t-ishii | Note Added: 0002816 | |
| 2019-09-04 11:57 | t-ishii | Note Edited: 0002816 | |
| 2019-09-05 01:42 | Carlos Mendez | Note Added: 0002817 | |
| 2019-09-05 01:42 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-05 09:14 | t-ishii | Note Added: 0002818 | |
| 2019-09-05 09:14 | t-ishii | Status | assigned => feedback |
| 2019-09-05 23:57 | Carlos Mendez | Note Added: 0002822 | |
| 2019-09-05 23:57 | Carlos Mendez | Status | feedback => assigned |
| 2019-09-06 07:18 | t-ishii | Status | assigned => resolved |