View Issue Details

IDProjectCategoryView StatusLast Update
0000669Pgpool-IIBugpublic2020-12-10 13:27
Reporterjeffty_w Assigned Topengbo  
PrioritynormalSeveritymajorReproducibilityalways
Status feedbackResolutionopen 
PlatformLinuxOSRHELOS Version7.7
Product Version4.1.2 
Summary0000669: Watchdog escalation failed to acquire delegate IP
DescriptionHi there,

I've installed pgpool 4.1.2 with postgresql 12.3 on 3 RHEL 7.7 nodes. But pgpool always unable to acquire the delegate IP as:
pid 16341: DETAIL: 'if_up_cmd' failed
pid 16341: WARNING: watchdog escalation failed to acquire delegate IP

The error logs of pgpool:
......
pid 16130: DETAIL: set SO_REUSEPORT
pid 16124: LOG: watchdog cluster now holds the quorum
pid 16124: DETAIL: updating the state of quarantine backend nodes
pid 16126: LOG: new IPC connection received
pid 16133: LOG: set SO_REUSEPORT option to the socket
pid 16133: LOG: creating socket for sending heartbeat
pid 16133: DETAIL: set SO_REUSEPORT
pid 16341: LOG: failed to acquire the delegate IP address
pid 16341: DETAIL: 'if_up_cmd' failed
pid 16341: WARNING: watchdog escalation failed to acquire delegate IP
pid 16126: LOG: watchdog escalation process with pid: 16341 exit with SUCCESS.
pid 16337: LOG: failed to connect to PostgreSQL server on "post-db2:5432", getsockopt() detected error "Connection refused"
pid 16337: ERROR: failed to make persistent db connection
......


Installed RPMs:

postgresql12-server-12.3-1PGDG.rhel7.x86_64
postgresql12-llvmjit-12.3-1PGDG.rhel7.x86_64
postgresql12-devel-12.3-1PGDG.rhel7.x86_64
postgresql12-12.3-1PGDG.rhel7.x86_64
postgresql12-contrib-12.3-1PGDG.rhel7.x86_64
postgresql12-libs-12.3-1PGDG.rhel7.x86_64
pgpool-II-pg12-devel-4.1.2-1pgdg.rhel7.x86_64
pgpool-II-pg12-debuginfo-4.1.2-1pgdg.rhel7.x86_64
pgpool-II-pg12-extensions-4.1.2-1pgdg.rhel7.x86_64
pgpool-II-pg12-4.1.2-1pgdg.rhel7.x86_64

Check the if_up_cmd in pgpool.conf:
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'

Since pgpool service is running under user postgres, so I copy the cmd and run it manually to test:

[root@post-db1 ~]# su - postgres
-bash-4.2$ /usr/bin/sudo /sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0
[sudo] password for postgres:

It needs password, so that may be the root cause. After googled I found https://www.iditect.com/how-to/56538238.html.
And I tried to cp the ip/arping to another folder for postgres as:

su - postgres
mkdir /var/lib/pgsql/sbin
cp /sbin/ip /var/lib/pgsql/sbin
cp /sbin/arping /var/lib/pgsql/sbin
chown -R postgres:postgres /var/lib/pgsql/sbin

visudo
postgres ALL = NOPASSWD: /var/lib/pgsql/sbin/ip *, /var/lib/pgsql/sbin/arping *

Then change the if_up_cmd in pgpool.conf as:
if_up_cmd = 'sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'

Also I tried this manually as:
su - postgres
sudo /var/lib/pgsql/sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0

it succeed.
But after restarting pgpool service still get the above error message.
How can I get more detailed error info at least to know what is the root cause?

Also the selinux status:
#sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: disabled
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31

Thanks in advance.


Steps To Reproduce1. install pgpool 4.1.2 with postgresql 12.3 on 3 RHEL 7.7 nodes
2. start pgpool
3. check pgpool log and error occurrs:
LOG: failed to acquire the delegate IP address
DETAIL: 'if_up_cmd' failed
WARNING: watchdog escalation failed to acquire delegate IP
TagsNo tags attached.

Activities

pengbo

2020-12-09 16:15

developer   ~0003646

> Then change the if_up_cmd in pgpool.conf as:
> if_up_cmd = 'sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'

You need to specify the full path of "sudo" command like:

    if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'

If you are using the package "pgpool-II-pg12-4.1.2-1pgdg.rhel7.x86_64" which is distributed by Pgpool community,
the "postgres" user has been configured to run ip/arping via sudo without a password.

-----
postgres ALL=NOPASSWD: /sbin/ip
postgres ALL=NOPASSWD: /usr/sbin/arping
-----
https://www.pgpool.net/docs/41/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-WATCHDOG

jeffty_w

2020-12-10 01:36

reporter   ~0003650

Thanks pengbo.
I'm not sure why the installation with RPM locally hasn't added below items in sudoers automatically:
postgres ALL=NOPASSWD: /sbin/ip
postgres ALL=NOPASSWD: /usr/sbin/arping

Also I added another two lines for version2 & 3 testing:
postgres ALL=NOPASSWD: /var/lib/pgsql/sbin/ip
postgres ALL=NOPASSWD: /var/lib/pgsql/sbin/ipadd.sh

I added them with visudo and tried all 3 versions but none of them works.

1. The original version:
if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'

2. The 'copy ip/arping file to /var/lib/pgsql/sbin' version:
if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'

3. The customized shell script version:
if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ipadd.sh $_IP_$'

It's strange that If I run manually with postgres user all commands succeed:
su - postgres
#version1
/usr/bin/sudo /sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0
#version2
/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0
#version3
/usr/bin/sudo /var/lib/pgsql/sbin/ipadd.sh 192.180.211.88

some of the logs:
version 1
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649686-1] 2020-12-09 14:51:24: pid 12252: DEBUG: STATE MACHINE INVOKED WITH EVENT = NETWORK IP IS REMOVED Current State = MASTER
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649687-1] 2020-12-09 14:51:24: pid 12252: DEBUG: network IP is removed but system still has a valid IP is assigned
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649688-1] 2020-12-09 14:51:24: pid 12252: DEBUG: IP = 192.180.211.88
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649689-1] 2020-12-09 14:51:24: pid 12252: DEBUG: watchdog exec interface up/down command failed
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649689-2] 2020-12-09 14:51:24: pid 12252: DETAIL: '/usr/bin/sudo /sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' failed. exit status: 1
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649690-1] 2020-12-09 14:51:24: pid 12252: LOG: failed to acquire the delegate IP address
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649690-2] 2020-12-09 14:51:24: pid 12252: DETAIL: 'if_up_cmd' failed
Dec 9 14:51:24 post-db1 pgpool[12252]: [1649691-1] 2020-12-09 14:51:24: pid 12252: DEBUG: watchdog process signal handler
Dec 9 14:51:25 post-db1 pgpool[12258]: [532528-1] 2020-12-09 14:51:25: pid 12258: DEBUG: watchdog heartbeat: send 224 byte packet
Dec 9 14:51:25 post-db1 pgpool[12258]: [532529-1] 2020-12-09 14:51:25: pid 12258: DEBUG: watchdog heartbeat: send heartbeat signal to post-db2:9694
Dec 9 14:51:25 post-db1 pgpool[12262]: [532528-1] 2020-12-09 14:51:25: pid 12262: DEBUG: watchdog heartbeat: send 224 byte packet

version 2
Dec 9 14:52:53 post-db1 pgpool[25186]: [97-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_hostname" = "post-db1"
Dec 9 14:52:53 post-db1 pgpool[25186]: [98-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_port" = "9000"
Dec 9 14:52:53 post-db1 pgpool[25186]: [99-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_priority" = "1"
Dec 9 14:52:53 post-db1 pgpool[25186]: [100-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_authkey" = ""
Dec 9 14:52:53 post-db1 pgpool[25186]: [101-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_ipc_socket_dir" = "/tmp"
Dec 9 14:52:53 post-db1 pgpool[25186]: [102-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "delegate_IP" = "192.180.211.88"
Dec 9 14:52:53 post-db1 pgpool[25186]: [103-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "if_cmd_path" = "/sbin"
Dec 9 14:52:53 post-db1 pgpool[25186]: [104-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "if_up_cmd" = "/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0"
Dec 9 14:52:53 post-db1 pgpool[25186]: [105-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "if_down_cmd" = "/usr/bin/sudo /var/lib/pgsql/sbin/ip addr del $_IP_$/26 dev eth0"
Dec 9 14:52:53 post-db1 pgpool[25186]: [106-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "arping_path" = "/usr/sbin"
Dec 9 14:52:53 post-db1 pgpool[25186]: [107-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "arping_cmd" = "/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0"

Dec 9 14:53:35 post-db1 pgpool[25188]: [713-1] 2020-12-09 14:53:35: pid 25188: LOG: adding watchdog node "post-db2:9999 Linux post-db2" to the standby list
Dec 9 14:53:35 post-db1 pgpool[25188]: [722-1] 2020-12-09 14:53:35: pid 25188: LOG: new IPC connection received
Dec 9 14:53:35 post-db1 pgpool[25186]: [187-1] 2020-12-09 14:53:35: pid 25186: LOG: watchdog cluster now holds the quorum
Dec 9 14:53:35 post-db1 pgpool[25186]: [187-2] 2020-12-09 14:53:35: pid 25186: DETAIL: updating the state of quarantine backend nodes
Dec 9 14:53:35 post-db1 pgpool[25188]: [763-1] 2020-12-09 14:53:35: pid 25188: LOG: new IPC connection received
Dec 9 14:53:35 post-db1 pgpool[25483]: [695-1] 2020-12-09 14:53:35: pid 25483: LOG: failed to acquire the delegate IP address
Dec 9 14:53:35 post-db1 pgpool[25483]: [695-2] 2020-12-09 14:53:35: pid 25483: DETAIL: 'if_up_cmd' failed
Dec 9 14:53:35 post-db1 pgpool[25483]: [696-1] 2020-12-09 14:53:35: pid 25483: WARNING: watchdog escalation failed to acquire delegate IP
Dec 9 14:53:35 post-db1 pgpool[25188]: [767-1] 2020-12-09 14:53:35: pid 25188: LOG: watchdog escalation process with pid: 25483 exit with SUCCESS.
Dec 9 14:53:44 post-db1 pgpool[25192]: [222-1] 2020-12-09 14:53:44: pid 25192: DEBUG: watchdog heartbeat: received 224 byte packet


And I cannot find the reason why the command failed from the log. Any suggestions?


Thanks.

jeffty_w

2020-12-10 01:39

reporter   ~0003652

Sorry my bad. I need to correct one of the line in above post: the version 1 command should be:

1. The original version:
if_up_cmd = '/usr/bin/sudo /sbin/ip addr addr add $_IP_$/26 dev eth0 label eth0:0'

pengbo

2020-12-10 13:14

developer   ~0003653

Last edited: 2020-12-10 13:26

I think postgres can't execute "sudo" without a password due to some reasons.

Can you run the following command?
# su - postgres -c "/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0"

If the following error is displayed, I think you need to configure "/etc/sudoers"
sudo: sorry, you must have a tty to run sudo

It is mentioned in doc:
https://www.pgpool.net/docs/41/en/html/example-cluster.html

| Note: If "Defaults requiretty" is set in the /etc/sudoers, please ensure that the pgpool startup user can execute the if_up_cmd, if_down_cmd and arping_cmd command without a tty.



For #version2, you have setuid, you don't need to use "sudo" in if_up_cmd, if_down_cmd and arping_cmd.

if_cmd_path = '/var/lib/pgsql/sbin'
if_up_cmd = '/var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0'
if_down_cmd = '/var/lib/pgsql/sbin/ip addr del $_IP_$/26 dev eth0'

arping_path = '/var/lib/pgsql/sbin'
arping_cmd = '/var/lib/pgsql/sbin/arping -U $_IP_$ -w 1 -I eth0'

Issue History

Date Modified Username Field Change
2020-12-03 17:33 jeffty_w New Issue
2020-12-04 09:23 pengbo Assigned To => pengbo
2020-12-04 09:23 pengbo Status new => assigned
2020-12-09 16:15 pengbo Note Added: 0003646
2020-12-09 16:16 pengbo Status assigned => feedback
2020-12-10 01:36 jeffty_w Note Added: 0003650
2020-12-10 01:36 jeffty_w Status feedback => assigned
2020-12-10 01:39 jeffty_w Note Added: 0003652
2020-12-10 13:14 pengbo Note Added: 0003653
2020-12-10 13:15 pengbo Note Edited: 0003653
2020-12-10 13:26 pengbo Note Edited: 0003653
2020-12-10 13:27 pengbo Status assigned => feedback