View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000669 | Pgpool-II | Bug | public | 2020-12-03 17:33 | 2020-12-10 13:27 |
| Reporter | jeffty_w | Assigned To | pengbo | ||
| Priority | normal | Severity | major | Reproducibility | always |
| Status | feedback | Resolution | open | ||
| Platform | Linux | OS | RHEL | OS Version | 7.7 |
| Product Version | 4.1.2 | ||||
| Summary | 0000669: Watchdog escalation failed to acquire delegate IP | ||||
| Description | Hi there, I've installed pgpool 4.1.2 with postgresql 12.3 on 3 RHEL 7.7 nodes. But pgpool always unable to acquire the delegate IP as: pid 16341: DETAIL: 'if_up_cmd' failed pid 16341: WARNING: watchdog escalation failed to acquire delegate IP The error logs of pgpool: ...... pid 16130: DETAIL: set SO_REUSEPORT pid 16124: LOG: watchdog cluster now holds the quorum pid 16124: DETAIL: updating the state of quarantine backend nodes pid 16126: LOG: new IPC connection received pid 16133: LOG: set SO_REUSEPORT option to the socket pid 16133: LOG: creating socket for sending heartbeat pid 16133: DETAIL: set SO_REUSEPORT pid 16341: LOG: failed to acquire the delegate IP address pid 16341: DETAIL: 'if_up_cmd' failed pid 16341: WARNING: watchdog escalation failed to acquire delegate IP pid 16126: LOG: watchdog escalation process with pid: 16341 exit with SUCCESS. pid 16337: LOG: failed to connect to PostgreSQL server on "post-db2:5432", getsockopt() detected error "Connection refused" pid 16337: ERROR: failed to make persistent db connection ...... Installed RPMs: postgresql12-server-12.3-1PGDG.rhel7.x86_64 postgresql12-llvmjit-12.3-1PGDG.rhel7.x86_64 postgresql12-devel-12.3-1PGDG.rhel7.x86_64 postgresql12-12.3-1PGDG.rhel7.x86_64 postgresql12-contrib-12.3-1PGDG.rhel7.x86_64 postgresql12-libs-12.3-1PGDG.rhel7.x86_64 pgpool-II-pg12-devel-4.1.2-1pgdg.rhel7.x86_64 pgpool-II-pg12-debuginfo-4.1.2-1pgdg.rhel7.x86_64 pgpool-II-pg12-extensions-4.1.2-1pgdg.rhel7.x86_64 pgpool-II-pg12-4.1.2-1pgdg.rhel7.x86_64 Check the if_up_cmd in pgpool.conf: if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' Since pgpool service is running under user postgres, so I copy the cmd and run it manually to test: [root@post-db1 ~]# su - postgres -bash-4.2$ /usr/bin/sudo /sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0 [sudo] password for postgres: It needs password, so that may be the root cause. After googled I found https://www.iditect.com/how-to/56538238.html. And I tried to cp the ip/arping to another folder for postgres as: su - postgres mkdir /var/lib/pgsql/sbin cp /sbin/ip /var/lib/pgsql/sbin cp /sbin/arping /var/lib/pgsql/sbin chown -R postgres:postgres /var/lib/pgsql/sbin visudo postgres ALL = NOPASSWD: /var/lib/pgsql/sbin/ip *, /var/lib/pgsql/sbin/arping * Then change the if_up_cmd in pgpool.conf as: if_up_cmd = 'sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' Also I tried this manually as: su - postgres sudo /var/lib/pgsql/sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0 it succeed. But after restarting pgpool service still get the above error message. How can I get more detailed error info at least to know what is the root cause? Also the selinux status: #sestatus SELinux status: enabled SELinuxfs mount: /sys/fs/selinux SELinux root directory: /etc/selinux Loaded policy name: targeted Current mode: permissive Mode from config file: disabled Policy MLS status: enabled Policy deny_unknown status: allowed Max kernel policy version: 31 Thanks in advance. | ||||
| Steps To Reproduce | 1. install pgpool 4.1.2 with postgresql 12.3 on 3 RHEL 7.7 nodes 2. start pgpool 3. check pgpool log and error occurrs: LOG: failed to acquire the delegate IP address DETAIL: 'if_up_cmd' failed WARNING: watchdog escalation failed to acquire delegate IP | ||||
| Tags | No tags attached. | ||||
|
|
> Then change the if_up_cmd in pgpool.conf as: > if_up_cmd = 'sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' You need to specify the full path of "sudo" command like: if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' If you are using the package "pgpool-II-pg12-4.1.2-1pgdg.rhel7.x86_64" which is distributed by Pgpool community, the "postgres" user has been configured to run ip/arping via sudo without a password. ----- postgres ALL=NOPASSWD: /sbin/ip postgres ALL=NOPASSWD: /usr/sbin/arping ----- https://www.pgpool.net/docs/41/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-WATCHDOG |
|
|
Thanks pengbo. I'm not sure why the installation with RPM locally hasn't added below items in sudoers automatically: postgres ALL=NOPASSWD: /sbin/ip postgres ALL=NOPASSWD: /usr/sbin/arping Also I added another two lines for version2 & 3 testing: postgres ALL=NOPASSWD: /var/lib/pgsql/sbin/ip postgres ALL=NOPASSWD: /var/lib/pgsql/sbin/ipadd.sh I added them with visudo and tried all 3 versions but none of them works. 1. The original version: if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' 2. The 'copy ip/arping file to /var/lib/pgsql/sbin' version: if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' 3. The customized shell script version: if_up_cmd = '/usr/bin/sudo /var/lib/pgsql/sbin/ipadd.sh $_IP_$' It's strange that If I run manually with postgres user all commands succeed: su - postgres #version1 /usr/bin/sudo /sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0 #version2 /usr/bin/sudo /var/lib/pgsql/sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0 #version3 /usr/bin/sudo /var/lib/pgsql/sbin/ipadd.sh 192.180.211.88 some of the logs: version 1 Dec 9 14:51:24 post-db1 pgpool[12252]: [1649686-1] 2020-12-09 14:51:24: pid 12252: DEBUG: STATE MACHINE INVOKED WITH EVENT = NETWORK IP IS REMOVED Current State = MASTER Dec 9 14:51:24 post-db1 pgpool[12252]: [1649687-1] 2020-12-09 14:51:24: pid 12252: DEBUG: network IP is removed but system still has a valid IP is assigned Dec 9 14:51:24 post-db1 pgpool[12252]: [1649688-1] 2020-12-09 14:51:24: pid 12252: DEBUG: IP = 192.180.211.88 Dec 9 14:51:24 post-db1 pgpool[12252]: [1649689-1] 2020-12-09 14:51:24: pid 12252: DEBUG: watchdog exec interface up/down command failed Dec 9 14:51:24 post-db1 pgpool[12252]: [1649689-2] 2020-12-09 14:51:24: pid 12252: DETAIL: '/usr/bin/sudo /sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' failed. exit status: 1 Dec 9 14:51:24 post-db1 pgpool[12252]: [1649690-1] 2020-12-09 14:51:24: pid 12252: LOG: failed to acquire the delegate IP address Dec 9 14:51:24 post-db1 pgpool[12252]: [1649690-2] 2020-12-09 14:51:24: pid 12252: DETAIL: 'if_up_cmd' failed Dec 9 14:51:24 post-db1 pgpool[12252]: [1649691-1] 2020-12-09 14:51:24: pid 12252: DEBUG: watchdog process signal handler Dec 9 14:51:25 post-db1 pgpool[12258]: [532528-1] 2020-12-09 14:51:25: pid 12258: DEBUG: watchdog heartbeat: send 224 byte packet Dec 9 14:51:25 post-db1 pgpool[12258]: [532529-1] 2020-12-09 14:51:25: pid 12258: DEBUG: watchdog heartbeat: send heartbeat signal to post-db2:9694 Dec 9 14:51:25 post-db1 pgpool[12262]: [532528-1] 2020-12-09 14:51:25: pid 12262: DEBUG: watchdog heartbeat: send 224 byte packet version 2 Dec 9 14:52:53 post-db1 pgpool[25186]: [97-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_hostname" = "post-db1" Dec 9 14:52:53 post-db1 pgpool[25186]: [98-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_port" = "9000" Dec 9 14:52:53 post-db1 pgpool[25186]: [99-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_priority" = "1" Dec 9 14:52:53 post-db1 pgpool[25186]: [100-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_authkey" = "" Dec 9 14:52:53 post-db1 pgpool[25186]: [101-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "wd_ipc_socket_dir" = "/tmp" Dec 9 14:52:53 post-db1 pgpool[25186]: [102-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "delegate_IP" = "192.180.211.88" Dec 9 14:52:53 post-db1 pgpool[25186]: [103-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "if_cmd_path" = "/sbin" Dec 9 14:52:53 post-db1 pgpool[25186]: [104-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "if_up_cmd" = "/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0" Dec 9 14:52:53 post-db1 pgpool[25186]: [105-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "if_down_cmd" = "/usr/bin/sudo /var/lib/pgsql/sbin/ip addr del $_IP_$/26 dev eth0" Dec 9 14:52:53 post-db1 pgpool[25186]: [106-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "arping_path" = "/usr/sbin" Dec 9 14:52:53 post-db1 pgpool[25186]: [107-1] 2020-12-09 14:52:53: pid 25186: DEBUG: set_config_option "arping_cmd" = "/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0" Dec 9 14:53:35 post-db1 pgpool[25188]: [713-1] 2020-12-09 14:53:35: pid 25188: LOG: adding watchdog node "post-db2:9999 Linux post-db2" to the standby list Dec 9 14:53:35 post-db1 pgpool[25188]: [722-1] 2020-12-09 14:53:35: pid 25188: LOG: new IPC connection received Dec 9 14:53:35 post-db1 pgpool[25186]: [187-1] 2020-12-09 14:53:35: pid 25186: LOG: watchdog cluster now holds the quorum Dec 9 14:53:35 post-db1 pgpool[25186]: [187-2] 2020-12-09 14:53:35: pid 25186: DETAIL: updating the state of quarantine backend nodes Dec 9 14:53:35 post-db1 pgpool[25188]: [763-1] 2020-12-09 14:53:35: pid 25188: LOG: new IPC connection received Dec 9 14:53:35 post-db1 pgpool[25483]: [695-1] 2020-12-09 14:53:35: pid 25483: LOG: failed to acquire the delegate IP address Dec 9 14:53:35 post-db1 pgpool[25483]: [695-2] 2020-12-09 14:53:35: pid 25483: DETAIL: 'if_up_cmd' failed Dec 9 14:53:35 post-db1 pgpool[25483]: [696-1] 2020-12-09 14:53:35: pid 25483: WARNING: watchdog escalation failed to acquire delegate IP Dec 9 14:53:35 post-db1 pgpool[25188]: [767-1] 2020-12-09 14:53:35: pid 25188: LOG: watchdog escalation process with pid: 25483 exit with SUCCESS. Dec 9 14:53:44 post-db1 pgpool[25192]: [222-1] 2020-12-09 14:53:44: pid 25192: DEBUG: watchdog heartbeat: received 224 byte packet And I cannot find the reason why the command failed from the log. Any suggestions? Thanks. |
|
|
Sorry my bad. I need to correct one of the line in above post: the version 1 command should be: 1. The original version: if_up_cmd = '/usr/bin/sudo /sbin/ip addr addr add $_IP_$/26 dev eth0 label eth0:0' |
|
|
I think postgres can't execute "sudo" without a password due to some reasons. Can you run the following command? # su - postgres -c "/usr/bin/sudo /var/lib/pgsql/sbin/ip addr add 192.180.211.88/26 dev eth0 label eth0:0" If the following error is displayed, I think you need to configure "/etc/sudoers" sudo: sorry, you must have a tty to run sudo It is mentioned in doc: https://www.pgpool.net/docs/41/en/html/example-cluster.html | Note: If "Defaults requiretty" is set in the /etc/sudoers, please ensure that the pgpool startup user can execute the if_up_cmd, if_down_cmd and arping_cmd command without a tty. For #version2, you have setuid, you don't need to use "sudo" in if_up_cmd, if_down_cmd and arping_cmd. if_cmd_path = '/var/lib/pgsql/sbin' if_up_cmd = '/var/lib/pgsql/sbin/ip addr add $_IP_$/26 dev eth0 label eth0:0' if_down_cmd = '/var/lib/pgsql/sbin/ip addr del $_IP_$/26 dev eth0' arping_path = '/var/lib/pgsql/sbin' arping_cmd = '/var/lib/pgsql/sbin/arping -U $_IP_$ -w 1 -I eth0' |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2020-12-03 17:33 | jeffty_w | New Issue | |
| 2020-12-04 09:23 | pengbo | Assigned To | => pengbo |
| 2020-12-04 09:23 | pengbo | Status | new => assigned |
| 2020-12-09 16:15 | pengbo | Note Added: 0003646 | |
| 2020-12-09 16:16 | pengbo | Status | assigned => feedback |
| 2020-12-10 01:36 | jeffty_w | Note Added: 0003650 | |
| 2020-12-10 01:36 | jeffty_w | Status | feedback => assigned |
| 2020-12-10 01:39 | jeffty_w | Note Added: 0003652 | |
| 2020-12-10 13:14 | pengbo | Note Added: 0003653 | |
| 2020-12-10 13:15 | pengbo | Note Edited: 0003653 | |
| 2020-12-10 13:26 | pengbo | Note Edited: 0003653 | |
| 2020-12-10 13:27 | pengbo | Status | assigned => feedback |