[pgpool-general: 6006] WARNING: watchdog escalation failed to acquire delegate IP

MOISES ESPINOSA moikmeg at gmail.com
Wed Mar 21 18:17:17 JST 2018


Hi,

I'm using pgpool version 3.6.7 .

I have a cluster with 3 pgpool-II.
1 pgpool master (pgpool1) and 2 pgpool in standby mode (pgpool2 and
pgpool3).

This is my arping configuration:

arping_cmd = 'arping -I eth0 -U $_IP_$ -w 1'

I show the log when pgpool master was stopped:

                                 *log stop master pgpool1*

2018-03-15 02:28:14: pid 10196: LOG:  Watchdog is shutting down
2018-03-15 02:28:14: pid 10196: LOCATION:  watchdog.c:3185
2018-03-15 02:28:14: pid 21128: LOG:  watchdog: de-escalation started
2018-03-15 02:28:14: pid 21128: LOCATION:  wd_escalation.c:181
2018-03-15 02:28:14: pid 21128: LOG:  successfully released the delegate
IP:"105.25.30.245"
2018-03-15 02:28:14: pid 21128: DETAIL:  'if_down_cmd' returned with success
2018-03-15 02:28:14: pid 21128: LOCATION:  wd_if.c:208

                                 *log from pgpool2 (new master)*

2018-03-15 02:28:14: pid 14560: LOG:  remote node "pgpool1:20010 Linux
srv0mobipvlsk01" is shutting down
2018-03-15 02:28:14: pid 14560: LOG:  watchdog cluster has lost the
coordinator node
2018-03-15 02:28:14: pid 14560: LOG:  unassigning the remote node
"pgpool1:20010 Linux pgpool1" from watchdog cluster master
2018-03-15 02:28:14: pid 14560: LOG:  We have lost the cluster master node
"pgpool1:50013 Linux pgpool1"
2018-03-15 02:28:14: pid 14560: LOG:  watchdog node state changed from
[STANDBY] to [JOINING]
2018-03-15 02:28:14: pid 14560: LOG:  watchdog node state changed from
[JOINING] to [INITIALIZING]
2018-03-15 02:28:14: pid 14557: WARNING:  health check timed out while
waiting for reading data
2018-03-15 02:28:14: pid 14557: ERROR:  unable to read data from DB node 0
2018-03-15 02:28:14: pid 14557: LOG:  pgpool-II successfully started.
version 3.6.7 (subaruboshi)
2018-03-15 02:28:15: pid 14560: LOG:  watchdog node state changed from
[INITIALIZING] to [STANDING FOR MASTER]
2018-03-15 02:28:15: pid 14560: LOG:  watchdog node state changed from
[STANDING FOR MASTER] to [MASTER]
2018-03-15 02:28:15: pid 14560: LOG:  I am announcing my self as
master/coordinator watchdog node
2018-03-15 02:28:15: pid 14560: LOG:  I am the cluster leader node
2018-03-15 02:28:15: pid 14560: LOG:  setting the local node "pgpool2:20010
Linux pgpool2" as watchdog cluster master
2018-03-15 02:28:15: pid 14560: LOG:  I am the cluster leader node but we
do not have enough nodes in cluster
2018-03-15 02:28:16: pid 14560: LOG:  adding watchdog node "pgpool3:20010
Linux pgpool3" to the standby list
2018-03-15 02:28:16: pid 14560: LOG:  quorum found
2018-03-15 02:28:16: pid 14560: LOG:  escalation process started with
PID:11487
2018-03-15 02:28:16: pid 11487: LOG:  watchdog: escalation started
RTNETLINK answers: File exists
*2018-03-15 02:28:16: pid 11487: LOG:  failed to acquire the delegate IP
address*
*2018-03-15 02:28:16: pid 11487: WARNING:  watchdog escalation failed to
acquire delegate IP*
2018-03-15 02:28:16: pid 14560: LOG:  watchdog escalation process with pid:
11487 exit with SUCCESS.
2018-03-15 02:28:17: pid 14557: WARNING:  health check timed out while
waiting for reading data
2018-03-15 02:28:17: pid 14557: ERROR:  unable to read data from DB node 0
2018-03-15 02:28:17: pid 14560: LOG:  new IPC connection received
2018-03-15 02:28:19: pid 14557: LOG:  pgpool-II successfully started.
version 3.6.7 (subaruboshi)
2018-03-15 02:28:19: pid 14557: LOG:  health checking retry count 1
2018-03-15 02:28:19: pid 14557: LOG:  all backends are returned to a
healthy state after 1 retry(ies)
2018-03-15 02:28:24: pid 14557: LOG:  child process with pid: 10747 exits
with status 256
2018-03-15 02:28:24: pid 14557: LOG:  fork a new child process with pid:
11542
2018-03-15 02:28:24: pid 15780: LOG:  forked new pcp worker, pid=11554
socket=8
2018-03-15 02:28:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:28:24: pid 15780: LOG:  PCP process with pid: 11554 exit with
SUCCESS.
2018-03-15 02:28:24: pid 15780: LOG:  PCP process with pid: 11554 exits
with status 0
2018-03-15 02:28:24: pid 30983: LOG:  new connection received
2018-03-15 02:28:39: pid 14561: LOG:  informing the node status change to
watchdog
2018-03-15 02:28:39: pid 14560: LOG:  new IPC connection received
2018-03-15 02:28:39: pid 14560: LOG:  received node status change ipc
message
*2018-03-15 02:28:39: pid 14560: LOG:  remote node "pgpool1:20010 Linux
pgpool1" is shutting down*
2018-03-15 02:28:41: pid 15780: LOG:  forked new pcp worker, pid=11643
socket=8
2018-03-15 02:28:41: pid 14560: LOG:  new IPC connection received
2018-03-15 02:28:41: pid 15780: LOG:  PCP process with pid: 11643 exit with
SUCCESS.
2018-03-15 02:28:41: pid 15780: LOG:  PCP process with pid: 11643 exits
with status 0
2018-03-15 02:29:24: pid 14557: LOG:  child process with pid: 30983 exits
with status 256
2018-03-15 02:29:24: pid 14557: LOG:  fork a new child process with pid:
11738
2018-03-15 02:29:24: pid 15780: LOG:  forked new pcp worker, pid=11742
socket=8
2018-03-15 02:29:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:29:24: pid 15780: LOG:  PCP process with pid: 11742 exit with
SUCCESS.
2018-03-15 02:29:24: pid 15780: LOG:  PCP process with pid: 11742 exits
with status 0
2018-03-15 02:29:24: pid 11542: LOG:  new connection received
2018-03-15 02:29:42: pid 11542: LOG:  new connection received
2018-03-15 02:29:49: pid 4483: LOG:  new connection received
2018-03-15 02:30:24: pid 15780: LOG:  forked new pcp worker, pid=11899
socket=8
2018-03-15 02:30:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:30:24: pid 15780: LOG:  PCP process with pid: 11899 exit with
SUCCESS.
2018-03-15 02:30:24: pid 15780: LOG:  PCP process with pid: 11899 exits
with status 0
2018-03-15 02:30:24: pid 11738: LOG:  new connection received
2018-03-15 02:30:42: pid 14557: LOG:  child process with pid: 11542 exits
with status 256
2018-03-15 02:30:42: pid 14557: LOG:  fork a new child process with pid:
11944
2018-03-15 02:31:10: pid 14557: LOG:  child process with pid: 4483 exits
with status 256
2018-03-15 02:31:10: pid 14557: LOG:  fork a new child process with pid:
11970
2018-03-15 02:31:24: pid 15780: LOG:  forked new pcp worker, pid=12042
socket=8
2018-03-15 02:31:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:31:24: pid 15780: LOG:  PCP process with pid: 12042 exit with
SUCCESS.
2018-03-15 02:31:24: pid 15780: LOG:  PCP process with pid: 12042 exits
with status 0
2018-03-15 02:31:24: pid 14557: LOG:  child process with pid: 11738 exits
with status 256
2018-03-15 02:31:24: pid 14557: LOG:  fork a new child process with pid:
12050
2018-03-15 02:31:24: pid 10253: LOG:  new connection received
2018-03-15 02:31:34: pid 10253: LOG:  new connection received
2018-03-15 02:31:37: pid 2999: LOG:  new connection received
2018-03-15 02:32:24: pid 15780: LOG:  forked new pcp worker, pid=12244
socket=8
2018-03-15 02:32:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:32:24: pid 15780: LOG:  PCP process with pid: 12244 exit with
SUCCESS.
2018-03-15 02:32:24: pid 15780: LOG:  PCP process with pid: 12244 exits
with status 0
2018-03-15 02:32:24: pid 11970: LOG:  new connection received
2018-03-15 02:32:34: pid 14557: LOG:  child process with pid: 10253 exits
with status 256
2018-03-15 02:32:34: pid 14557: LOG:  fork a new child process with pid:
12289
2018-03-15 02:33:24: pid 15780: LOG:  forked new pcp worker, pid=12394
socket=8
2018-03-15 02:33:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:33:24: pid 15780: LOG:  PCP process with pid: 12394 exit with
SUCCESS.
2018-03-15 02:33:24: pid 15780: LOG:  PCP process with pid: 12394 exits
with status 0
2018-03-15 02:33:24: pid 14557: LOG:  child process with pid: 11970 exits
with status 256
2018-03-15 02:33:24: pid 14557: LOG:  fork a new child process with pid:
12404
2018-03-15 02:33:24: pid 12404: LOG:  new connection received
2018-03-15 02:34:25: pid 14557: LOG:  child process with pid: 12404 exits
with status 256
2018-03-15 02:34:25: pid 14557: LOG:  fork a new child process with pid:
12524
2018-03-15 02:34:25: pid 15780: LOG:  forked new pcp worker, pid=12545
socket=8
2018-03-15 02:34:25: pid 14560: LOG:  new IPC connection received
2018-03-15 02:34:25: pid 15780: LOG:  PCP process with pid: 12545 exit with
SUCCESS.
2018-03-15 02:34:25: pid 15780: LOG:  PCP process with pid: 12545 exits
with status 0
2018-03-15 02:34:25: pid 12524: LOG:  new connection received
*2018-03-15 02:35:17: pid 14560: LOG:  new watchdog node connection is
received from "105.25.30.230:42804 <http://105.25.30.230:42804>"*
*2018-03-15 02:35:17: pid 14560: LOG:  new node joined the cluster
hostname:"pgpool1" port:20110 pgpool_port:20010*
2018-03-15 02:35:17: pid 14560: LOG:  new outbond connection to
pgpool1:20110
*2018-03-15 02:35:18: pid 14560: LOG:  adding watchdog node "pgpool1:20010
Linux pgpool1" to the standby list*
2018-03-15 02:35:24: pid 15780: LOG:  forked new pcp worker, pid=12691
socket=8
2018-03-15 02:35:24: pid 14560: LOG:  new IPC connection received
2018-03-15 02:35:24: pid 15780: LOG:  PCP process with pid: 12691 exit with
SUCCESS.
***************************************************************************************************************


As we can see in the log:

*2018-03-15 02:28:16: pid 11487: LOG:  failed to acquire the delegate IP
address*
*2018-03-15 02:28:16: pid 11487: WARNING:  watchdog escalation failed to
acquire delegate IP*

The process fails when adquiring the delegate IP.
We have looked at the code (src/watchdog/wd_if.c) and we see the following:


131 if (rtn == WD_OK)
132     {
133         command = wd_get_cmd(pool_config->arping_cmd);
134         if (command)
135         {
136
            snprintf(path,sizeof(path),"%s/%s",pool_config->arping_path,command);
137             rtn = exec_if_cmd(path,pool_config->arping_cmd);
138             pfree(command);
139         }
140         else
141         {
142             rtn = WD_NG;
143             ereport(LOG,
144                 (errmsg("failed to acquire the delegate IP address"),
145                      errdetail("unable to parse the
arping_cmd:\"%s\"",pool_config->arping_cmd)));
146         }
147     }

According to the code, if it fails to acquire the IP, is arping no longer
launched?

The problem is that if arping is not launched again, arptables is never
updated, Is that correct?

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20180321/b4f44494/attachment-0001.html>


More information about the pgpool-general mailing list