<div dir="ltr"><div class="gmail_quote"><br><br><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin:0">Hi,  </div><div style="margin:0"><br></div><div style="margin:0">I have set up a PG HA environment with two PG servers and three watchdog nodes.</div><div style="margin:0"><ul><li>OS: Ubuntu 20.04</li><li>PG version: 12.8</li><li>Pgpool version: 4.1.4</li></ul></div><div style="margin:0"><ul><li>PG primary: 192.168.1.122</li><li>PG standby: 192.168.1.121</li><li>Watchdog node0: 192.168.1.122</li><li>Watchdog node1: 192.168.1.121</li><li>Watchdog node2: 192.168.1.101</li></ul></div><div style="margin:0"><br></div><div style="margin:0">The HA environment works fine at first, but after 3-4 hours two of the watchdog nodes go down, leaving only one watchdog node (192.168.1.101) running. The log of the watchdog leader (192.168.1.122) shows the errors below, even though the network IP 192.168.1.122 is still up.</div><div style="margin:0"><br></div><div style="margin:0"><div style="margin:0">2021-09-20 15:53:37: pid 1900172: WARNING:  network IP is removed and system has no IP is assigned</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DETAIL:  changing the state to in network trouble</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DEBUG:  removing all watchdog nodes from the standby list</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DETAIL:  standby list contains 1 nodes</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DEBUG:  Removing all failover objects</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: LOG:  watchdog node state changed from [MASTER] to [IN NETWORK TROUBLE]</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = IN NETWORK TROUBLE</div><div 
style="margin:0">2021-09-20 15:53:37: pid 1900172: FATAL:  system has lost the network</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: LOG:  Watchdog is shutting down</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DEBUG:  sending packet, watchdog node:[192.168.1.101:9999 Linux dell-PowerEdge-R740] command id:[1113] type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]</div><div style="margin:0">2021-09-20 15:53:37: pid 1900172: DEBUG:  sending watchdog packet to socket:8, type:[X], command ID:1113, data Length:0</div><div style="margin:0">2021-09-20 15:53:37: pid 1933141: LOG:  watchdog: de-escalation started</div><div style="margin:0">2021-09-20 15:53:37: pid 1933141: DEBUG:  watchdog exec interface up/down command: '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens2f0' succeeded</div><div style="margin:0">2021-09-20 15:53:37: pid 1933141: LOG:  successfully released the delegate IP:"192.168.1.129"</div><div style="margin:0">2021-09-20 15:53:37: pid 1933141: DETAIL:  'if_down_cmd' returned with success</div><div style="margin:0">2021-09-20 15:53:37: pid 1900168: DEBUG:  reaper handler</div><div style="margin:0">2021-09-20 15:53:37: pid 1900168: DEBUG:  watchdog child process with pid: 1900172 exit with FATAL ERROR. pgpool-II will be shutdown</div><div style="margin:0">2021-09-20 15:53:37: pid 1900168: LOG:  watchdog child process with pid: 1900172 exits with status 768</div><div style="margin:0">2021-09-20 15:53:37: pid 1900168: FATAL:  watchdog child process exit with fatal error. 
exiting pgpool-II</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  setting the local watchdog node name to "192.168.1.122:9999 Linux dell-PowerEdge-R740"</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  watchdog cluster is configured with 2 remote nodes</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  watchdog remote node:0 on 192.168.1.121:9000</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  watchdog remote node:1 on 192.168.1.101:9000</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  interface monitoring is disabled in watchdog</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: INFO:  IPC socket path: "/tmp/.s.PGPOOLWD_CMD.9000"</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  watchdog node state changed from [DEAD] to [LOADING]</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: DEBUG:  error in outbound connection to 192.168.1.121:9000</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: DETAIL:  Connection refused</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  new outbound connection to 192.168.1.101:9000</div><div style="margin:0">2021-09-20 15:53:37: pid 1900189: DEBUG:  lifecheck child receives shutdown request signal 2, forwarding to all children</div><div style="margin:0">2021-09-20 15:53:37: pid 1900189: DEBUG:  lifecheck child receives fast shutdown request</div><div style="margin:0">2021-09-20 15:53:37: pid 1933148: LOG:  Watchdog is shutting down</div><div><br></div></div>Please refer 
to the pgpool.conf and the running logs from each server. Any advice on how to fix this?</div><br><br><span title="neteasefooter"><p> </p></span></div></div></div></div>
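<div dir="ltr"><div style="margin:0">For reference, the watchdog part of pgpool.conf on 192.168.1.122 looks roughly like this, as far as it can be read from the log above. This is a partial sketch, not the attached file: the values are taken from the log messages (local and remote node names, IPC socket path, delegate IP, if_down_cmd), the pgpool port of 192.168.1.121 is assumed to match the other nodes, and settings that do not appear in the log (if_up_cmd, heartbeat and lifecheck parameters) are left out.</div><pre>
# Watchdog settings on 192.168.1.122 (Pgpool-II 4.1 parameter names),
# inferred from the log above; the attached pgpool.conf is authoritative.
port = 9999                             # from node name "192.168.1.122:9999 ..."
use_watchdog = on
wd_hostname = '192.168.1.122'
wd_port = 9000
wd_ipc_socket_dir = '/tmp'              # IPC socket "/tmp/.s.PGPOOLWD_CMD.9000"
delegate_IP = '192.168.1.129'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens2f0'

# Remote watchdog nodes ("watchdog cluster is configured with 2 remote nodes")
other_pgpool_hostname0 = '192.168.1.121'
other_pgpool_port0 = 9999               # assumed; not shown in the log
other_wd_port0 = 9000
other_pgpool_hostname1 = '192.168.1.101'
other_pgpool_port1 = 9999               # from node name "192.168.1.101:9999 ..."
other_wd_port1 = 9000
</pre></div>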