[pgpool-general: 7738] Re: Fwd: watch_dog cluster down since "system has lost the network"

Bo Peng pengbo at sraoss.co.jp
Mon Oct 4 12:45:33 JST 2021


Hello,

Sorry for the late response.

> Hi,
> 
> I have two PG servers and three watch_dog nodes to set up a PG HA
> environment.
> 
>    - OS: Ubuntu 20.04
>    - PG version:12.8
>    - Pgpool version: 4.1.4
> 
> 
>    - PG -primary: 192.168.1.122
>    - PG -slave: 192.168.1.121
>    - Watch_dog node0: 192.168.1.122
>    - Watch_dog node1: 192.168.1.121
>    - Watch_dog node2: 192.168.1.101
> 
> 
> The HA environment works fine at first, but after 3-4 hours two watch_dog
> nodes go down, leaving only one watch_dog node (192.168.1.101) running. The
> watch_dog leader's log shows the error below, although the network IP
> 192.168.1.122 is alive.
> 
> 2021-09-20 15:53:37: pid 1900172: WARNING:  network IP is removed and
> system has no IP is assigned
> 2021-09-20 15:53:37: pid 1900172: DETAIL:  changing the state to in network
> trouble
> 2021-09-20 15:53:37: pid 1900172: DEBUG:  removing all watchdog nodes from
> the standby list

I think it may be caused by a temporary network problem.
Does this issue occur every time? 
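
To see how often it happens, a small script along these lines could scan
a pgpool log for watchdog state transitions. This is only a sketch: it
assumes each log line starts with "timestamp: pid N: " as in the excerpt
quoted here (this depends on log_line_prefix), and the function names
find_state_changes and count_network_trouble are hypothetical helpers,
not part of pgpool-II.

```python
import re

# Assumed line format, taken from the log excerpt in this thread:
#   2021-09-20 15:53:37: pid 1900172: LOG:  watchdog node state changed
#   from [MASTER] to [IN NETWORK TROUBLE]      (one line in the real log)
STATE_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (?P<pid>\d+): LOG:\s+"
    r"watchdog node state changed from \[(?P<old>[^\]]+)\] to \[(?P<new>[^\]]+)\]"
)


def find_state_changes(lines):
    """Return (timestamp, pid, old_state, new_state) for each transition line."""
    changes = []
    for line in lines:
        m = STATE_CHANGE.match(line.strip())
        if m:
            changes.append((m["ts"], int(m["pid"]), m["old"], m["new"]))
    return changes


def count_network_trouble(lines):
    """Count transitions whose new state is IN NETWORK TROUBLE."""
    return sum(
        1 for _, _, _, new in find_state_changes(lines)
        if new == "IN NETWORK TROUBLE"
    )


# Sample lines copied (unwrapped) from the quoted log.
sample = [
    "2021-09-20 15:53:37: pid 1900172: LOG:  watchdog node state changed from [MASTER] to [IN NETWORK TROUBLE]",
    "2021-09-20 15:53:37: pid 1900172: FATAL:  system has lost the network",
    "2021-09-20 15:53:37: pid 1933148: LOG:  watchdog node state changed from [DEAD] to [LOADING]",
]

for ts, pid, old, new in find_state_changes(sample):
    print(f"{ts} pid {pid}: {old} -> {new}")
print("IN NETWORK TROUBLE transitions:", count_network_trouble(sample))
```

To run it on a real log, pass the opened log file to find_state_changes
instead of the sample list; comparing the timestamps across several days
would show whether the problem recurs at a regular interval.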

> 2021-09-20 15:53:37: pid 1900172: DETAIL:  standby list contains 1 nodes
> 2021-09-20 15:53:37: pid 1900172: DEBUG:  Removing all failover objects
> 2021-09-20 15:53:37: pid 1900172: LOG:  watchdog node state changed from
> [MASTER] to [IN NETWORK TROUBLE]
> 2021-09-20 15:53:37: pid 1900172: DEBUG:  STATE MACHINE INVOKED WITH EVENT
> = STATE CHANGED Current State = IN NETWORK TROUBLE
> 2021-09-20 15:53:37: pid 1900172: FATAL:  system has lost the network
> 2021-09-20 15:53:37: pid 1900172: LOG:  Watchdog is shutting down
> 2021-09-20 15:53:37: pid 1900172: DEBUG:  sending packet, watchdog node:[
> 192.168.1.101:9999 Linux dell-PowerEdge-R740] command id:[1113]
> type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
> 2021-09-20 15:53:37: pid 1900172: DEBUG:  sending watchdog packet to
> socket:8, type:[X], command ID:1113, data Length:0
> 2021-09-20 15:53:37: pid 1933141: LOG:  watchdog: de-escalation started
> 2021-09-20 15:53:37: pid 1933141: DEBUG:  watchdog exec interface up/down
> command: '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens2f0' succeeded
> 2021-09-20 15:53:37: pid 1933141: LOG:  successfully released the delegate
> IP:"192.168.1.129"
> 2021-09-20 15:53:37: pid 1933141: DETAIL:  'if_down_cmd' returned with
> success
> 2021-09-20 15:53:37: pid 1900168: DEBUG:  reaper handler
> 2021-09-20 15:53:37: pid 1900168: DEBUG:  watchdog child process with pid:
> 1900172 exit with FATAL ERROR. pgpool-II will be shutdown
> 2021-09-20 15:53:37: pid 1900168: LOG:  watchdog child process with pid:
> 1900172 exits with status 768
> 2021-09-20 15:53:37: pid 1900168: FATAL:  watchdog child process exit with
> fatal error. exiting pgpool-II
> 2021-09-20 15:53:37: pid 1933148: LOG:  setting the local watchdog node
> name to "192.168.1.122:9999 Linux dell-PowerEdge-R740"
> 2021-09-20 15:53:37: pid 1933148: LOG:  watchdog cluster is configured with
> 2 remote nodes
> 2021-09-20 15:53:37: pid 1933148: LOG:  watchdog remote node:0 on
> 192.168.1.121:9000
> 2021-09-20 15:53:37: pid 1933148: LOG:  watchdog remote node:1 on
> 192.168.1.101:9000
> 2021-09-20 15:53:37: pid 1933148: LOG:  interface monitoring is disabled in
> watchdog
> 2021-09-20 15:53:37: pid 1933148: INFO:  IPC socket path:
> "/tmp/.s.PGPOOLWD_CMD.9000"
> 2021-09-20 15:53:37: pid 1933148: LOG:  watchdog node state changed from
> [DEAD] to [LOADING]
> 2021-09-20 15:53:37: pid 1933148: DEBUG:  STATE MACHINE INVOKED WITH EVENT
> = STATE CHANGED Current State = LOADING
> 2021-09-20 15:53:37: pid 1933148: DEBUG:  error in outbound connection to
> 192.168.1.121:9000
> 2021-09-20 15:53:37: pid 1933148: DETAIL:  Connection refused
> 2021-09-20 15:53:37: pid 1933148: LOG:  new outbound connection to
> 192.168.1.101:9000
> 2021-09-20 15:53:37: pid 1900189: DEBUG:  lifecheck child receives shutdown
> request signal 2, forwarding to all children
> 2021-09-20 15:53:37: pid 1900189: DEBUG:  lifecheck child receives fast
> shutdown request
> 2021-09-20 15:53:37: pid 1933148: LOG:  Watchdog is shutting down
> 
> Please refer to the pgpool.conf and running log on each server. Any advice
> on how to fix it?


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/

