[pgpool-general: 7710] Fwd: watch_dog cluster down since "system has lost the network"

Forest Lin zhijia.lin at gmail.com
Mon Sep 20 18:15:44 JST 2021


Hi,

I have two PG servers and three watchdog nodes set up as a PG HA
environment (a sketch of the relevant watchdog settings follows the node
list below).

   - OS: Ubuntu 20.04
   - PG version: 12.8
   - Pgpool version: 4.1.4


   - PG primary: 192.168.1.122
   - PG slave: 192.168.1.121
   - Watchdog node0: 192.168.1.122
   - Watchdog node1: 192.168.1.121
   - Watchdog node2: 192.168.1.101
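
For reference, the watchdog part of pgpool.conf on node 192.168.1.122
looks roughly like the sketch below. It is reconstructed from the
attached config files and the log further down, so the exact command
paths, the ens2f0:0 label, and the arping line are assumptions:

    use_watchdog = on
    wd_hostname = '192.168.1.122'
    wd_port = 9000
    delegate_IP = '192.168.1.129'
    if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev ens2f0 label ens2f0:0'
    if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens2f0'
    arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I ens2f0'
    wd_monitoring_interfaces_list = ''    # interface monitoring disabled, as in the log
    # remote watchdog nodes
    other_pgpool_hostname0 = '192.168.1.121'
    other_pgpool_port0 = 9999
    other_wd_port0 = 9000
    other_pgpool_hostname1 = '192.168.1.101'
    other_pgpool_port1 = 9999
    other_wd_port1 = 9000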


The HA environment works fine at first, but after 3-4 hours two of the
watchdog nodes go down, leaving only one watchdog node (192.168.1.101)
running. The watchdog leader's log shows the error below, although the
network IP 192.168.1.122 is still alive.

2021-09-20 15:53:37: pid 1900172: WARNING:  network IP is removed and
system has no IP is assigned
2021-09-20 15:53:37: pid 1900172: DETAIL:  changing the state to in network
trouble
2021-09-20 15:53:37: pid 1900172: DEBUG:  removing all watchdog nodes from
the standby list
2021-09-20 15:53:37: pid 1900172: DETAIL:  standby list contains 1 nodes
2021-09-20 15:53:37: pid 1900172: DEBUG:  Removing all failover objects
2021-09-20 15:53:37: pid 1900172: LOG:  watchdog node state changed from
[MASTER] to [IN NETWORK TROUBLE]
2021-09-20 15:53:37: pid 1900172: DEBUG:  STATE MACHINE INVOKED WITH EVENT
= STATE CHANGED Current State = IN NETWORK TROUBLE
2021-09-20 15:53:37: pid 1900172: FATAL:  system has lost the network
2021-09-20 15:53:37: pid 1900172: LOG:  Watchdog is shutting down
2021-09-20 15:53:37: pid 1900172: DEBUG:  sending packet, watchdog node:[
192.168.1.101:9999 Linux dell-PowerEdge-R740] command id:[1113]
type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
2021-09-20 15:53:37: pid 1900172: DEBUG:  sending watchdog packet to
socket:8, type:[X], command ID:1113, data Length:0
2021-09-20 15:53:37: pid 1933141: LOG:  watchdog: de-escalation started
2021-09-20 15:53:37: pid 1933141: DEBUG:  watchdog exec interface up/down
command: '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens2f0' succeeded
2021-09-20 15:53:37: pid 1933141: LOG:  successfully released the delegate
IP:"192.168.1.129"
2021-09-20 15:53:37: pid 1933141: DETAIL:  'if_down_cmd' returned with
success
2021-09-20 15:53:37: pid 1900168: DEBUG:  reaper handler
2021-09-20 15:53:37: pid 1900168: DEBUG:  watchdog child process with pid:
1900172 exit with FATAL ERROR. pgpool-II will be shutdown
2021-09-20 15:53:37: pid 1900168: LOG:  watchdog child process with pid:
1900172 exits with status 768
2021-09-20 15:53:37: pid 1900168: FATAL:  watchdog child process exit with
fatal error. exiting pgpool-II
2021-09-20 15:53:37: pid 1933148: LOG:  setting the local watchdog node
name to "192.168.1.122:9999 Linux dell-PowerEdge-R740"
2021-09-20 15:53:37: pid 1933148: LOG:  watchdog cluster is configured with
2 remote nodes
2021-09-20 15:53:37: pid 1933148: LOG:  watchdog remote node:0 on
192.168.1.121:9000
2021-09-20 15:53:37: pid 1933148: LOG:  watchdog remote node:1 on
192.168.1.101:9000
2021-09-20 15:53:37: pid 1933148: LOG:  interface monitoring is disabled in
watchdog
2021-09-20 15:53:37: pid 1933148: INFO:  IPC socket path:
"/tmp/.s.PGPOOLWD_CMD.9000"
2021-09-20 15:53:37: pid 1933148: LOG:  watchdog node state changed from
[DEAD] to [LOADING]
2021-09-20 15:53:37: pid 1933148: DEBUG:  STATE MACHINE INVOKED WITH EVENT
= STATE CHANGED Current State = LOADING
2021-09-20 15:53:37: pid 1933148: DEBUG:  error in outbound connection to
192.168.1.121:9000
2021-09-20 15:53:37: pid 1933148: DETAIL:  Connection refused
2021-09-20 15:53:37: pid 1933148: LOG:  new outbound connection to
192.168.1.101:9000
2021-09-20 15:53:37: pid 1900189: DEBUG:  lifecheck child receives shutdown
request signal 2, forwarding to all children
2021-09-20 15:53:37: pid 1900189: DEBUG:  lifecheck child receives fast
shutdown request
2021-09-20 15:53:37: pid 1933148: LOG:  Watchdog is shutting down
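
By "alive" I mean that a check along these lines on 192.168.1.122 still
succeeds at that point (the NIC name ens2f0 is taken from the
if_down_cmd in the log above):

    ip -4 addr show dev ens2f0 | grep 192.168.1.122
    ping -c 3 192.168.1.122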

Please refer to the attached pgpool.conf and running log from each server.
Any advice on how to fix this?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-101.conf
Type: application/octet-stream
Size: 44191 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210920/5d7e5650/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-121.conf
Type: application/octet-stream
Size: 44243 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210920/5d7e5650/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-122.conf
Type: application/octet-stream
Size: 44234 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210920/5d7e5650/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-101.zip
Type: application/zip
Size: 48913 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210920/5d7e5650/attachment-0003.zip>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-121.zip
Type: application/zip
Size: 24922 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210920/5d7e5650/attachment-0004.zip>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-122.zip
Type: application/zip
Size: 203812 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210920/5d7e5650/attachment-0005.zip>

