[pgpool-general: 8743] [pgpool 4.0.4 ] PCP + psql commands get stuck after systemd-networkd restart

Gopikrishnan nksgopikrishnan at gmail.com
Tue Apr 18 13:05:10 JST 2023


When I do:
systemctl restart systemd-networkd

After that, I am not able to execute any PCP commands like:
pcp_watchdog_info
It is frozen.

I tried restarting pgpool and postgres to no avail.
However, rebooting the system gets it back to a workable state. (PCP
commands are running again and I can attach the nodes back to the pool)

The pgPool logs shows that the pg-pool was shutdown due to the network
event:
-------------------------------

2023-04-17T15:27:25.190949+00:00 vmvrlcm-104-32 g[3042]: [268-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network event received
2023-04-17T15:27:25.191041+00:00 vmvrlcm-104-32 g[3042]: [268-2] 2023-04-17
15:27:25: pid 3042: DETAIL:  deleted = YES Link change event = NO
2023-04-17T15:27:25.191186+00:00 vmvrlcm-104-32 g[3042]: [269-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  STATE MACHINE INVOKED WITH EVENT = NETWORK IP
IS REMOVED Current State = STANDBY
2023-04-17T15:27:25.191243+00:00 vmvrlcm-104-32 g[3042]: [270-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network interface lo having flags 65609
2023-04-17T15:27:25.191296+00:00 vmvrlcm-104-32 g[3042]: [271-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network interface eth0 having flags 69699
2023-04-17T15:27:25.191352+00:00 vmvrlcm-104-32 g[3042]: [272-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network interface "eth0" link is active
2023-04-17T15:27:25.191401+00:00 vmvrlcm-104-32 g[3042]: [273-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network interface "eth0" link is up
2023-04-17T15:27:25.191449+00:00 vmvrlcm-104-32 g[3042]: [274-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network interface lo having flags 65609
2023-04-17T15:27:25.191497+00:00 vmvrlcm-104-32 g[3042]: [275-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  network interface "eth0" is up and we can
continue
2023-04-17T15:27:25.191551+00:00 vmvrlcm-104-32 g[3042]: [276-1] 2023-04-17
15:27:25: pid 3042: WARNING:  network IP is removed and system has no IP is
assigned
2023-04-17T15:27:25.191614+00:00 vmvrlcm-104-32 g[3042]: [276-2] 2023-04-17
15:27:25: pid 3042: DETAIL:  changing the state to in network trouble
2023-04-17T15:27:25.191667+00:00 vmvrlcm-104-32 g[3042]: [277-1] 2023-04-17
15:27:25: pid 3042: LOG:  watchdog node state changed from [STANDBY] to [IN
NETWORK TROUBLE]
2023-04-17T15:27:25.191713+00:00 vmvrlcm-104-32 g[3042]: [278-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE
CHANGED Current State = IN NETWORK TROUBLE
2023-04-17T15:27:25.191759+00:00 vmvrlcm-104-32 g[3042]: [279-1] 2023-04-17
15:27:25: pid 3042: FATAL:  system has lost the network
2023-04-17T15:27:25.191807+00:00 vmvrlcm-104-32 g[3042]: [280-1] 2023-04-17
15:27:25: pid 3042: LOG:  Watchdog is shutting down
2023-04-17T15:27:25.191849+00:00 vmvrlcm-104-32 g[3042]: [281-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  sending packet, watchdog node:[
vmvrlcm-104-31.eng.vmware.com:9999 Linux vmvrlcm-104-31.eng.vmware.com]
command id:[10] type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
2023-04-17T15:27:25.191894+00:00 vmvrlcm-104-32 g[3042]: [282-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  sending watchdog packet to socket:8, type:[X],
command ID:10, data Length:0
2023-04-17T15:27:25.191952+00:00 vmvrlcm-104-32 g[3042]: [283-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  sending packet, watchdog node:[
vmvrlcm-104-33.eng.vmware.com:9999 Linux vmvrlcm-104-33.eng.vmware.com]
command id:[10] type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
2023-04-17T15:27:25.192001+00:00 vmvrlcm-104-32 g[3042]: [284-1] 2023-04-17
15:27:25: pid 3042: DEBUG:  sending watchdog packet to socket:9, type:[X],
command ID:10, data Length:0
2023-04-17T15:27:25.192671+00:00 vmvrlcm-104-32 pgpool[3040]: [24-1]
2023-04-17 15:27:25: pid 3040: DEBUG:  reaper handler
2023-04-17T15:27:25.192753+00:00 vmvrlcm-104-32 pgpool[3040]: [25-1]
2023-04-17 15:27:25: pid 3040: DEBUG:  watchdog child process with pid:
3042 exit with FATAL ERROR. pgpool-II will be shutdown
2023-04-17T15:27:25.192803+00:00 vmvrlcm-104-32 pgpool[3040]: [26-1]
2023-04-17 15:27:25: pid 3040: LOG:  watchdog child process with pid: 3042
exits with status 768
2023-04-17T15:27:25.192864+00:00 vmvrlcm-104-32 pgpool[3040]: [27-1]
2023-04-17 15:27:25: pid 3040: FATAL:  watchdog child process exit with
fatal error. exiting pgpool-II
2023-04-17T15:27:25.197530+00:00 vmvrlcm-104-32 ck[3157]: [23-1] 2023-04-17
15:27:25: pid 3157: DEBUG:  lifecheck child receives shutdown request
signal 2, forwarding to all children
2023-04-17T15:27:25.197611+00:00 vmvrlcm-104-32 ck[3157]: [24-1] 2023-04-17
15:27:25: pid 3157: DEBUG:  lifecheck child receives fast shutdown request
2023-04-17T15:27:25.197658+00:00 vmvrlcm-104-32 at sender[3159]: [148-1]
2023-04-17 15:27:25: pid 3159: DEBUG:  watchdog heartbeat sender child
receives shutdown request signal 2
2023-04-17T15:27:25.197994+00:00 vmvrlcm-104-32 at sender[3163]: [148-1]
2023-04-17 15:27:25: pid 3163: DEBUG:  watchdog heartbeat sender child
receives shutdown request signal 2
2023-04-17T15:27:25.199168+00:00 vmvrlcm-104-32 at receiver[3161]: [18-1]
2023-04-17 15:27:25: pid 3161: DEBUG:  watchdog heartbeat receiver child
receives shutdown request signal 2
2023-04-17T15:27:25.199567+00:00 vmvrlcm-104-32 at receiver[3158]: [18-1]
2023-04-17 15:27:25: pid 3158: DEBUG:  watchdog heartbeat receiver child
receives shutdown request signal 2
2023-04-17T15:27:25.448554+00:00 vmvrlcm-104-32 check process(2)[3197]:
[386-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check: clearing alarm
2023-04-17T15:27:25.448689+00:00 vmvrlcm-104-32 check process(2)[3197]:
[387-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  SSL is requested but SSL
support is not available
2023-04-17T15:27:25.450621+00:00 vmvrlcm-104-32 check process(2)[3197]:
[388-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate kind = 5
2023-04-17T15:27:25.451892+00:00 vmvrlcm-104-32 check process(2)[3197]:
[389-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate backend: key
data received
2023-04-17T15:27:25.451987+00:00 vmvrlcm-104-32 check process(2)[3197]:
[390-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate backend:
transaction state: I
2023-04-17T15:27:25.452043+00:00 vmvrlcm-104-32 check process(2)[3197]:
[391-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check: clearing alarm
2023-04-17T15:27:25.452096+00:00 vmvrlcm-104-32 check process(2)[3197]:
[392-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check: clearing alarm
2023-04-17T15:27:25.455020+00:00 vmvrlcm-104-32 check process(0)[3196]:
[386-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check: clearing alarm
2023-04-17T15:27:25.455096+00:00 vmvrlcm-104-32 check process(0)[3196]:
[387-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  SSL is requested but SSL
support is not available
2023-04-17T15:27:25.457196+00:00 vmvrlcm-104-32 check process(0)[3196]:
[388-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate kind = 5
2023-04-17T15:27:25.458437+00:00 vmvrlcm-104-32 check process(0)[3196]:
[389-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate backend: key
data received
2023-04-17T15:27:25.458556+00:00 vmvrlcm-104-32 check process(0)[3196]:
[390-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate backend:
transaction state: I
2023-04-17T15:27:25.458674+00:00 vmvrlcm-104-32 check process(0)[3196]:
[391-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check: clearing alarm
2023-04-17T15:27:25.458742+00:00 vmvrlcm-104-32 check process(0)[3196]:
[392-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check: clearing alarm
2023-04-17T15:27:30.452427+00:00 vmvrlcm-104-32 check process(2)[3197]:
[393-1] 2023-04-17 15:27:30: pid 3197: DEBUG:  health check: clearing alarm
2023-04-17T15:27:30.454041+00:00 vmvrlcm-104-32 check process(2)[3197]:
[394-1] 2023-04-17 15:27:30: pid 3197: DEBUG:  SSL is requested but SSL
support is not available

------------------

After this, it is in kind of a loop of the 'clearing alarm' + 'SSL support
is not available'

The relevant (In my current understanding) watchdog settings are:
----------------------------------------------------------------------------------------
wd_hostname = '10.108.104.31'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = '10.108.104.32'
heartbeat_device0 = ''
heartbeat_destination1 = '10.108.104.33'
heartbeat_device1 = ''
wd_monitoring_interfaces_list = 'any'

*Thanks & Regards*
*Gopi*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20230418/c46f21b7/attachment.htm>


More information about the pgpool-general mailing list