[pgpool-general: 8744] Re: [pgpool 4.0.4 ] PCP + psql commands get stuck after systemd-networkd restart

Bo Peng pengbo at sraoss.co.jp
Wed Apr 19 16:43:15 JST 2023


Hello,

> When I do:
> systemctl restart systemd-networkd
> 
> After that, I am not able to execute any PCP commands like:
> pcp_watchdog_info
> It is frozen.

If the network goes down, watchdog will detect the network failure and shutdown itself.
To avoid such problems, it is recommended to shutdown pgpool before restarting network.

BTW, which version of Pgpool-II are you using?

> I tried restarting pgpool and postgres to no avail.
> However, rebooting the system gets it back to a workable state. (PCP
> commands are running again and I can attach the nodes back to the pool)
> 
> The pgPool logs shows that the pg-pool was shutdown due to the network
> event:
> -------------------------------
> 
> 2023-04-17T15:27:25.190949+00:00 vmvrlcm-104-32 g[3042]: [268-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network event received
> 2023-04-17T15:27:25.191041+00:00 vmvrlcm-104-32 g[3042]: [268-2] 2023-04-17
> 15:27:25: pid 3042: DETAIL:  deleted = YES Link change event = NO
> 2023-04-17T15:27:25.191186+00:00 vmvrlcm-104-32 g[3042]: [269-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  STATE MACHINE INVOKED WITH EVENT = NETWORK IP
> IS REMOVED Current State = STANDBY
> 2023-04-17T15:27:25.191243+00:00 vmvrlcm-104-32 g[3042]: [270-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network interface lo having flags 65609
> 2023-04-17T15:27:25.191296+00:00 vmvrlcm-104-32 g[3042]: [271-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network interface eth0 having flags 69699
> 2023-04-17T15:27:25.191352+00:00 vmvrlcm-104-32 g[3042]: [272-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network interface "eth0" link is active
> 2023-04-17T15:27:25.191401+00:00 vmvrlcm-104-32 g[3042]: [273-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network interface "eth0" link is up
> 2023-04-17T15:27:25.191449+00:00 vmvrlcm-104-32 g[3042]: [274-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network interface lo having flags 65609
> 2023-04-17T15:27:25.191497+00:00 vmvrlcm-104-32 g[3042]: [275-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  network interface "eth0" is up and we can
> continue
> 2023-04-17T15:27:25.191551+00:00 vmvrlcm-104-32 g[3042]: [276-1] 2023-04-17
> 15:27:25: pid 3042: WARNING:  network IP is removed and system has no IP is
> assigned
> 2023-04-17T15:27:25.191614+00:00 vmvrlcm-104-32 g[3042]: [276-2] 2023-04-17
> 15:27:25: pid 3042: DETAIL:  changing the state to in network trouble
> 2023-04-17T15:27:25.191667+00:00 vmvrlcm-104-32 g[3042]: [277-1] 2023-04-17
> 15:27:25: pid 3042: LOG:  watchdog node state changed from [STANDBY] to [IN
> NETWORK TROUBLE]
> 2023-04-17T15:27:25.191713+00:00 vmvrlcm-104-32 g[3042]: [278-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE
> CHANGED Current State = IN NETWORK TROUBLE
> 2023-04-17T15:27:25.191759+00:00 vmvrlcm-104-32 g[3042]: [279-1] 2023-04-17
> 15:27:25: pid 3042: FATAL:  system has lost the network
> 2023-04-17T15:27:25.191807+00:00 vmvrlcm-104-32 g[3042]: [280-1] 2023-04-17
> 15:27:25: pid 3042: LOG:  Watchdog is shutting down
> 2023-04-17T15:27:25.191849+00:00 vmvrlcm-104-32 g[3042]: [281-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  sending packet, watchdog node:[
> vmvrlcm-104-31.eng.vmware.com:9999 Linux vmvrlcm-104-31.eng.vmware.com]
> command id:[10] type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
> 2023-04-17T15:27:25.191894+00:00 vmvrlcm-104-32 g[3042]: [282-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  sending watchdog packet to socket:8, type:[X],
> command ID:10, data Length:0
> 2023-04-17T15:27:25.191952+00:00 vmvrlcm-104-32 g[3042]: [283-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  sending packet, watchdog node:[
> vmvrlcm-104-33.eng.vmware.com:9999 Linux vmvrlcm-104-33.eng.vmware.com]
> command id:[10] type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
> 2023-04-17T15:27:25.192001+00:00 vmvrlcm-104-32 g[3042]: [284-1] 2023-04-17
> 15:27:25: pid 3042: DEBUG:  sending watchdog packet to socket:9, type:[X],
> command ID:10, data Length:0
> 2023-04-17T15:27:25.192671+00:00 vmvrlcm-104-32 pgpool[3040]: [24-1]
> 2023-04-17 15:27:25: pid 3040: DEBUG:  reaper handler
> 2023-04-17T15:27:25.192753+00:00 vmvrlcm-104-32 pgpool[3040]: [25-1]
> 2023-04-17 15:27:25: pid 3040: DEBUG:  watchdog child process with pid:
> 3042 exit with FATAL ERROR. pgpool-II will be shutdown
> 2023-04-17T15:27:25.192803+00:00 vmvrlcm-104-32 pgpool[3040]: [26-1]
> 2023-04-17 15:27:25: pid 3040: LOG:  watchdog child process with pid: 3042
> exits with status 768
> 2023-04-17T15:27:25.192864+00:00 vmvrlcm-104-32 pgpool[3040]: [27-1]
> 2023-04-17 15:27:25: pid 3040: FATAL:  watchdog child process exit with
> fatal error. exiting pgpool-II
> 2023-04-17T15:27:25.197530+00:00 vmvrlcm-104-32 ck[3157]: [23-1] 2023-04-17
> 15:27:25: pid 3157: DEBUG:  lifecheck child receives shutdown request
> signal 2, forwarding to all children
> 2023-04-17T15:27:25.197611+00:00 vmvrlcm-104-32 ck[3157]: [24-1] 2023-04-17
> 15:27:25: pid 3157: DEBUG:  lifecheck child receives fast shutdown request
> 2023-04-17T15:27:25.197658+00:00 vmvrlcm-104-32 at sender[3159]: [148-1]
> 2023-04-17 15:27:25: pid 3159: DEBUG:  watchdog heartbeat sender child
> receives shutdown request signal 2
> 2023-04-17T15:27:25.197994+00:00 vmvrlcm-104-32 at sender[3163]: [148-1]
> 2023-04-17 15:27:25: pid 3163: DEBUG:  watchdog heartbeat sender child
> receives shutdown request signal 2
> 2023-04-17T15:27:25.199168+00:00 vmvrlcm-104-32 at receiver[3161]: [18-1]
> 2023-04-17 15:27:25: pid 3161: DEBUG:  watchdog heartbeat receiver child
> receives shutdown request signal 2
> 2023-04-17T15:27:25.199567+00:00 vmvrlcm-104-32 at receiver[3158]: [18-1]
> 2023-04-17 15:27:25: pid 3158: DEBUG:  watchdog heartbeat receiver child
> receives shutdown request signal 2
> 2023-04-17T15:27:25.448554+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [386-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:25.448689+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [387-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  SSL is requested but SSL
> support is not available
> 2023-04-17T15:27:25.450621+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [388-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate kind = 5
> 2023-04-17T15:27:25.451892+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [389-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate backend: key
> data received
> 2023-04-17T15:27:25.451987+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [390-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate backend:
> transaction state: I
> 2023-04-17T15:27:25.452043+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [391-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:25.452096+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [392-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:25.455020+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [386-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:25.455096+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [387-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  SSL is requested but SSL
> support is not available
> 2023-04-17T15:27:25.457196+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [388-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate kind = 5
> 2023-04-17T15:27:25.458437+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [389-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate backend: key
> data received
> 2023-04-17T15:27:25.458556+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [390-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate backend:
> transaction state: I
> 2023-04-17T15:27:25.458674+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [391-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:25.458742+00:00 vmvrlcm-104-32 check process(0)[3196]:
> [392-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:30.452427+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [393-1] 2023-04-17 15:27:30: pid 3197: DEBUG:  health check: clearing alarm
> 2023-04-17T15:27:30.454041+00:00 vmvrlcm-104-32 check process(2)[3197]:
> [394-1] 2023-04-17 15:27:30: pid 3197: DEBUG:  SSL is requested but SSL
> support is not available
> 
> ------------------
> 
> After this, it is in kind of a loop of the 'clearing alarm' + 'SSL support
> is not available'
> 
> The relevant (In my current understanding) watchdog settings are:
> ----------------------------------------------------------------------------------------
> wd_hostname = '10.108.104.31'
> wd_lifecheck_method = 'heartbeat'
> wd_interval = 10
> wd_heartbeat_keepalive = 2
> wd_heartbeat_deadtime = 30
> heartbeat_destination0 = '10.108.104.32'
> heartbeat_device0 = ''
> heartbeat_destination1 = '10.108.104.33'
> heartbeat_device1 = ''
> wd_monitoring_interfaces_list = 'any'

Above logs are DEBUG messages and I don't think they caused this issue.
Do these DEBUG messages only appear when you restart the network?

If you are using watchdog, you also need to configure the following parameters:

heartbeat_destination_port0
heartbeat_destination_port1
other_pgpool_hostname0
other_pgpool_port0
other_pgpool_hostname1
other_pgpool_port1

Regards,

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
https://www.sraoss.co.jp/


More information about the pgpool-general mailing list