[pgpool-general: 7425] Problems with watchdog

Francesco Mazzi fmazzi at comune.genova.it
Mon Mar 1 21:50:22 JST 2021


Hello, I'm testing watchdog with 2 instances of pgpool 4.2.2 , package pgpool-II-pg12 from pgpool repository, centos 7. 
I have problems in starting pgpool, in both nodes, even if I start only one, it starts and it becomes leader but then it dies. I see some errors in log, here is the complete log of one node. 

systemd: Started Pgpool-II. 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: health_check_stats_shared_memory_size: requested size: 12288 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: memory cache initialized 
pgpool: 2021-03-01 11:54:14: pid 1868: DETAIL: memcache blocks :64 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: allocating (136555376) bytes of shared memory segment 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: allocating shared memory segment of size: 136555376 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: health_check_stats_shared_memory_size: requested size: 12288 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: health_check_stats_shared_memory_size: requested size: 12288 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: memory cache initialized 
pgpool: 2021-03-01 11:54:14: pid 1868: DETAIL: memcache blocks :64 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: pool_discard_oid_maps: discarded memqcache oid maps 
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: waiting for watchdog to initialize 
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: setting the local watchdog node name to "vm-lxpgproxy1:5432 Linux vm-lxpgproxy1" 
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: watchdog cluster is configured with 1 remote nodes 
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: watchdog remote node:0 on vm-lxpgproxy2:9000 
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: watchdog node state changed from [DEAD] to [LOADING] 
pgpool: 2021-03-01 11:54:19: pid 1873: LOG: watchdog node state changed from [LOADING] to [JOINING] 
pgpool: 2021-03-01 11:54:23: pid 1873: LOG: watchdog node state changed from [JOINING] to [INITIALIZING] 
pgpool: 2021-03-01 11:54:24: pid 1873: LOG: I am the only alive node in the watchdog cluster 
pgpool: 2021-03-01 11:54:24: pid 1873: HINT: skipping stand for coordinator state 
pgpool: 2021-03-01 11:54:24: pid 1873: LOG: watchdog node state changed from [INITIALIZING] to [LEADER] 
pgpool: 2021-03-01 11:54:24: pid 1873: LOG: I am announcing my self as leader/coordinator watchdog node 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: I am the cluster leader node 
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: our declare coordinator message is accepted by all nodes 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: setting the local node "vm-lxpgproxy1:5432 Linux vm-lxpgproxy1" as watchdog cluste 
r leader 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: I am the cluster leader node. Starting escalation process 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: escalation process started with PID:1888 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: watchdog process is initialized 
pgpool: 2021-03-01 11:54:28: pid 1868: DETAIL: watchdog messaging data version: 1.2 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: new IPC connection received 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: authentication failed 
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: IPC command contains an invalid data 
pgpool: 2021-03-01 11:54:28: pid 1868: WARNING: get runtime variable value from watchdog failed 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: failed to get current state of local watchdog node 
pgpool: 2021-03-01 11:54:28: pid 1868: DETAIL: get runtime variable value from watchdog returned no data 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: Setting up socket for 0.0.0.0:5432 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: Setting up socket for :::5432 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node_repeatedly: waiting for finding a primary node 
pgpool: 2021-03-01 11:54:28: pid 1888: LOG: watchdog: escalation started 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: new IPC connection received 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: authentication failed 
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: IPC command contains an invalid data 
pgpool: 2021-03-01 11:54:28: pid 1889: FATAL: get node list command reply contains no data 
pgpool: + PGPOOLS=(vm-lxpgproxy2) 
pgpool: + VIP=192.168.153.230 
pgpool: + DEVICE=ens32 
pgpool: + for pgpool in '"${PGPOOLS[@]}"' 
pgpool: + '[' vm-lxpgproxy1 = vm-lxpgproxy2 ']' 
pgpool: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres at vm-lxpgproxy2 -i /var/lib/pgsql/.ssh/id_rsa ' 
pgpool: /usr/bin/sudo /sbin/ip addr del 192.168.153.230/22 dev ens32 
pgpool: ' 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node: standby node is 0 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node: primary node is 1 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node: standby node is 2 
pgpool: 2021-03-01 11:54:28: pid 1928: LOG: process started 
pgpool: 2021-03-01 11:54:28: pid 1925: LOG: process started 
pgpool: 2021-03-01 11:54:28: pid 1926: LOG: process started 
pgpool: 2021-03-01 11:54:28: pid 1927: LOG: process started 
pgpool: 2021-03-01 11:54:28: pid 1924: LOG: PCP process: 1924 started 
pgpool: Warning: Permanently added 'vm-lxpgproxy2,192.168.153.237' (ECDSA) to the list of known hosts. 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: pgpool-II successfully started. version 4.2.2 (chichiriboshi) 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: node status[0]: 2 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: node status[1]: 1 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: node status[2]: 2 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: watchdog lifecheck process with pid: 1889 exits with status 768 
pgpool: 2021-03-01 11:54:28: pid 1868: FATAL: watchdog lifecheck process exit with fatal error. exiting pgpool-II 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: new IPC connection received 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: authentication failed 
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: IPC command contains an invalid data 
pgpool: 2021-03-01 11:54:28: pid 1929: ERROR: get node list command reply contains no data 
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: shutting down 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: Watchdog is shutting down 
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation 
pgpool: RTNETLINK answers: Cannot assign requested address 
pgpool: + exit 0 
pgpool: 2021-03-01 11:54:28: pid 1888: LOG: watchdog escalation successful 
pgpool: 2021-03-01 11:54:29: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation 
pgpool: 2021-03-01 11:54:30: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation 
pgpool: 2021-03-01 11:54:31: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation 
pgpool: 2021-03-01 11:54:32: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation 
pgpool: 2021-03-01 11:54:33: pid 1888: LOG: successfully acquired the delegate IP:"192.168.153.230" 
pgpool: 2021-03-01 11:54:33: pid 1888: DETAIL: 'if_up_cmd' returned with success 
pgpool: 2021-03-01 11:54:33: pid 1873: LOG: escalation process does not exited in time 
pgpool: 2021-03-01 11:54:33: pid 1873: DETAIL: starting the de-escalation anyway 
pgpool: 2021-03-01 11:54:33: pid 1942: LOG: watchdog: de-escalation started 
pgpool: 2021-03-01 11:54:33: pid 1942: LOG: successfully released the delegate IP:"192.168.153.230" 
pgpool: 2021-03-01 11:54:33: pid 1942: DETAIL: 'if_down_cmd' returned with success 
systemd: pgpool.service: main process exited, code=exited, status=1/FAILURE 
systemd: Unit pgpool.service entered failed state. 
systemd: pgpool.service failed. 

Any ideas? 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210301/6924bae7/attachment.htm>


More information about the pgpool-general mailing list