<div dir="ltr">Hi all,<div><br></div><div>In our tests we are seeing sporadic failures when the services on one node are restarted. These tests run in a 3-node setup in which every node runs a database and pgpool: node 1 is the node whose services are restarted, node 2 is both the pgpool leader and the primary database, and node 3 runs a standby database and pgpool. Looking at the logs (see below), it seems node 1 is not allowed to connect to node 2 because node 2 has marked node 1 as dead via the lifecheck. However, node 1 will not become alive until it has joined the cluster. The end result is that node 1 keeps trying to join the cluster and node 2 keeps rejecting it.</div><div><br></div><div>In the logs below, node 1 is <span style="color:rgb(0,0,0);white-space:pre-wrap">172.29.30.1, node 2 is </span><span style="color:rgb(0,0,0);white-space:pre-wrap">172.29.30.2 and node 3 is </span><span style="color:rgb(0,0,0);white-space:pre-wrap">172.29.30.3. I think the problem lies in the 7th line of the node 2 logs: "</span>node id :0 status = "NODE DEAD" message:"No heartbeat signal from node"". Node 2 never lets node 1 recover from this status. 
Note that the node 2 logs also show the automatic failback of the database on node 1.</div><div><br></div><div>Best regards,</div><div>Emond</div><div><span style="color:rgb(0,0,0);white-space:pre-wrap"><br></span></div><div><span style="color:rgb(0,0,0);white-space:pre-wrap">The logs for node 1:</span></div>2021-11-24 23:44:44: pid 13: LOG:  watchdog cluster is configured with 2 remote nodes<br>2021-11-24 23:44:44: pid 13: LOG:  watchdog remote node:0 on <a href="http://172.29.30.2:9009">172.29.30.2:9009</a><br>2021-11-24 23:44:44: pid 13: LOG:  watchdog remote node:1 on <a href="http://172.29.30.3:9009">172.29.30.3:9009</a><br>2021-11-24 23:44:44: pid 13: LOG:  interface monitoring is disabled in watchdog<br>2021-11-24 23:44:44: pid 13: LOG:  watchdog node state changed from [DEAD] to [LOADING]<br><div>2021-11-24 23:44:44: pid 13: LOG:  new outbound connection to <a href="http://172.29.30.2:9009">172.29.30.2:9009</a><br>2021-11-24 23:44:44: pid 13: LOG:  setting the remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" as watchdog cluster leader<br>2021-11-24 23:44:44: pid 13: LOG:  watchdog node state changed from [LOADING] to [INITIALIZING]<br>2021-11-24 23:44:44: pid 13: LOG:  new watchdog node connection is received from "<a href="http://172.29.30.2:53967">172.29.30.2:53967</a><br>2021-11-24 23:44:44: pid 13: LOG:  new node joined the cluster hostname:"172.29.30.2" port:9009 pgpool_port:5432<br>2021-11-24 23:44:44: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog messaging version: 1.2<br>2021-11-24 23:44:44: pid 13: LOG:  new outbound connection to <a href="http://172.29.30.3:9009">172.29.30.3:9009</a><br>2021-11-24 23:44:44: pid 13: LOG:  new watchdog node connection is received from "<a href="http://172.29.30.3:57577">172.29.30.3:57577</a>"<br>2021-11-24 23:44:44: pid 13: LOG:  new node joined the cluster hostname:"172.29.30.3" port:9009 pgpool_port:5432<br>2021-11-24 23:44:44: pid 13: DETAIL:  Pgpool-II version:"4.2.4" 
watchdog messaging version: 1.2<br>2021-11-24 23:44:45: pid 13: LOG:  read from socket failed, remote end closed the connection<br>2021-11-24 23:44:45: pid 13: LOG:  client socket of <a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72 is closed<br>2021-11-24 23:44:45: pid 13: LOG:  remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" is reporting that it has lost us<br>2021-11-24 23:44:45: pid 13: LOG:  read from socket failed, remote end closed the connection<br>2021-11-24 23:44:45: pid 13: LOG:  outbound socket of <a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72 is closed<br>2021-11-24 23:44:45: pid 13: LOG:  remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" is not reachable<br>2021-11-24 23:44:45: pid 13: DETAIL:  marking the node as lost<br>2021-11-24 23:44:45: pid 13: LOG:  remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" is lost<br>2021-11-24 23:44:45: pid 13: LOG:  watchdog cluster has lost the coordinator node<br>2021-11-24 23:44:45: pid 13: LOG:  removing the remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" from watchdog cluster leader<br>2021-11-24 23:44:46: pid 13: LOG:  watchdog node state changed from [INITIALIZING] to [STANDING FOR LEADER]<br>2021-11-24 23:44:46: pid 13: LOG:  our stand for coordinator request is rejected by node "<a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006"<br>2021-11-24 23:44:46: pid 13: DETAIL:  we might be in partial network isolation and cluster already have a valid leader<br>2021-11-24 23:44:46: pid 13: HINT:  please verify the watchdog life-check and network is working properly<br>2021-11-24 23:44:46: pid 13: LOG:  watchdog node state changed from [STANDING FOR LEADER] to [NETWORK ISOLATION]<br>2021-11-24 23:44:46: pid 13: LOG:  read from socket failed, remote end closed the connection<br>2021-11-24 
23:44:46: pid 13: LOG:  client socket of <a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006 is closed<br>2021-11-24 23:44:46: pid 13: LOG:  remote node "<a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006" is reporting that it has lost us<br>2021-11-24 23:44:46: pid 13: LOG:  read from socket failed, remote end closed the connection<br>2021-11-24 23:44:46: pid 13: LOG:  outbound socket of <a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006 is closed<br>2021-11-24 23:44:46: pid 13: LOG:  remote node "<a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006" is not reachable<br>2021-11-24 23:44:46: pid 13: DETAIL:  marking the node as lost<br>2021-11-24 23:44:46: pid 13: LOG:  remote node "<a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006" is lost<br>2021-11-24 23:44:56: pid 13: LOG:  trying again to join the cluster<br>2021-11-24 23:44:56: pid 13: LOG:  watchdog node state changed from [NETWORK ISOLATION] to [JOINING]<br>2021-11-24 23:44:56: pid 13: LOG:  new outbound connection to <a href="http://172.29.30.2:9009">172.29.30.2:9009</a> <br>2021-11-24 23:44:56: pid 13: LOG:  new outbound connection to <a href="http://172.29.30.3:9009">172.29.30.3:9009</a> <br>2021-11-24 23:44:56: pid 13: LOG:  new watchdog node connection is received from "<a href="http://172.29.30.2:720">172.29.30.2:720</a>"<br>2021-11-24 23:44:56: pid 13: LOG:  new node joined the cluster hostname:"172.29.30.2" port:9009 pgpool_port:5432<br>2021-11-24 23:44:56: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog messaging version: 1.2<br>2021-11-24 23:44:56: pid 13: LOG:  The newly joined node:"<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" had left the cluster because it was lost<br>2021-11-24 23:44:56: pid 13: DETAIL:  lost reason was "NOT REACHABLE" and startup time diff = 1<br>2021-11-24 23:44:56: pid 13: LOG:  new watchdog node connection is received from "<a 
href="http://172.29.30.3:3306">172.29.30.3:3306</a>"<br>2021-11-24 23:44:56: pid 13: LOG:  new node joined the cluster hostname:"172.29.30.3" port:9009 pgpool_port:5432<br>2021-11-24 23:44:56: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog messaging version: 1.2<br>2021-11-24 23:44:56: pid 13: LOG:  The newly joined node:"<a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006" had left the cluster because it was lost<br>2021-11-24 23:44:56: pid 13: DETAIL:  lost reason was "NOT REACHABLE" and startup time diff = 0<br>2021-11-24 23:45:00: pid 13: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]<br>2021-11-24 23:45:01: pid 13: LOG:  watchdog node state changed from [INITIALIZING] to [STANDING FOR LEADER]<br>2021-11-24 23:45:01: pid 13: LOG:  our stand for coordinator request is rejected by node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72"<br>2021-11-24 23:45:01: pid 13: LOG:  watchdog node state changed from [STANDING FOR LEADER] to [PARTICIPATING IN ELECTION]<br>2021-11-24 23:45:06: pid 13: LOG:  watchdog node state changed from [PARTICIPATING IN ELECTION] to [JOINING]<br>2021-11-24 23:45:06: pid 13: LOG:  setting the remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" as watchdog cluster leader<br>2021-11-24 23:45:06: pid 13: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]<br>2021-11-24 23:45:07: pid 13: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]<br>2021-11-24 23:45:07: pid 13: NOTICE:  our join coordinator is rejected by node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72"<br>2021-11-24 23:45:07: pid 13: HINT:  rejoining the cluster.<br>2021-11-24 23:45:07: pid 13: LOG:  leader node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" thinks we are lost, and "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" is not letting us 
join<br>2021-11-24 23:45:07: pid 13: HINT:  please verify the watchdog life-check and network is working properly<br>2021-11-24 23:45:07: pid 13: LOG:  watchdog node state changed from [STANDBY] to [NETWORK ISOLATION]<br>2021-11-24 23:45:17: pid 13: LOG:  trying again to join the cluster<br>2021-11-24 23:45:17: pid 13: LOG:  watchdog node state changed from [NETWORK ISOLATION] to [JOINING]<br>2021-11-24 23:45:17: pid 13: LOG:  removing the remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" from watchdog cluster leader<br>2021-11-24 23:45:17: pid 13: LOG:  setting the remote node "<a href="http://172.29.30.2:5432">172.29.30.2:5432</a> Linux 610cdb714a72" as watchdog cluster leader<br>2021-11-24 23:45:17: pid 13: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]<span style="color:rgb(0,0,0);white-space:pre-wrap"><br></span></div><div><br></div><div>The logs for node 2:</div><div>2021-11-24 23:44:44: pid 12: LOG:  new watchdog node connection is received from "<a href="http://172.29.30.1:36034">172.29.30.1:36034</a>"<br>2021-11-24 23:44:44: pid 12: LOG:  new node joined the cluster hostname:"172.29.30.1" port:9009 pgpool_port:5432<br>2021-11-24 23:44:44: pid 12: DETAIL:  Pgpool-II version:"4.2.6" watchdog messaging version: 1.2<br>2021-11-24 23:44:44: pid 12: LOG:  The newly joined node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" had left the cluster because it was shutdown<br>2021-11-24 23:44:44: pid 12: LOG:  new outbound connection to <a href="http://172.29.30.1:9009">172.29.30.1:9009</a> <br>2021-11-24 23:44:45: pid 13: LOG:  informing the node status change to watchdog<br>2021-11-24 23:44:45: pid 13: DETAIL:  node id :0 status = "NODE DEAD" message:"No heartbeat signal from node"<br>2021-11-24 23:44:45: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:44:45: pid 12: LOG:  received node status change ipc message<br>2021-11-24 23:44:45: pid 12: DETAIL:  No heartbeat 
signal from node<br>2021-11-24 23:44:45: pid 12: LOG:  remote node "<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" is lost<br>2021-11-24 23:44:46: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:44:47: pid 12: LOG:  watchdog received the failover command from remote pgpool-II node "<a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006"<br>2021-11-24 23:44:47: pid 12: LOG:  watchdog is processing the failover command [FAILBACK_REQUEST] received from <a href="http://172.29.30.3:5432">172.29.30.3:5432</a> Linux 589ce3e63006<br>2021-11-24 23:44:47: pid 12: LOG:  The failover request does not need quorum to hold<br>2021-11-24 23:44:47: pid 12: DETAIL:  proceeding with the failover<br>2021-11-24 23:44:47: pid 12: HINT:  REQ_DETAIL_CONFIRMED<br>2021-11-24 23:44:47: pid 12: LOG:  received failback request for node_id: 0 from pid [12]<br>2021-11-24 23:44:47: pid 12: LOG:  signal_user1_to_parent_with_reason(0)<br>2021-11-24 23:44:47: pid 1: LOG:  Pgpool-II parent process received SIGUSR1<br>2021-11-24 23:44:47: pid 1: LOG:  Pgpool-II parent process has received failover request<br>2021-11-24 23:44:47: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:44:47: pid 12: LOG:  received the failover indication from Pgpool-II on IPC interface<br>2021-11-24 23:44:47: pid 12: LOG:  watchdog is informed of failover start by the main process<br>2021-11-24 23:44:47: pid 1: LOG:  starting fail back. 
reconnect host 172.29.30.1(5432)<br>2021-11-24 23:44:47: pid 1: LOG:  Node 1 is not down (status: 2)<br>2021-11-24 23:44:47: pid 1: LOG:  Do not restart children because we are failing back node id 0 host: 172.29.30.1 port: 5432 and we are in streaming replication mode and not all backends were down<br>2021-11-24 23:44:47: pid 1: LOG:  find_primary_node_repeatedly: waiting for finding a primary node<br>2021-11-24 23:44:47: pid 1: LOG:  find_primary_node: standby node is 0<br>2021-11-24 23:44:47: pid 1: LOG:  find_primary_node: primary node is 1<br>2021-11-24 23:44:47: pid 1: LOG:  find_primary_node: standby node is 2<br>2021-11-24 23:44:47: pid 1: LOG:  failover: set new primary node: 1<br>2021-11-24 23:44:47: pid 1: LOG:  failover: set new main node: 0<br>2021-11-24 23:44:47: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:44:47: pid 12: LOG:  received the failover indication from Pgpool-II on IPC interface<br>2021-11-24 23:44:47: pid 12: LOG:  watchdog is informed of failover end by the main process<br>2021-11-24 23:44:47: pid 1: LOG:  failback done. 
reconnect host 172.29.30.1(5432)<br>2021-11-24 23:44:47: pid 190: LOG:  worker process received restart request<br>2021-11-24 23:44:48: pid 189: LOG:  restart request received in pcp child process<br>2021-11-24 23:44:48: pid 1: LOG:  PCP child 189 exits with status 0 in failover()<br>2021-11-24 23:44:48: pid 1: LOG:  fork a new PCP child pid 191 in failover()<br>2021-11-24 23:44:48: pid 1: LOG:  worker child process with pid: 190 exits with status 256<br>2021-11-24 23:44:48: pid 191: LOG:  PCP process: 191 started<br>2021-11-24 23:44:48: pid 1: LOG:  fork a new worker child process with pid: 192<br>2021-11-24 23:44:48: pid 192: LOG:  process started<br>2021-11-24 23:44:48: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:44:53: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:44:56: pid 12: LOG:  new watchdog node connection is received from "<a href="http://172.29.30.1:39618">172.29.30.1:39618</a>"<br>2021-11-24 23:44:56: pid 12: LOG:  new outbound connection to <a href="http://172.29.30.1:9009">172.29.30.1:9009</a> <br>2021-11-24 23:44:56: pid 12: LOG:  new node joined the cluster hostname:"172.29.30.1" port:9009 pgpool_port:5432<br>2021-11-24 23:44:56: pid 12: DETAIL:  Pgpool-II version:"4.2.6" watchdog messaging version: 1.2<br>2021-11-24 23:44:56: pid 12: LOG:  The newly joined node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" had left the cluster because it was lost<br>2021-11-24 23:44:56: pid 12: DETAIL:  lost reason was "REPORTED BY LIFECHECK" and startup time diff = 1<br>2021-11-24 23:44:56: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:44:56: pid 12: DETAIL:  only lifecheck process can mark this node alive again<br>2021-11-24 23:44:56: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was 
lost<br>2021-11-24 23:44:56: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:44:56: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:44:56: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:44:56: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:44:56: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:44:56: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:44:56: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:44:58: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:45:00: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:00: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:00: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:00: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:00: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:00: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:00: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:00: pid 12: DETAIL:  only life-check 
process can mark this node alive again<br>2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:01: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:01: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:01: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:01: pid 12: LOG:  node:"<a 
href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:03: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:06: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:06: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:06: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message from the node:"<a 
href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:06: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:07: pid 12: LOG:  lost remote node "<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" is requesting to join the cluster<br>2021-11-24 23:45:07: pid 12: DETAIL:  rejecting the request until life-check inform us that it is reachable again<br>2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:07: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:07: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 
23:45:07: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:07: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:07: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:09: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:45:14: pid 12: LOG:  new IPC connection received<br>2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:17: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:17: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck 
process<br>2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:17: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this node alive again<br>2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message from the node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" that was lost<br>2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of "REPORTED BY LIFECHECK"<br>2021-11-24 23:45:17: pid 12: LOG:  node:"<a href="http://172.29.30.1:5432">172.29.30.1:5432</a> Linux 8e410fda51ac" was reported lost by the lifecheck process<br>2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this node alive again<br></div></div>