[pgpool-general: 7896] Possible race condition during startup causing node to enter network isolation

Emond Papegaaij emond.papegaaij at gmail.com
Thu Nov 25 17:54:50 JST 2021


Hi all,

In our tests we are seeing sporadic failures when the services on one node are
restarted. The tests run in a 3-node setup, with every node running a database
and pgpool: node 1 is the one restarting its services, node 2 is both the
pgpool leader and the primary database, and node 3 runs a standby database and
pgpool. Judging from the logs (see below), node 1 is not allowed to connect to
node 2 because node 2 has marked node 1 as dead via the lifecheck. However,
node 1 will not be considered alive again until it has joined the cluster. The
end result is that node 1 keeps trying to join the cluster and node 2 keeps
rejecting it.
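To make the suspected cycle explicit, here is a minimal sketch (plain Python, not pgpool source; all names are hypothetical) of the two interlocking rules the logs suggest: the leader rejects join requests from a node that lifecheck reported lost, while lifecheck only clears that flag once heartbeats resume, which requires the node to have joined.

```python
# Hypothetical model of the reported deadlock, NOT actual pgpool-II code.
class Leader:
    """Models the watchdog leader's view of remote nodes."""

    def __init__(self):
        self.lost_by_lifecheck = set()

    def report_node_dead(self, node):
        # Lifecheck fires: 'node id :0 status = "NODE DEAD"'.
        self.lost_by_lifecheck.add(node)

    def lifecheck_mark_alive(self, node):
        # "only lifecheck process can mark this node alive again" --
        # in this model that happens only after heartbeats resume.
        self.lost_by_lifecheck.discard(node)

    def handle_join_request(self, node):
        # "rejecting the request until life-check informs us that
        # it is reachable again".
        return node not in self.lost_by_lifecheck


def restart_node(leader, node, attempts=5):
    """The restarting node loops: JOINING -> rejected -> NETWORK ISOLATION."""
    for _ in range(attempts):
        if leader.handle_join_request(node):
            # Once joined, heartbeats flow again and lifecheck clears the flag.
            leader.lifecheck_mark_alive(node)
            return "STANDBY"
        # Not joined, so no heartbeats, so the lost flag is never cleared.
    return "NETWORK ISOLATION"


leader = Leader()
leader.report_node_dead("node1")      # lifecheck fired during node1's restart
print(restart_node(leader, "node1"))  # never converges to STANDBY
```

Under these two rules the loop can never terminate, which matches what we see: whether the race actually works this way in pgpool is of course the question.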

In the logs below, what I call node 1 is 172.29.30.1, node 2 is 172.29.30.2,
and node 3 is 172.29.30.3. I think the problem lies in the 7th line of the
node 2 logs: "node id :0 status = "NODE DEAD" message:"No heartbeat signal
from node"". Node 2 never lets node 1 recover from this status. Note that the
node 2 logs also show the automatic failback of the database on node 1.

Best regards,
Emond

The logs for node 1:
2021-11-24 23:44:44: pid 13: LOG:  watchdog cluster is configured with 2
remote nodes
2021-11-24 23:44:44: pid 13: LOG:  watchdog remote node:0 on
172.29.30.2:9009
2021-11-24 23:44:44: pid 13: LOG:  watchdog remote node:1 on
172.29.30.3:9009
2021-11-24 23:44:44: pid 13: LOG:  interface monitoring is disabled in
watchdog
2021-11-24 23:44:44: pid 13: LOG:  watchdog node state changed from [DEAD]
to [LOADING]
2021-11-24 23:44:44: pid 13: LOG:  new outbound connection to
172.29.30.2:9009
2021-11-24 23:44:44: pid 13: LOG:  setting the remote node "172.29.30.2:5432
Linux 610cdb714a72" as watchdog cluster leader
2021-11-24 23:44:44: pid 13: LOG:  watchdog node state changed from
[LOADING] to [INITIALIZING]
2021-11-24 23:44:44: pid 13: LOG:  new watchdog node connection is received
from "172.29.30.2:53967"
2021-11-24 23:44:44: pid 13: LOG:  new node joined the cluster
hostname:"172.29.30.2" port:9009 pgpool_port:5432
2021-11-24 23:44:44: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog
messaging version: 1.2
2021-11-24 23:44:44: pid 13: LOG:  new outbound connection to
172.29.30.3:9009
2021-11-24 23:44:44: pid 13: LOG:  new watchdog node connection is received
from "172.29.30.3:57577"
2021-11-24 23:44:44: pid 13: LOG:  new node joined the cluster
hostname:"172.29.30.3" port:9009 pgpool_port:5432
2021-11-24 23:44:44: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog
messaging version: 1.2
2021-11-24 23:44:45: pid 13: LOG:  read from socket failed, remote end
closed the connection
2021-11-24 23:44:45: pid 13: LOG:  client socket of 172.29.30.2:5432 Linux
610cdb714a72 is closed
2021-11-24 23:44:45: pid 13: LOG:  remote node "172.29.30.2:5432 Linux
610cdb714a72" is reporting that it has lost us
2021-11-24 23:44:45: pid 13: LOG:  read from socket failed, remote end
closed the connection
2021-11-24 23:44:45: pid 13: LOG:  outbound socket of 172.29.30.2:5432
Linux 610cdb714a72 is closed
2021-11-24 23:44:45: pid 13: LOG:  remote node "172.29.30.2:5432 Linux
610cdb714a72" is not reachable
2021-11-24 23:44:45: pid 13: DETAIL:  marking the node as lost
2021-11-24 23:44:45: pid 13: LOG:  remote node "172.29.30.2:5432 Linux
610cdb714a72" is lost
2021-11-24 23:44:45: pid 13: LOG:  watchdog cluster has lost the
coordinator node
2021-11-24 23:44:45: pid 13: LOG:  removing the remote node "
172.29.30.2:5432 Linux 610cdb714a72" from watchdog cluster leader
2021-11-24 23:44:46: pid 13: LOG:  watchdog node state changed from
[INITIALIZING] to [STANDING FOR LEADER]
2021-11-24 23:44:46: pid 13: LOG:  our stand for coordinator request is
rejected by node "172.29.30.3:5432 Linux 589ce3e63006"
2021-11-24 23:44:46: pid 13: DETAIL:  we might be in partial network
isolation and cluster already have a valid leader
2021-11-24 23:44:46: pid 13: HINT:  please verify the watchdog life-check
and network is working properly
2021-11-24 23:44:46: pid 13: LOG:  watchdog node state changed from
[STANDING FOR LEADER] to [NETWORK ISOLATION]
2021-11-24 23:44:46: pid 13: LOG:  read from socket failed, remote end
closed the connection
2021-11-24 23:44:46: pid 13: LOG:  client socket of 172.29.30.3:5432 Linux
589ce3e63006 is closed
2021-11-24 23:44:46: pid 13: LOG:  remote node "172.29.30.3:5432 Linux
589ce3e63006" is reporting that it has lost us
2021-11-24 23:44:46: pid 13: LOG:  read from socket failed, remote end
closed the connection
2021-11-24 23:44:46: pid 13: LOG:  outbound socket of 172.29.30.3:5432
Linux 589ce3e63006 is closed
2021-11-24 23:44:46: pid 13: LOG:  remote node "172.29.30.3:5432 Linux
589ce3e63006" is not reachable
2021-11-24 23:44:46: pid 13: DETAIL:  marking the node as lost
2021-11-24 23:44:46: pid 13: LOG:  remote node "172.29.30.3:5432 Linux
589ce3e63006" is lost
2021-11-24 23:44:56: pid 13: LOG:  trying again to join the cluster
2021-11-24 23:44:56: pid 13: LOG:  watchdog node state changed from
[NETWORK ISOLATION] to [JOINING]
2021-11-24 23:44:56: pid 13: LOG:  new outbound connection to
172.29.30.2:9009
2021-11-24 23:44:56: pid 13: LOG:  new outbound connection to
172.29.30.3:9009
2021-11-24 23:44:56: pid 13: LOG:  new watchdog node connection is received
from "172.29.30.2:720"
2021-11-24 23:44:56: pid 13: LOG:  new node joined the cluster
hostname:"172.29.30.2" port:9009 pgpool_port:5432
2021-11-24 23:44:56: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog
messaging version: 1.2
2021-11-24 23:44:56: pid 13: LOG:  The newly joined node:"172.29.30.2:5432
Linux 610cdb714a72" had left the cluster because it was lost
2021-11-24 23:44:56: pid 13: DETAIL:  lost reason was "NOT REACHABLE" and
startup time diff = 1
2021-11-24 23:44:56: pid 13: LOG:  new watchdog node connection is received
from "172.29.30.3:3306"
2021-11-24 23:44:56: pid 13: LOG:  new node joined the cluster
hostname:"172.29.30.3" port:9009 pgpool_port:5432
2021-11-24 23:44:56: pid 13: DETAIL:  Pgpool-II version:"4.2.4" watchdog
messaging version: 1.2
2021-11-24 23:44:56: pid 13: LOG:  The newly joined node:"172.29.30.3:5432
Linux 589ce3e63006" had left the cluster because it was lost
2021-11-24 23:44:56: pid 13: DETAIL:  lost reason was "NOT REACHABLE" and
startup time diff = 0
2021-11-24 23:45:00: pid 13: LOG:  watchdog node state changed from
[JOINING] to [INITIALIZING]
2021-11-24 23:45:01: pid 13: LOG:  watchdog node state changed from
[INITIALIZING] to [STANDING FOR LEADER]
2021-11-24 23:45:01: pid 13: LOG:  our stand for coordinator request is
rejected by node "172.29.30.2:5432 Linux 610cdb714a72"
2021-11-24 23:45:01: pid 13: LOG:  watchdog node state changed from
[STANDING FOR LEADER] to [PARTICIPATING IN ELECTION]
2021-11-24 23:45:06: pid 13: LOG:  watchdog node state changed from
[PARTICIPATING IN ELECTION] to [JOINING]
2021-11-24 23:45:06: pid 13: LOG:  setting the remote node "172.29.30.2:5432
Linux 610cdb714a72" as watchdog cluster leader
2021-11-24 23:45:06: pid 13: LOG:  watchdog node state changed from
[JOINING] to [INITIALIZING]
2021-11-24 23:45:07: pid 13: LOG:  watchdog node state changed from
[INITIALIZING] to [STANDBY]
2021-11-24 23:45:07: pid 13: NOTICE:  our join coordinator is rejected by
node "172.29.30.2:5432 Linux 610cdb714a72"
2021-11-24 23:45:07: pid 13: HINT:  rejoining the cluster.
2021-11-24 23:45:07: pid 13: LOG:  leader node "172.29.30.2:5432 Linux
610cdb714a72" thinks we are lost, and "172.29.30.2:5432 Linux 610cdb714a72"
is not letting us join
2021-11-24 23:45:07: pid 13: HINT:  please verify the watchdog life-check
and network is working properly
2021-11-24 23:45:07: pid 13: LOG:  watchdog node state changed from
[STANDBY] to [NETWORK ISOLATION]
2021-11-24 23:45:17: pid 13: LOG:  trying again to join the cluster
2021-11-24 23:45:17: pid 13: LOG:  watchdog node state changed from
[NETWORK ISOLATION] to [JOINING]
2021-11-24 23:45:17: pid 13: LOG:  removing the remote node "
172.29.30.2:5432 Linux 610cdb714a72" from watchdog cluster leader
2021-11-24 23:45:17: pid 13: LOG:  setting the remote node "172.29.30.2:5432
Linux 610cdb714a72" as watchdog cluster leader
2021-11-24 23:45:17: pid 13: LOG:  watchdog node state changed from
[JOINING] to [INITIALIZING]

The logs for node 2:
2021-11-24 23:44:44: pid 12: LOG:  new watchdog node connection is received
from "172.29.30.1:36034"
2021-11-24 23:44:44: pid 12: LOG:  new node joined the cluster
hostname:"172.29.30.1" port:9009 pgpool_port:5432
2021-11-24 23:44:44: pid 12: DETAIL:  Pgpool-II version:"4.2.6" watchdog
messaging version: 1.2
2021-11-24 23:44:44: pid 12: LOG:  The newly joined node:"172.29.30.1:5432
Linux 8e410fda51ac" had left the cluster because it was shutdown
2021-11-24 23:44:44: pid 12: LOG:  new outbound connection to
172.29.30.1:9009
2021-11-24 23:44:45: pid 13: LOG:  informing the node status change to
watchdog
2021-11-24 23:44:45: pid 13: DETAIL:  node id :0 status = "NODE DEAD"
message:"No heartbeat signal from node"
2021-11-24 23:44:45: pid 12: LOG:  new IPC connection received
2021-11-24 23:44:45: pid 12: LOG:  received node status change ipc message
2021-11-24 23:44:45: pid 12: DETAIL:  No heartbeat signal from node
2021-11-24 23:44:45: pid 12: LOG:  remote node "172.29.30.1:5432 Linux
8e410fda51ac" is lost
2021-11-24 23:44:46: pid 12: LOG:  new IPC connection received
2021-11-24 23:44:47: pid 12: LOG:  watchdog received the failover command
from remote pgpool-II node "172.29.30.3:5432 Linux 589ce3e63006"
2021-11-24 23:44:47: pid 12: LOG:  watchdog is processing the failover
command [FAILBACK_REQUEST] received from 172.29.30.3:5432 Linux 589ce3e63006
2021-11-24 23:44:47: pid 12: LOG:  The failover request does not need
quorum to hold
2021-11-24 23:44:47: pid 12: DETAIL:  proceeding with the failover
2021-11-24 23:44:47: pid 12: HINT:  REQ_DETAIL_CONFIRMED
2021-11-24 23:44:47: pid 12: LOG:  received failback request for node_id: 0
from pid [12]
2021-11-24 23:44:47: pid 12: LOG:  signal_user1_to_parent_with_reason(0)
2021-11-24 23:44:47: pid 1: LOG:  Pgpool-II parent process received SIGUSR1
2021-11-24 23:44:47: pid 1: LOG:  Pgpool-II parent process has received
failover request
2021-11-24 23:44:47: pid 12: LOG:  new IPC connection received
2021-11-24 23:44:47: pid 12: LOG:  received the failover indication from
Pgpool-II on IPC interface
2021-11-24 23:44:47: pid 12: LOG:  watchdog is informed of failover start
by the main process
2021-11-24 23:44:47: pid 1: LOG:  starting fail back. reconnect host
172.29.30.1(5432)
2021-11-24 23:44:47: pid 1: LOG:  Node 1 is not down (status: 2)
2021-11-24 23:44:47: pid 1: LOG:  Do not restart children because we are
failing back node id 0 host: 172.29.30.1 port: 5432 and we are in streaming
replication mode and not all backends were down
2021-11-24 23:44:47: pid 1: LOG:  find_primary_node_repeatedly: waiting for
finding a primary node
2021-11-24 23:44:47: pid 1: LOG:  find_primary_node: standby node is 0
2021-11-24 23:44:47: pid 1: LOG:  find_primary_node: primary node is 1
2021-11-24 23:44:47: pid 1: LOG:  find_primary_node: standby node is 2
2021-11-24 23:44:47: pid 1: LOG:  failover: set new primary node: 1
2021-11-24 23:44:47: pid 1: LOG:  failover: set new main node: 0
2021-11-24 23:44:47: pid 12: LOG:  new IPC connection received
2021-11-24 23:44:47: pid 12: LOG:  received the failover indication from
Pgpool-II on IPC interface
2021-11-24 23:44:47: pid 12: LOG:  watchdog is informed of failover end by
the main process
2021-11-24 23:44:47: pid 1: LOG:  failback done. reconnect host
172.29.30.1(5432)
2021-11-24 23:44:47: pid 190: LOG:  worker process received restart request
2021-11-24 23:44:48: pid 189: LOG:  restart request received in pcp child
process
2021-11-24 23:44:48: pid 1: LOG:  PCP child 189 exits with status 0 in
failover()
2021-11-24 23:44:48: pid 1: LOG:  fork a new PCP child pid 191 in failover()
2021-11-24 23:44:48: pid 1: LOG:  worker child process with pid: 190 exits
with status 256
2021-11-24 23:44:48: pid 191: LOG:  PCP process: 191 started
2021-11-24 23:44:48: pid 1: LOG:  fork a new worker child process with pid:
192
2021-11-24 23:44:48: pid 192: LOG:  process started
2021-11-24 23:44:48: pid 12: LOG:  new IPC connection received
2021-11-24 23:44:53: pid 12: LOG:  new IPC connection received
2021-11-24 23:44:56: pid 12: LOG:  new watchdog node connection is received
from "172.29.30.1:39618"
2021-11-24 23:44:56: pid 12: LOG:  new outbound connection to
172.29.30.1:9009
2021-11-24 23:44:56: pid 12: LOG:  new node joined the cluster
hostname:"172.29.30.1" port:9009 pgpool_port:5432
2021-11-24 23:44:56: pid 12: DETAIL:  Pgpool-II version:"4.2.6" watchdog
messaging version: 1.2
2021-11-24 23:44:56: pid 12: LOG:  The newly joined node:"172.29.30.1:5432
Linux 8e410fda51ac" had left the cluster because it was lost
2021-11-24 23:44:56: pid 12: DETAIL:  lost reason was "REPORTED BY
LIFECHECK" and startup time diff = 1
2021-11-24 23:44:56: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:44:56: pid 12: DETAIL:  only lifecheck process can mark this
node alive again
2021-11-24 23:44:56: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:44:56: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:44:56: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:44:56: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:44:56: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:44:56: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:44:56: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:44:56: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:44:58: pid 12: LOG:  new IPC connection received
2021-11-24 23:45:00: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:00: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:00: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:00: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:00: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:00: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:00: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:00: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:01: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:01: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:01: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:01: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:01: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:01: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:01: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:03: pid 12: LOG:  new IPC connection received
2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:06: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:06: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:06: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:06: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:06: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:06: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:06: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:07: pid 12: LOG:  lost remote node "172.29.30.1:5432 Linux
8e410fda51ac" is requesting to join the cluster
2021-11-24 23:45:07: pid 12: DETAIL:  rejecting the request until
life-check inform us that it is reachable again
2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:07: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:07: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:07: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:07: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:07: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:07: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:07: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:09: pid 12: LOG:  new IPC connection received
2021-11-24 23:45:14: pid 12: LOG:  new IPC connection received
2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:17: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:17: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:17: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this
node alive again
2021-11-24 23:45:17: pid 12: LOG:  we have received the NODE INFO message
from the node:"172.29.30.1:5432 Linux 8e410fda51ac" that was lost
2021-11-24 23:45:17: pid 12: DETAIL:  we had lost this node because of
"REPORTED BY LIFECHECK"
2021-11-24 23:45:17: pid 12: LOG:  node:"172.29.30.1:5432 Linux
8e410fda51ac" was reported lost by the lifecheck process
2021-11-24 23:45:17: pid 12: DETAIL:  only life-check process can mark this
node alive again