[pgpool-general: 6203] Re: Behavior when the heartbeat is not received

Bo Peng pengbo at sraoss.co.jp
Mon Aug 20 10:13:47 JST 2018


Hi,

> Hi,
> 
> I'm using the heartbeat mode as the lifecheck method.  What happens if
> the heartbeat signal is not received?
> 
> In Pgpool-II 3.6.12, when I close the heartbeat port on the master,
> the heartbeat signal is not received and the standby is disconnected
> once.
> 
>   Aug 16 10:01:30 centos7-1 pgpool[12906]: [10-1] LOG:  watchdog: lifecheck started
> 
>   # firewall-cmd --remove-port=9694/udp
> 
>   Aug 16 10:02:20 centos7-1 pgpool[12906]: [11-1] LOG:  informing the node status change to watchdog
>   Aug 16 10:02:20 centos7-1 pgpool[12906]: [11-2] DETAIL:  node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [24-1] LOG:  new IPC connection received
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [25-1] LOG:  received node status change ipc message
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [25-2] DETAIL:  No heartbeat signal from node
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [26-1] LOG:  remote node "192.168.137.72:9999 Linux centos7-2" is lost
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [27-1] LOG:  removing watchdog node "192.168.137.72:9999 Linux centos7-2" from the standby list
> 
> However, the standby watchdog then reconnects.
> 
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [28-1] LOG:  new outbound connection to 192.168.137.72:9000
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [29-1] LOG:  new watchdog node connection is received from "192.168.137.72:12471"
>   Aug 16 10:02:20 centos7-1 pgpool[12904]: [30-1] LOG:  new node joined the cluster hostname:"192.168.137.72" port:9000 pgpool_port:9999
> 
> Is this behavior correct?

Yes, this is the correct behaviour of the watchdog.

The basic recovery mechanism of the watchdog is that it keeps trying
to connect to lost nodes. If a node was lost because of network
partitioning or some other network issue, the watchdog reconnects to
it as soon as the communication link is established again.

This way, the watchdog makes sure that if a split-brain happens because
of network partitioning, the cluster recovers from it immediately.
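
To illustrate the idea, here is a minimal sketch in Python. It is not
Pgpool-II's actual implementation, and the names retry_lost_nodes and
tcp_reachable are hypothetical. It assumes a lost node counts as
recovered as soon as a TCP connection to its watchdog port can be
opened. In the logs above only 9694/udp was closed, while the watchdog
port 9000 apparently stayed reachable, which would explain why the
standby rejoined within the same second.

```python
import socket
from typing import Callable, Set, Tuple

Node = Tuple[str, int]  # (hostname, watchdog port); hypothetical type alias


def tcp_reachable(node: Node, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the node can be opened."""
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


def retry_lost_nodes(
    lost: Set[Node],
    standby: Set[Node],
    probe: Callable[[Node], bool] = tcp_reachable,
) -> None:
    """Probe every lost node once; move reachable ones back to standby.

    Assumption: a real watchdog would run this kind of pass repeatedly
    and exchange protocol messages before accepting the node again.
    """
    # sorted() copies the set, so mutating `lost` inside the loop is safe.
    for node in sorted(lost):
        if probe(node):
            lost.discard(node)
            standby.add(node)
```

Calling retry_lost_nodes() periodically with the set of lost nodes
corresponds to the "keep trying to connect" behaviour described above:
a node stays in the lost set while the link is down and is moved back
once a probe succeeds.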

> The pcp_watchdog_info command result is as follows:
> 
>   $ pcp_watchdog_info
>   2 YES 192.168.137.71:9999 Linux centos7-1 192.168.137.71
>   
>   192.168.137.71:9999 Linux centos7-1 192.168.137.71 9999 9000 4 MASTER
>   192.168.137.72:9999 Linux centos7-2 192.168.137.72 9999 9000 7 STANDBY
> 
> I will attach the pgpool.conf files for both nodes.
> 
> Best regards,
> 
> 
> ----
> Tomoaki Sato <sato at sraoss.co.jp>
> SRA OSS, Inc. Japan


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan


