[pgpool-general: 4492] Re: Watch dog node goes down is cable is disconnected

Thu Feb 25 21:22:41 JST 2016

On Thu, Feb 25, 2016 at 12:59 AM, Lucas Luengas <lucasluengas at gmail.com>
wrote:
> Hello.
>
> I am using pgpool-II version 3.4.4 (tataraboshi), with master/slave with
> streaming replication, and watchdog. I am using Centos 6.7.
> I have 2 nodes: node A and node B.
> Failover process is ok.
> Virtual ip address is assigned ok between nodes if nodes are restarted. But
> I have a problem with the watchdog process if one node is disconnected from
> the network.
> If node A is disconnected from the network (for example cable is
> disconnected), then virtual ip address is assigned to node B. That is ok.
> After a few minutes, node A is connected to the network again. My problem is
> that the pgpool watchdog process of node A does not connect with node B,
> although the network is ok (ping ok, netstat listening ports (9999, 9898,
> ...) are ok, ...).
> And node B does not connect with node A.
>
> Pgpool log of node A shows every 10 seconds: (ip 192.168.0.226 is node B)
>
> pid 16913: LOG:  checking pgpool status by heartbeat
> pid 16913: DETAIL:  pgpool: 1 at "192.168.0.226:9999" status is down
>
> pgpool log of node B shows every 10 seconds: (ip 192.168.0.224 is node A)
>
> pid 8722: LOG:  checking pgpool status by heartbeat
> pid 8722: DETAIL:  pgpool: 1 at "192.168.0.224:9999" status is down
>
> I can use pcp_watchdog_info command in both servers. In node A, status of
> node B is 4 (down). In node B, status of node A is 4 (down)
>
> If I restart pgpool service in node A, then pgpool watchdog process is ok
> again and status are ok for both nodes, and pgpool watchdog is recovered
> in both nodes.
>
> What is the problem?

This is the expected behaviour of the pgpool-II watchdog. When watchdog
communication with the other pgpool-II node is lost, the watchdog marks
that node's status as DOWN, and only a restart of pgpool-II on that node
can make it rejoin the watchdog cluster.
You can tune the pgpool-II configuration parameters
*wd_heartbeat_deadtime* and *wd_interval* to tolerate temporary network
glitches, but once a watchdog node's status is marked as down, only a
restart of pgpool-II on that node can make it connect again.
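
As a rough sketch of that tuning, assuming the heartbeat lifecheck mode
(which the "checking pgpool status by heartbeat" log lines suggest), the
watchdog section of pgpool.conf on both nodes might look like the following.
The values are illustrative assumptions, not recommendations; pick a
deadtime longer than the network outages you want to ride out.

```
# pgpool.conf, watchdog section (illustrative values, adjust to your site)

wd_lifecheck_method = 'heartbeat'   # lifecheck via heartbeat signals

wd_interval = 10                    # seconds between lifecheck runs

wd_heartbeat_keepalive = 2          # seconds between heartbeat signals sent

wd_heartbeat_deadtime = 60          # seconds without a heartbeat before the
                                    # remote node is regarded as down
                                    # (assumed value, raised from the
                                    # default of 30)
```

A longer deadtime also delays detection of a genuine failure, so the
virtual IP moves to the surviving node later. An outage longer than the
deadtime still leaves the node marked down until its pgpool-II is restarted.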

Regards
Muhammad Usama

>
> Thank you for your help.
>