[pgpool-general: 2287] Re: Strange watchdog - trusted server issue

Yugo Nagata nagata at sraoss.co.jp
Fri Nov 15 16:11:43 JST 2013


Hi,

On Fri, 15 Nov 2013 15:44:24 +0900
Yugo Nagata <nagata at sraoss.co.jp> wrote:

> Hi,
> 
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: get_result: ping data: PING 
> > 10.201.101.92 (10.201.101.92) 56(84) bytes of data.#012#012--- 
> > 10.201.101.92 ping statistics ---#0123 packets transmitted, 3 received, 
> > 0% packet loss, time 2006ms#012rtt min/avg/max/mdev = 
> > 0.000/0.000/0.002/0.001 ms
>

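I understand the problem now. It's very simple.
Watchdog regards a ping as successful only when the average RTT is > 0.
So when the average RTT is exactly 0, as in the log above
(rtt min/avg/max/mdev = 0.000/0.000/0.002/0.001 ms), the ping is
regarded as a failure. :-(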

Could you try the attached patch?
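
To illustrate the idea, here is a minimal sketch of the check. It is not
the attached patch and not the actual pgpool-II source: the function name
ping_succeeded() and the parsing approach are assumptions made up for this
example. It reads the "rtt min/avg/max/mdev" summary line from the ping
output and treats an average of 0.000 as a success.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not pgpool-II code): decide whether a ping run
 * succeeded by looking for the "min/avg/max" summary line in its output.
 * Returns 1 on success, 0 on failure. If avg_out is not NULL, the parsed
 * average RTT is stored there on success.
 */
static int ping_succeeded(const char *output, double *avg_out)
{
    const char *p;
    double min, avg, max;

    /* No summary line at all: no reply was received. */
    p = strstr(output, "min/avg/max");
    if (p == NULL)
        return 0;

    /* The values follow the '=' sign: " 0.000/0.000/0.002/0.001 ms" */
    p = strchr(p, '=');
    if (p == NULL)
        return 0;

    if (sscanf(p + 1, " %lf/%lf/%lf", &min, &avg, &max) != 3)
        return 0;

    if (avg_out != NULL)
        *avg_out = avg;

    /*
     * Accept avg == 0. Testing "avg > 0" here would reject a very fast
     * reply whose average rounds down to 0.000 ms.
     */
    return avg >= 0.0;
}
```

With the output from the log above, this returns 1 although the average
is 0.000. With output that has no summary line, it returns 0.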
 
> There is a strange string '#012', which should be a line break.
> Does it appear the same way in other log lines about PING, or only
> right before pgpool-II goes down?
> 
> 
> On Thu, 14 Nov 2013 14:45:10 +0100
> Sam Wouters <sam at ericom.be> wrote:
> 
> > Hi,
> > 
> > I have a running pgpool-II (3.3.1 on ubuntu-lts, package from postgresql 
> > repo) cluster, consisting of three nodes with watchdog enabled.
> > After a random period of time (a couple hours), watchdog goes into down 
> > state, with the log lines below. This happens consistently and on different 
> > clusters, with no network issues that I know of (checked tcpdumps etc.).
> > The log also says that ping to the trusted servers succeeds, but 
> > nevertheless you get the "failed to connect to any trusted servers"?
> > 
> > Any help in debugging this issue would be very much appreciated....
> > 
> > Sam
> > 
> > <LOG SNIPPET>
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10107]: wd_hb_send: send 224 byte packet
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10107]: wd_hb_sender: send heartbeat 
> > signal to 10.201.101.92:9694
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10106]: wd_hb_recv: received 224 byte 
> > packet
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10106]: wd_hb_receiver: received 
> > heartbeat signal from 10.201.101.92:5432
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: exec_ping: succeed to ping 
> > 10.201.101.91
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: get_result: ping data: PING 
> > 10.201.101.91 (10.201.101.91) 56(84) bytes of data.#012
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: exec_ping: succeed to ping 
> > 10.201.101.92
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: get_result: ping data: PING 
> > 10.201.101.92 (10.201.101.92) 56(84) bytes of data.#012#012--- 
> > 10.201.101.92 ping statistics ---#0123 packets transmitted, 3 received, 
> > 0% packet loss, time 2006ms#012rtt min/avg/max/mdev = 
> > 0.000/0.000/0.002/0.001 ms
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: wd_lifecheck: failed to 
> > connect to any trusted servers
> > Nov 14 13:34:11 pgdb92-id03 pgpool[10108]: wd_IP_down: not delegate IP 
> > holder
> > </LOG SNIPPET>
> > 
> > 
> 
> 
> -- 
> Yugo Nagata <nagata at sraoss.co.jp>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general


-- 
Yugo Nagata <nagata at sraoss.co.jp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wd_ping.c.diff
Type: text/x-diff
Size: 413 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20131115/11fca3b2/attachment.bin>
