[pgpool-general: 6729] Re: Watchdog problem - crashed standby not being detected?

Martin Goodson kaemaril at googlemail.com
Wed Oct 9 06:13:13 JST 2019


On 08/10/2019 01:17, Tatsuo Ishii wrote:
> My wild guess is that the watchdog communication socket (it uses
> TCP/IP) was blocked by the standby node crash, and this made the
> watchdog state machine freeze. Thus the watchdog did not notice that
> the heartbeat channel was down.
> 
>> Hi Usama,
>>
>> Can you please look into this?
>>
>> This sounds weird to me too because:
>>
>> 1) tcp_keepalive does not affect heartbeat since it uses UDP, not TCP.
>>
>> 2) Why does heartbeat not work in this case?
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp

Hello. We had another HA/DR test today, but unfortunately we didn't get
as far as force-crashing one of the pgpools; the other tests were
dedicated to the backend nodes instead.

However, I was able to run a tcpdump on the UDP port, and I could see
that the heartbeat traffic was definitely going through at two-second
intervals. Our sysadmin's initial thought, before settling on the
keepalive theory, was that the heartbeat traffic was somehow being
blocked by a firewall and that pgpool was somehow silently ignoring the
problem. So that idea at least has been ruled out :)
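
For what it's worth, a check like this could be scripted rather than
eyeballed. The following is only a minimal Python sketch, assuming the
packet arrival times have already been extracted as epoch seconds (for
example from tcpdump -tt output). The function name find_heartbeat_gaps
and the tolerance factor are made up for illustration and are not part
of pgpool.

```python
from typing import List, Tuple


def find_heartbeat_gaps(
    timestamps: List[float],
    expected_interval: float = 2.0,  # the two-second interval seen in the dump
    tolerance: float = 1.5,          # hypothetical slack factor, not a pgpool setting
) -> List[Tuple[float, float]]:
    """Return (previous, current) timestamp pairs whose spacing exceeds
    expected_interval * tolerance, i.e. places where heartbeats went missing."""
    limit = expected_interval * tolerance
    gaps = []
    # Compare each packet arrival time with the one before it.
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > limit:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    # Illustrative arrival times only, not real capture data.
    sample = [0.0, 2.0, 4.0, 10.0, 12.0]
    for start, end in find_heartbeat_gaps(sample):
        print(f"no heartbeat seen between {start} and {end}")
```

An empty result would mean the packets kept arriving on schedule, which
is what the capture above suggested.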

I will see if I can force-crash a server in our dev environment
tomorrow while dumping the UDP traffic, and see what happens to the
traffic with regard to keepalives, etc.

I'll ramp up the logging level as well, and see what happens.

Regards,

M.
-- 
Martin Goodson

"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."

