[pgpool-general: 6729] Re: Watchdog problem - crashed standby not being detected?
Martin Goodson
kaemaril at googlemail.com
Wed Oct 9 06:13:13 JST 2019
On 08/10/2019 01:17, Tatsuo Ishii wrote:
> My wild guess is that the watchdog communication socket (it uses TCP/IP) was
> blocked by the standby node crash, and this froze the watchdog state
> machine. Thus the watchdog did not notice that the heartbeat channel was down.
>
>> Hi Usama,
>>
>> Can you please look into this?
>>
>> This sounds weird to me too because:
>>
>> 1) tcp_keepalive does not affect the heartbeat, since it uses UDP, not TCP.
>>
>> 2) Why does the heartbeat not work in this case?
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
Hello. We had another HA/DR test today, but unfortunately we didn't get
as far as force-crashing one of the pgpools; the other tests focused on
the backend nodes instead.
However, I was able to run tcpdump on the UDP heartbeat port, and I could
see that the traffic was definitely going through at two-second intervals.
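For anyone repeating this check, here is a minimal sketch (nothing that pgpool ships) for spotting missed heartbeats in a capture. It assumes the capture was taken with `tcpdump -tt`, so that each line starts with an epoch timestamp, and it takes the two-second interval from the observation above. The one-second tolerance and the function names are arbitrary assumptions for illustration.

```python
# Sketch only: look for gaps in heartbeat arrivals.
# Assumes each input line starts with an epoch timestamp, as printed
# by `tcpdump -tt`. Function names and defaults are hypothetical.

def parse_tcpdump_tt(lines):
    """Extract the leading epoch timestamp from each capture line."""
    stamps = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            stamps.append(float(fields[0]))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return stamps


def find_gaps(timestamps, expected_interval=2.0, tolerance=1.0):
    """Return (previous, current, gap) for every pair of consecutive
    packets spaced more than expected_interval + tolerance seconds apart."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > expected_interval + tolerance:
            gaps.append((prev, curr, delta))
    return gaps
```

Feeding the saved capture lines through `parse_tcpdump_tt` and then `find_gaps` returns an empty list when the heartbeat keeps its cadence, and one tuple per interruption otherwise.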
Our sysadmin's initial thought, before settling on the keepalive
theory, was that the heartbeat traffic was somehow being blocked by a
firewall and that pgpool was somehow silently ignoring the problem. So
that idea at least has been ruled out :)
I will see if I can force-crash a server in our dev environment
tomorrow while dumping the UDP traffic, and see what happens to the
traffic with regard to keepalives, etc.
I'll ramp up the logging level as well, and see what happens.
Regards,
M.
--
Martin Goodson
"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."