[pgpool-hackers: 4264] Re: Watchdog heartbeat issue

Muhammad Usama muhammad.usama at percona.com
Wed Jan 18 18:45:29 JST 2023


On Tue, Jan 17, 2023 at 6:49 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> Thank you for investigating the issue.
>
> > Hi Ishii San
> >
> > Thanks for figuring out the issue.
> > I think removing the code in question altogether could mark the remote
> > node as dead too early at startup and could delay the watchdog cluster
> > stabilization when there is a delay of a few seconds between node
> > startups.
> > So IMHO the way to solve this is to wait for twice the wd_interval or
> > wd_heartbeat_deadtime (depending on the configuration) if
> > is_wd_lifecheck_ready() reports a failure.
> >
> > What do you think of the attached patch?
>
> Probably I am missing something but I wonder why the watchdog leader
> node's lifecheck does not notice that the node 1 watchdog will never send
> a heartbeat signal. In the pgpool0 log:
>
> 2023-01-14 00:27:15: watchdog pid 26708: LOG:  read from socket failed, remote end closed the connection
> 2023-01-14 00:27:15: watchdog pid 26708: LOG:  client socket of localhost:50004 Linux abf1b59af489 is closed
> 2023-01-14 00:27:15: watchdog pid 26708: LOG:  remote node "localhost:50004 Linux abf1b59af489" is shutting down
> 2023-01-14 00:27:15: watchdog pid 26708: LOG:  removing watchdog node "localhost:50004 Linux abf1b59af489" from the standby list
>
> It seems the leader watchdog already noticed that node 1 was down.
>

When the watchdog process fails to communicate with a remote node despite
retries, it marks that node's status as lost/down. The lifecheck process,
on the other hand, reports a node-down status to the watchdog process only
when the heartbeat breaks after at least one successful heartbeat cycle
has completed.
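
To illustrate the lifecheck rule described above, here is a minimal sketch
in Python. It is not the pgpool-II code: the names NodeHeartbeatState,
record_heartbeat and should_report_node_down are made up for illustration,
and the deadtime argument only stands in for a setting such as
wd_heartbeat_deadtime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeHeartbeatState:
    """Hypothetical per-node lifecheck state (not a pgpool-II structure)."""
    # Stays None until the first heartbeat from the remote node arrives.
    last_heartbeat: Optional[float] = None


def record_heartbeat(state: NodeHeartbeatState, now: float) -> None:
    """Remember when the most recent heartbeat was received."""
    state.last_heartbeat = now


def should_report_node_down(state: NodeHeartbeatState,
                            now: float,
                            deadtime: float) -> bool:
    """Return True only if a heartbeat was seen earlier and has since stopped.

    A node that has never completed a heartbeat cycle is not reported
    here; in this sketch that case is left to the watchdog process.
    """
    if state.last_heartbeat is None:
        return False
    return (now - state.last_heartbeat) > deadtime
```

With this rule, a node that never sends a heartbeat is never reported as
down by lifecheck, no matter how much time passes.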

Best regards
Muhammad Usama


> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>