[pgpool-general: 8116] Re: Possible race condition during startup causing node to enter network isolation

Emond Papegaaij emond.papegaaij at gmail.com
Mon May 2 22:26:19 JST 2022


Hi,

That's great to hear. I've applied the patch to 4.3.1 in our CI
environment. I'll monitor the builds to see if the problem is fixed now. As
the problem only popped up once or twice every week, it will take some time
to verify.

Best regards,
Emond

On Mon, May 2, 2022 at 1:56 AM Bo Peng <pengbo at sraoss.co.jp> wrote:

> Hello,
>
> > Any thoughts on this issue? We are still experiencing intermittent test
> > failures due to this issue.
>
> Another developer, who introduced the watchdog mechanism to Pgpool-II,
> fixed this issue in the commit below:
>
>
> https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=3337aa8c07cd07cdbc238a5a154a8c4d8dbe0472
>
> It will be released in the next minor release scheduled on May 19th.
>
> > On Fri, Apr 1, 2022 at 9:03 AM Emond Papegaaij <emond.papegaaij at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Unfortunately, this issue still pops up every once in a while. We are now
> > > running 4.3.1. In our latest failure, the issue occurred in a simple restart
> > > of all services on node 1, with node 3 being the leader. Pgpool on node 1
> > > tries to rejoin the cluster, but gets rejected over and over again. Node 3
> > > reports that 'only life-check process can mark this node alive again'. I've
> > > attached the full logs of both node 1 and 3. The configuration hasn't
> > > changed since last time.
> > >
> > > Best regards,
> > > Emond
> > >
> > > On Mon, Nov 29, 2021 at 4:12 PM Emond Papegaaij <emond.papegaaij at gmail.com> wrote:
> > >
> > >> On Mon, Nov 29, 2021 at 3:55 PM Bo Peng <pengbo at sraoss.co.jp> wrote:
> > >>
> > >>> Thank you for your test.
> > >>>
> > >>> Because we have made some bug fixes to the watchdog since 4.2.4, it
> > >>> might be an upgrade issue.
> > >>> If you can reproduce this issue in 4.2.6, could you share the pgpool
> > >>> logs of all nodes?
> > >>>
> > >>
> > >> I'll continue to monitor the tests. If one fails again, I'll share the
> > >> logs. As I said, this could take some time, because the failure only occurs
> > >> about once a week. Thanks for your help so far.
> > >>
> > >> Best regards,
> > >> Emond
> > >>
> > >
>
>
> --
> Bo Peng <pengbo at sraoss.co.jp>
> SRA OSS, Inc. Japan
> http://www.sraoss.co.jp/
>