[pgpool-general: 8126] Re: Possible race condition during startup causing node to enter network isolation

Bo Peng pengbo at sraoss.co.jp
Fri May 6 09:03:20 JST 2022


Hello,

> That's great to hear. I've applied the patch to 4.3.1 in our CI
> environment. I'll monitor the builds to see if the problem is fixed now. As
> the problem only popped up once or twice every week, it will take some time
> to verify.

Thank you.

> > Hello,
> >
> > > Any thoughts on this issue? We are still experiencing intermittent test
> > > failures due to this issue.
> >
> > Another developer, who introduced the watchdog mechanism to Pgpool-II, fixed
> > this issue in the commit below:
> >
> >
> > https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=3337aa8c07cd07cdbc238a5a154a8c4d8dbe0472
> >
> > It will be released in the next minor release scheduled on May 19th.
> >
> > > On Fri, Apr 1, 2022 at 9:03 AM Emond Papegaaij <emond.papegaaij at gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > > Unfortunately, this issue still pops up every once in a while. We are now
> > > > > running 4.3.1. In our latest failure, the issue occurred in a simple restart
> > > > > of all services on node 1, with node 3 being the leader. Pgpool on node 1
> > > > > tries to rejoin the cluster, but gets rejected over and over again. Node 3
> > > > > reports that 'only life-check process can mark this node alive again'. I've
> > > > > attached the full logs of both node 1 and 3. The configuration hasn't
> > > > > changed since last time.
> > > >
> > > > Best regards,
> > > > Emond
> > > >
> > > > > On Mon, Nov 29, 2021 at 4:12 PM Emond Papegaaij <emond.papegaaij at gmail.com> wrote:
> > > >
> > > >> On Mon, Nov 29, 2021 at 3:55 PM Bo Peng <pengbo at sraoss.co.jp> wrote:
> > > >>
> > > >>> Thank you for your test.
> > > >>>
> > > >>> Because we have made several watchdog bug fixes since 4.2.4, it might
> > > >>> be an upgrade issue.
> > > >>> If you can reproduce this issue in 4.2.6, could you share the pgpool
> > > >>> logs of all nodes?
> > > >>>
> > > >>
> > > >> I'll continue to monitor the tests. If one fails again, I'll share the
> > > >> logs. As I said, this could take some time, because the failure only occurs
> > > >> about once a week. Thanks for your help so far.
> > > >>
> > > >> Best regards,
> > > >> Emond
> > > >>
> > > >
> >
> >
> > --
> > Bo Peng <pengbo at sraoss.co.jp>
> > SRA OSS, Inc. Japan
> > http://www.sraoss.co.jp/
> >


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/

