[pgpool-hackers: 4027] Re: invalid degenerate backend request on slave failure

Fri Sep 24 17:50:31 JST 2021

Hey Tatsuo

As I am running a single instance of pgpool, I don’t use watchdog.

No, I don’t use the "-D" option to discard the status file.

Also, please note that I have encountered this issue only twice across the several tests I have run with pgpool.

So, it MIGHT be the case that the issue is not actually with pgpool but with Nomad.

Anyway, I will still be glad to answer any questions you have while investigating this.

Cheers,

Anirudh
On 24 Sep 2021, 10:35 AM +0200, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
> Hi Anirudh,
>
> I have a few questions to help find the cause of the issue.
>
> Do you use watchdog?
>
> Do you use "-D" option (discard the status file) to start pgpool?
>
> > Hello Tatsuo
> >
> > Thanks for the explanation.
> >
> > I am using Pgpool-II 4.2.2.
> >
> > I am still investigating the issue and trying to reproduce it by changing different things.
> >
> > It’s strange because it only occurs sporadically.
> >
> > I will keep you updated if I find more info.
> >
> > Thank you,
> >
> > Anirudh
> > On 23 Sep 2021, 2:36 PM +0200, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
> > > Hi,
> > >
> > > > Hello
> > > >
> > > > I have a setup with 3 postgres nodes running behind one pgpool docker container.
> > > >
> > > > The setup works fine when the primary node fails or is shut down. Failover goes fine in that case.
> > > >
> > > > However, if the standby goes down, the health check fails, but instead of performing a failover, pgpool logs this message:
> > > >
> > > > 2021-09-23 08:37:54: pid 94: LOG: invalid degenerate backend request, node id : 2 status: [2] is not valid for failover
> > > >
> > > > What's a bit strange to me is that this only happens when I am running the pgpool container through Nomad.
> > > >
> > > > If I run it directly, without Nomad, it works as expected. Even though this is the only difference I can see between the working and non-working setups, I don't believe it is the root cause.
> > > >
> > > > It might help if you could point me to the conditions under which a degenerate request is considered invalid.
> > >
> > > The message says that PostgreSQL backend node id 2 is up and
> > > running. The strange thing is that this is the normal prerequisite
> > > for triggering failover: if the node were already down, there would
> > > be no point in triggering failover. Before raising this error,
> > > Pgpool-II checks a copy of the backend status held in the process's
> > > private memory, and it seems that this private copy differs from the
> > > status in shared memory, which is the one reported in the error
> > > message. This is hard to understand, because the private copy and the
> > > shared memory status should be the same: the copy is made when the
> > > health check process starts.
> > >
> > > The only explanations I can think of are:
> > >
> > > - for some reason Nomad corrupted pgpool's private memory
> > > (more precisely, that of the health check process).
> > >
> > > - Pgpool-II has a new bug that was revealed by accident.
> > >
> > > BTW, what version of Pgpool-II are you using? I need to check the
> > > source code further.
> > > --
> > > Tatsuo Ishii
> > > SRA OSS, Inc. Japan
> > > English: http://www.sraoss.co.jp/index_en.php
> > > Japanese: http://www.sraoss.co.jp
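The mismatch Tatsuo describes (the decision uses the process's private copy of the backend status, while the log line reports the shared-memory status) can be modeled roughly as below. This is a minimal Python sketch, not Pgpool-II's actual C code: the status names and numbering (status 2 = CON_UP) are assumed to mirror Pgpool-II's backend status enum, and `is_valid_failover_request` is a hypothetical helper introduced only for illustration.

```python
from enum import IntEnum


class BackendStatus(IntEnum):
    # Assumed to follow Pgpool-II's backend status ordering,
    # so "status: [2]" in the log corresponds to CON_UP.
    CON_UNUSED = 0
    CON_CONNECT_WAIT = 1
    CON_UP = 2
    CON_DOWN = 3


def is_valid_failover_request(private_status, shared_status, node_id):
    """Hypothetical model of the validity check described in the thread.

    The decision uses the process's private copy of the backend status,
    while the log message reports the status held in shared memory.
    Returns (accepted, log_message).
    """
    if private_status[node_id] in (BackendStatus.CON_UP,
                                   BackendStatus.CON_CONNECT_WAIT):
        return True, None
    # Rejected: report the shared-memory status, as in the observed log line.
    message = (
        f"invalid degenerate backend request, node id : {node_id} "
        f"status: [{int(shared_status[node_id])}] is not valid for failover"
    )
    return False, message


# The copies agree, so the request for node 2 is accepted.
shared = [BackendStatus.CON_UP] * 3
print(is_valid_failover_request(shared, shared, 2))

# The private copy has drifted and marks node 2 as down, so the request
# is rejected even though shared memory still says CON_UP.
private = [BackendStatus.CON_UP, BackendStatus.CON_UP, BackendStatus.CON_DOWN]
print(is_valid_failover_request(private, shared, 2))
```

Under these assumptions, the second call reproduces the shape of the reported log line, which is why a divergence between the two copies would explain a rejected request for a node that shared memory still considers up.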