[pgpool-hackers: 4026] Re: invalid degenerate backend request on slave failure

Fri Sep 24 17:34:58 JST 2021

Hi Anirudh,

I have a few questions to help find the cause of the issue.

Do you use the watchdog?

Do you start pgpool with the "-D" option (discard the status file)?

> Hello Tatsuo
> 
> Thanks for the explanation.
> 
> I am using Pgpool-II 4.2.2.
> 
> I am still investigating the issue and trying to reproduce it by changing different things.
> 
> It’s strange because it only occurs sporadically.
> 
> I will keep you updated if I find more info.
> 
> Thank you,
> 
> Anirudh
> On 23 Sep 2021, 2:36 PM +0200, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
>> Hi,
>>
>> > Hello
>> >
>> > I have a setup with 3 postgres nodes running behind one pgpool docker container.
>> >
>> > The setup works fine when the primary node fails or is shut down. Failover goes fine in that case.
>> >
>> > However, if the standby goes down, the health check fails, but instead of performing a failover, pgpool logs this message:
>> >
>> > 2021-09-23 08:37:54: pid 94: LOG: invalid degenerate backend request, node id : 2 status: [2] is not valid for failover
>> >
>> > What's a bit strange to me is that this only happens when I am running the pgpool container through Nomad.
>> >
>> > If I run it directly without Nomad, it still works as expected. Even though this is the only difference that I see between the working and non-working setup, I believe this isn't the root cause.
>> >
>> > If you can point me towards when a degenerate request is considered invalid, it might help.
>>
>> The message says that PostgreSQL backend node id 2 is up and
>> running. The strange thing is that this is the normal prerequisite
>> for triggering failover: if the node is already down, there is no
>> point in triggering failover. Actually, before this (raising an
>> error) happens, Pgpool-II checks a copy of the backend status in the
>> process's private memory, and it seems the status in the private
>> memory is different from the status (which is in the shared memory)
>> referred to in the error message. This is hard to understand because
>> the status in the private memory and the one in the shared memory
>> should be the same, since it was copied when the health check
>> process started.
>>
>> The only explanations I can think of are:
>>
>> - For some reason Nomad corrupted the private memory of pgpool
>> (actually, the process in question is the health check process).
>>
>> - Pgpool-II has a new bug that was revealed accidentally.
>>
>> BTW, what version of Pgpool-II are you using? I need to check the
>> source code further.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
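
For readers following the thread, the mismatch described in the quoted explanation above can be modelled with a small sketch. This is not Pgpool-II source code. It is a simplified Python model under stated assumptions: the status names and the values 0, 1 and 3 are assumed (the thread only confirms that status 2 means the node is up), the rule that a "connect wait" node also counts as valid is assumed, and `request_degenerate` is a hypothetical helper. It only shows how a request can be rejected on the basis of a process-private status copy while the message prints the shared-memory value.

```python
from enum import IntEnum


class BackendStatus(IntEnum):
    # Assumed status codes; the thread only confirms that 2 means "up".
    CON_UNUSED = 0
    CON_CONNECT_WAIT = 1
    CON_UP = 2
    CON_DOWN = 3


def valid_for_failover(status):
    # Assumption: a node can be degenerated only if this process
    # currently believes it is alive (up, or waiting for a connection).
    return status in (BackendStatus.CON_CONNECT_WAIT, BackendStatus.CON_UP)


def request_degenerate(node_id, private_status, shared_status):
    """Hypothetical helper: return None if accepted, else the log text."""
    if valid_for_failover(private_status[node_id]):
        return None
    # Validity was judged on the private copy, but the message
    # reports the value held in shared memory.
    return ("invalid degenerate backend request, node id : %d status: [%d] "
            "is not valid for failover"
            % (node_id, int(shared_status[node_id])))


# Shared memory says all three nodes are up.
shared = [BackendStatus.CON_UP] * 3
# Private copy taken when the health check process starts.
private = list(shared)

# Copies agree: the request for node 2 is accepted.
assert request_degenerate(2, private, shared) is None

# The private copy diverges for some reason: the request is rejected,
# yet the message still shows status [2] from shared memory.
private[2] = BackendStatus.CON_DOWN
print(request_degenerate(2, private, shared))
```

In this model, the reported line can only appear when the two copies disagree, which is why the explanation above focuses on how the private copy could have changed.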