[pgpool-general: 7989] Re: pcp_node_info does not return when host is lost on 4.3.0

Tue Jan 25 01:12:14 JST 2022

Hi,

I've managed to do some bisecting today, and I can say with fairly high
confidence that the issue was introduced by these 2 commits:

1ae1f159b89f4d18a8f7b737929e9a6448ad63ab
    Add new fields to show pool_nodes command and friends.
6de0d264be66ce145d3ed726235920401cf74ebe
    Fix pcp_node_info failure when backend is down.

With only the first commit applied, pgpool fails to start at that point in
the test. I suspect the second commit fixes that, but from that commit
onward every build gets stuck on the pcp_node_info call. In between there's
another commit that updates some docs
(8e8ecaced44ec9e6322023729c427bcfa732deda), but that should not be relevant
to this issue. I hope this info helps you pinpoint the exact problem.
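
For anyone repeating this bisect: a call that never returns can be turned
into a pass/fail signal by running it under a timeout. The following is only
a minimal sketch, not the test harness used in this thread. The function
name, the 60 second limit and every value in EXAMPLE_CMD (host, port, user,
node id) are assumptions made up for illustration; the exit codes are the
ones "git bisect run" understands (0 = good, 1 = bad, 125 = skip).

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Hypothetical invocation: host, port, user and node id are placeholders.
EXAMPLE_CMD = ["pcp_node_info", "-h", "localhost", "-p", "9898",
               "-U", "pgpool", "-w", "-n", "1"]


def classify(cmd, timeout_s):
    """Run cmd and map the outcome to a `git bisect run` exit code."""
    try:
        subprocess.run(cmd, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return BAD   # the call never returned: the hang being bisected
    except OSError:
        return SKIP  # command missing or not executable at this revision
    return GOOD      # the call returned; its own exit status is ignored
```

A wrapper script could end with sys.exit(classify(EXAMPLE_CMD, 60)) and be
passed to "git bisect run". The command's own exit status is deliberately
ignored, because the symptom here is the hang, not an error return.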

Best regards,
Emond

On Mon, Jan 24, 2022 at 8:23 AM Emond Papegaaij <emond.papegaaij at gmail.com>
wrote:

> Hi,
>
> Unfortunately, the patch doesn't help. The call to pcp_node_info still
> hangs. I do however see a difference in the pgpool log. The pcp worker only
> logs a single line:
>
> 2022-01-24 05:26:37: pid 81: LOG:  forked new pcp worker, pid=211 socket=7
> 2022-01-24 05:26:47: pid 211: LOG:  failed to connect to PostgreSQL server on "172.29.30.2:5432", timed out
>
> After this, there's no further mention of pid 211: no log messages from
> that pid, and none from pid 81 either (which I would expect to log the
> exit of the PCP worker).
>
> Best regards,
> Emond
>
> On Sat, Jan 22, 2022 at 2:15 PM Bo Peng <pengbo at sraoss.co.jp> wrote:
>
>> Hello,
>>
>> Thank you for your reply.
>>
>> I think this issue is specific to 4.3.0.
>> Another developer, Tatsuo Ishii, has created a patch for this issue.
>> Could you apply the attached patch and check whether it helps?
>>
>> Best regards,
>>
>> On Fri, 21 Jan 2022 14:10:05 +0100
>> Emond Papegaaij <emond.papegaaij at gmail.com> wrote:
>>
>> > >
>> > >> > We are working on the upgrade from 4.2.6 to 4.3.0 and we are facing a
>> > >> > test that is failing consistently. In one of our tests we power down 2
>> > >> > of the 3 hosts with a hard poweroff. Prior to the poweroff, we configure
>> > >> > the cluster
>> > >>
>> > >> Thank you for reporting this issue.
>> > >> I am going to look into it.
>> > >> Does this issue only occur in 4.3.0?
>> > >
>> > >
>> > > Thanks for looking into this. As is often the case with these kinds of
>> > > errors, I cannot be absolutely sure, but I haven't seen this error before
>> > > with 4.2.6 or earlier. We skipped 4.2.7, as the release notes state it
>> > > was only for PG14 support, which we don't need at the moment.
>> > >
>> > To report back on this: we've run 11 consecutive builds with 4.3.0, all
>> > failing on this issue. I've checked the past 40 or so builds with 4.2.6
>> > and none of them failed. So this is definitely a regression in 4.3.0. Do
>> > you already have an idea of the cause? If not, I can try to perform a
>> > bisect on the diff between 4.2.6 and 4.3.0. This will however take me some
>> > time, as every build takes about 2 hours. Git expects about 8 revisions to
>> > check, so that's 2 whole working days.
>> >
>> > Best regards,
>> > Emond
>>
>>
>> --
>> Bo Peng <pengbo at sraoss.co.jp>
>> SRA OSS, Inc. Japan
>> http://www.sraoss.co.jp/
>>
>