[pgpool-hackers: 3553] Build farm failure starting on 2020/3/15 in master branch

Tatsuo Ishii ishii at sraoss.co.jp
Mon Mar 16 10:42:22 JST 2020


We have been seeing massive build farm failures in the master branch,
starting on 2020/3/15.

From: buildfarm at pgpool.net
Subject: [pgpool-buildfarm: 554] pgpool-II buildfarm results CentOS8
Date: Sun, 15 Mar 2020 11:50:21 +0900
Message-ID: <5e6d97ed.+sRtHRaZxHljR39S%buildfarm at pgpool.net>

> =========================================================================
> * master  PostgreSQL 11  CentOS8
> testing 001.load_balance...failed.
> testing 003.failover...failed.
[snip]

I have looked into this and confirmed that the failures started with
this commit:

----------------------------------------------------------------
Subject: [pgpool-committers: 6625] pgpool: Add support for SSL CRL (Certificate Revocation List).
From: Tatsuo Ishii <ishii at sraoss.co.jp>
To: pgpool-committers at pgpool.net
Date: Sat, 14 Mar 2020 03:21:16 +0000
Sender: pgpool-committers-bounces at pgpool.net
X-Mew: tab/spc characters on Subject: are simplified.

Add support for SSL CRL (Certificate Revocation List).
----------------------------------------------------------------

However, I have also found that the actual cause of the failure is not
this commit itself. In fact, any attempt to add a new configuration
parameter could cause the failure. The commit just exposed a hidden bug.

I am going to explain why the build farm failure happened.

1) The config processing code (src/config/pool_config_variables.c)
sorts the config parameters by the string length of their names (see
sort_config_vars()).

2) A new parameter, "ssl_crl_file", was added by the commit.

3) Since the config parameter "backend_flag*" has the same name
length (12) as the new parameter "ssl_crl_file", the commit changed
the order in which ALLOW_TO_FAILOVER and ALWAYS_MASTER are processed.
Before: ALWAYS_MASTER, ALLOW_TO_FAILOVER. Now: ALLOW_TO_FAILOVER,
ALWAYS_MASTER.

4) The built-in default values for backend_flag are ALLOW_TO_FAILOVER
and ALWAYS_MASTER.

5) Before the commit, backend_flag was first set to ALWAYS_MASTER,
which was then replaced by ALLOW_TO_FAILOVER. So the resulting flag was
ALLOW_TO_FAILOVER.

6) After the commit, backend_flag is first set to ALLOW_TO_FAILOVER,
which is then replaced by ALWAYS_MASTER. So the resulting flag is now
ALWAYS_MASTER.

7) Since both backend 0 and backend 1 now have the ALWAYS_MASTER flag,
pgpool gets confused and mistakenly treats backend 1 as the primary. So
DDL/DML are sent to backend 1 and fail. This makes almost all
regression tests fail (the only surviving regression tests use just 1
backend and are not affected by the problem).
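
The sequence above can be modelled with a small sketch. This is not
pgpool code: it is a Python model of the behaviour described in steps
3) to 7), the flag bit values are hypothetical, and the helper names
assign_last_wins() and backends_flagged_always_master() are made up
for illustration.

```python
# Hypothetical bit values, chosen only for illustration.
ALLOW_TO_FAILOVER = 0x0001
ALWAYS_MASTER = 0x0004

# Both names have the same length, so a sort keyed only on name length
# cannot tell them apart (step 3).
name_lengths = {name: len(name) for name in ("backend_flag", "ssl_crl_file")}


def assign_last_wins(defaults):
    """Model of steps 5 and 6: each default overwrites the previous one."""
    flag = 0
    for value in defaults:
        flag = value  # the earlier value is lost
    return flag


def backends_flagged_always_master(flags):
    """Return the indexes of backends whose flag has ALWAYS_MASTER set."""
    return [i for i, flag in enumerate(flags) if flag & ALWAYS_MASTER]


# Processing order before and after the commit, as described in step 3.
flag_before = assign_last_wins([ALWAYS_MASTER, ALLOW_TO_FAILOVER])
flag_after = assign_last_wins([ALLOW_TO_FAILOVER, ALWAYS_MASTER])

# Two backends that both end up with the built-in default (step 7).
masters_before = backends_flagged_always_master([flag_before, flag_before])
masters_after = backends_flagged_always_master([flag_after, flag_after])
```

In this model no backend carries ALWAYS_MASTER with the old processing
order, while both backends carry it with the new order, which matches
the situation described in step 7).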

So what should we do?

I think the current implementation of pgpool's config processing of
backend flags (src/config/pool_config_variables.c) has multiple issues.

1) The default value for the backend flag is ALWAYS_MASTER. This should
be "" (empty string).

2) The final value of the backend flag is whichever default is
processed last. This is plain wrong (see BackendFlagsAssignFunc()). The
resulting value should be the bitwise OR of the default values, since
backend_flag is bit data.
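
To illustrate why OR'ing removes the dependency on processing order,
here is a minimal sketch. Again this is a Python model and not the
actual fix: the bit values and the helper name assign_or() are
hypothetical, and it only shows the order-independence property.

```python
from functools import reduce
from operator import or_

# Hypothetical bit values, chosen only for illustration.
ALLOW_TO_FAILOVER = 0x0001
ALWAYS_MASTER = 0x0004


def assign_or(defaults):
    """Combine all values with bitwise OR, starting from an empty flag (0)."""
    return reduce(or_, defaults, 0)


# The result is the same whichever order the values are processed in.
flag_old_order = assign_or([ALWAYS_MASTER, ALLOW_TO_FAILOVER])
flag_new_order = assign_or([ALLOW_TO_FAILOVER, ALWAYS_MASTER])

# With no values at all, the flag stays empty, which corresponds to the
# "" (empty string) default proposed in 1).
flag_empty = assign_or([])
```

Because bitwise OR is commutative and associative, the order produced
by the sort no longer affects the resulting flag.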

I am going to fix the issue as soon as possible.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

