[pgpool-general: 7531] Re: Strange behavior on switchover with detach_false_primary enabled

Tatsuo Ishii ishii at sraoss.co.jp
Thu Apr 29 18:46:55 JST 2021

Hi Emond,

> Thanks for the patch. Unfortunately, I'm not in a position to test
> your patch at the moment. Hopefully I can give it a try sometime
> next week. I did take a look at the code and do have a few questions.
> First of all, you acquire an exclusive lock just before you start the
> follow primary loop. However, at that point this process has already
> been forked off from the main process. I think this means there is a small window
> in which the main process has already decided it will need to start a
> follow process, but the lock isn't held yet (there is no guarantee
> that a forked process starts to execute before the main process
> continues). During this time window, the main process could already
> enter detach_false_primary. I'm not exactly sure what state the
> cluster is in at that moment, but I believe it could already be in the
> state we've seen before. I think the main process should either wait
> for the lock to be acquired by the forked process, or it should take
> the lock itself and pass it on to the forked process (not sure if that
> is possible).

But by the time the main process forks off the child process that
kicks off the follow primary command, find_primary_node has already
finished. So there's no window here, is there?

> Another thing is that I'm not sure whether the change to
> verify_backend_node_status might cause other unwanted effects.
> This requires a bit more explanation on why we enabled
> detach_false_primary in the first place. We were seeing some failures
> in our tests in the following scenario:
> * We start with a healthy, 3 node cluster, node 0 is both primary
> database and pgpool leader
> * We terminate node 2 and completely wipe its database directory
> * Node 2 is restarted and will initialize a new, empty database
> * Node 2 rejoins the cluster
> * Pgpool now detects 2 primaries (nodes 0 and 2) and 1 standby
> (node 1), but without detach_false_primary, it doesn't act on its own
> * find_primary_node gets the following status information: primary,
> standby, primary, and returns the highest-numbered primary, which is
> node 2 in this case. This conflicts with the actual state of the system, as
> node 0 was the real primary.
> In our case, this was causing issues in our application. I'm not
> familiar enough with the internals of pgpool itself, but I can imagine
> this might cause issues in some situations. IMHO pgpool should not
> 'change its mind' on which node is the primary in the middle of a
> failover, as that is probably going to cause a lot of issues. It could
> well be that the outcome of find_primary_node does not really matter
> at this point, but that's hard for me to judge.
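To make the tie-break concrete, here is a minimal shell sketch of the behavior described in the report, assuming find_primary_node simply keeps the last (highest-numbered) node that reports itself as primary. The function name highest_primary and its one-status-per-argument input are hypothetical; this is not pgpool-II source code.

```sh
# Hypothetical sketch, NOT pgpool-II code: mimics the reported tie-break,
# where the highest-numbered node reporting "primary" wins.
# Arguments: one status per backend, in node-id order ("primary"/"standby").
highest_primary() {
  local id found status
  id=0; found=-1
  for status in "$@"; do
    if [ "$status" = primary ]; then
      found=$id   # a later (higher-numbered) primary overwrites an earlier one
    fi
    id=$((id + 1))
  done
  echo "$found"   # -1: no node reported itself as primary
}

# Reported scenario: node 0 primary, node 1 standby, wiped node 2 primary
highest_primary primary standby primary   # prints 2, although node 0 is the real primary
```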

I see your point. In fact, I was able to reproduce what you described.

# 1. create 3 node streaming replication cluster
# primary, standby, standby
pgpool_setup -n 3

# 2. detach node 2 and promote it to primary
pcp_detach_node 2
pg_ctl -D data2 promote

# 3. attach node 2
pcp_attach_node 2

At this point, the cluster is: standby, standby, primary, as you said.

In this case find_primary_node really ought to choose node 0 as the
real primary, because node 1 is a standby connected to node 0, while node 2
does not have any connected standby. I will fix this.
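A minimal sketch of that selection idea, again in shell and again not pgpool-II code: a primary counts as the real one only if some standby streams from it. The function primary_with_standby and the "standby:N" notation (a standby following node N) are hypothetical; in a real cluster the upstream information would have to come from the standbys themselves, for example from PostgreSQL's pg_stat_wal_receiver view.

```sh
# Hypothetical sketch, NOT pgpool-II code.
# Each argument describes one backend, in node-id order:
#   "primary"    - the node reports itself as primary
#   "standby:N"  - the node is a standby streaming from node N
# Prints the id of the only primary that has a connected standby,
# or -1 if there is none or more than one (ambiguous).
primary_with_standby() {
  local id node primaries followed count found
  id=0; primaries=""; followed=" "; count=0; found=-1
  for node in "$@"; do
    case $node in
      primary) primaries="$primaries $id" ;;
      standby:*) followed="$followed${node#standby:} " ;;  # record upstream id
    esac
    id=$((id + 1))
  done
  for id in $primaries; do
    case $followed in
      *" $id "*) found=$id; count=$((count + 1)) ;;  # some standby follows it
    esac
  done
  if [ "$count" -eq 1 ]; then echo "$found"; else echo -1; fi
}

# Reproduced state: node 0 primary, node 1 follows node 0, node 2 promoted
primary_with_standby primary standby:0 primary   # prints 0
```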

In the meantime, I am not really sure that Pgpool-II should not 'change its
mind' on which node is the primary in the middle of a
failover, because the former primary is not necessarily the
correct primary. For example, if we start from the state after #3 above and
detach/attach node 0, we probably want node 0 to be the new primary.

Probably the new rule would be:

Pgpool-II should not 'change its mind' on which node is the primary in
the middle of a failover if there's no reliable way to judge which is
the correct primary (for example, there are multiple primaries but no

What do you think?
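The proposed rule can be sketched the same way. The hypothetical choose_primary function below takes the primary from before the failover plus the per-node status, accepts a single primary as-is, resolves multiple primaries with the connected-standby check, and otherwise keeps the previous primary instead of 'changing its mind'. Treating "multiple primaries, none uniquely followed by a standby" as the ambiguous case is an assumption; this is not pgpool-II code.

```sh
# Hypothetical sketch, NOT pgpool-II code.
# Usage: choose_primary PREV_PRIMARY NODE...
#   PREV_PRIMARY  id of the primary before the failover
#   NODE          "primary", or "standby:N" for a standby streaming from node N
choose_primary() {
  local prev id node primaries followed candidates
  prev=$1; shift
  id=0; primaries=""; followed=" "
  for node in "$@"; do
    case $node in
      primary) primaries="$primaries $id" ;;
      standby:*) followed="$followed${node#standby:} " ;;
    esac
    id=$((id + 1))
  done
  set -- $primaries                                 # count the primaries
  if [ "$#" -eq 0 ]; then echo -1; return; fi       # nothing to choose from
  if [ "$#" -eq 1 ]; then echo "$1"; return; fi     # single primary: no conflict
  # Several primaries: keep only those that some standby streams from.
  candidates=""
  for id in "$@"; do
    case $followed in *" $id "*) candidates="$candidates $id" ;; esac
  done
  set -- $candidates
  if [ "$#" -eq 1 ]; then echo "$1"; return; fi     # reliable evidence found
  echo "$prev"                                      # ambiguous: keep previous primary
}

choose_primary 2 primary standby:0 primary   # prints 0: node 1 follows node 0
choose_primary 0 primary primary             # prints 0: ambiguous, keep previous
```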

Best regards,
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
