[pgpool-general: 7531] Re: Strange behavior on switchover with detach_false_primary enabled

Tatsuo Ishii ishii at sraoss.co.jp
Thu Apr 29 18:46:55 JST 2021


Hi Emond,

> Thanks for the patch. Unfortunately, I'm not in the position to test
> your patch at this moment. Hopefully I can give it a try sometime
> next week. I did take a look at the code and do have a few questions.
> 
> First of all, you acquire an exclusive lock just before you start the
> follow primary loop. However, this process has at that point already
> been forked off from the main process. I think this means there is a small window
> in which the main process has already decided it will need to start a
> follow process, but the lock isn't held yet (there is no guarantee
> that a forked process starts to execute before the main process
> continues). During this time window, the main process could already
> enter the detach_false_primary check. I'm not exactly sure what the state of
> the cluster is at the moment, but I believe it could already be in the
> state we've seen before. I think the main process should either wait
> for the lock to be acquired by the forked process, or it should take
> the lock itself and pass it on to the forked process (not sure if that
> is possible).

But by the time the main process forks off the child process that will
kick off the follow primary command, find_primary_node has already
finished. So there's no window here, no?
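
To illustrate the ordering Emond suggests, here is a minimal sketch
(hypothetical names; a process-shared POSIX semaphore stands in for
Pgpool-II's own interlocking; build with -pthread). Acquiring the lock
in the parent before fork() closes the window, because the child is
created with the lock already held:

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical semaphore name, just for this sketch */
    sem_t *lock = sem_open("/follow_primary_lock", O_CREAT, 0600, 1);
    if (lock == SEM_FAILED)
        return 1;

    sem_wait(lock);             /* parent takes the lock BEFORE forking */

    pid_t pid = fork();
    if (pid == 0)
    {
        /* child: run the follow primary work; the lock is already
         * held, so nothing gated on it can sneak in first */
        printf("child: follow primary work\n");
        sem_post(lock);         /* release when the work is done */
        _exit(0);
    }

    /* parent: anything gated on the same lock (detach_false_primary
     * style checks) now blocks until the child has finished */
    sem_wait(lock);
    sem_post(lock);
    waitpid(pid, NULL, 0);
    sem_unlink("/follow_primary_lock");
    return 0;
}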

> Another thing is that I'm not sure whether the change to
> verify_backend_node_status will not cause any other unwanted effects.
> This requires a bit more explanation on why we enabled
> detach_false_primary in the first place. We were seeing some failures
> in our tests in the following scenario:
> * We start with a healthy, 3 node cluster, node 0 is both primary
> database and pgpool leader
> * We terminate node 2 and completely wipe its database directory
> * Node 2 is restarted and will initialize a new, empty database
> * Node 2 rejoins the cluster
> * Pgpool now detects it has 2 primaries (nodes 0 and 2) and 1
> standby (node 1), but without detach_false_primary, it doesn't act
> on its own
> * find_primary_node gets the following status information: primary,
> standby, primary and will return the highest backend number, which is
> 2 in this case. This conflicts with the actual state of the system, as
> node 0 was the real primary.
> In our case, this was causing issues in our application. I'm not
> familiar enough with the internals of pgpool itself, but I can imagine
> this might cause issues in some situations. IMHO pgpool should not
> 'change its mind' on which node is the primary in the middle of a
> failover, as that is probably going to cause a lot of issues. It could
> well be that the outcome of find_primary_node does not really matter
> at this point, but that's hard for me to judge.
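
For illustration, here is a simplified sketch (not the actual
Pgpool-II source) of the "highest backend number" selection described
above: the scan keeps overwriting its result, so the last node
reporting itself as primary wins.

typedef enum { ROLE_STANDBY, ROLE_PRIMARY } node_role;

/* return the highest-numbered node reporting itself as primary */
static int find_primary_node_naive(const node_role *roles, int num_nodes)
{
    int primary = -1;
    int i;

    for (i = 0; i < num_nodes; i++)
        if (roles[i] == ROLE_PRIMARY)
            primary = i;    /* a later primary overwrites an earlier one */

    return primary;         /* {PRIMARY, STANDBY, PRIMARY} yields 2, not 0 */
}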

I see your point. Actually I was able to reproduce what you said.

# 1. create 3 node streaming replication cluster
# primary, standby, standby
pgpool_setup -n 3

# 2. detach node 2
pcp_detach_node 2

# 3. make node 2 primary
pg_ctl -D data2 promote

# 4. attach node 2
pcp_attach_node 2

At this point, the cluster is: standby, standby, primary, as you said.

In this case find_primary_node really ought to choose node 0 as the
real primary, because node 1 is a standby connected to node 0 while
node 2 does not have any connected standby. I will fix this.
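
For reference, one way such a check could work (a sketch, not how
Pgpool-II actually implements it) is to ask each candidate primary how
many standbys are streaming from it, via pg_stat_replication. The
libpq program below assumes the default backend ports pgpool_setup
assigns (11002 for node 0, 11004 for node 2); build with -lpq:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* number of connected standbys, or -1 on error */
static int connected_standbys(const char *conninfo)
{
    PGconn *conn = PQconnectdb(conninfo);
    PGresult *res;
    int n = -1;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        PQfinish(conn);
        return -1;
    }

    res = PQexec(conn, "SELECT count(*) FROM pg_stat_replication");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        n = atoi(PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return n;
}

int main(void)
{
    /* both node 0 and node 2 claim to be primary in the scenario
     * above; ports are example values, adjust to your setup */
    printf("node 0: %d connected standby(s)\n",
           connected_standbys("host=localhost port=11002"));
    printf("node 2: %d connected standby(s)\n",
           connected_standbys("host=localhost port=11004"));
    /* node 0 reports 1 (node 1 streams from it), node 2 reports 0,
     * so node 0 is the real primary */
    return 0;
}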

In the meantime, I am not sure Pgpool-II should never 'change its
mind' about which node is the primary in the middle of a failover,
because the former primary is not necessarily the correct
primary. For example, if we start from the state after step 4 above
and detach/attach node 0, we probably want node 0 to be the new
primary.

Probably the new rule would be:

Pgpool-II should not 'change its mind' about which node is the
primary in the middle of a failover if there is no reliable way to
judge which one is the correct primary (for example, when there are
multiple primaries but no standby).
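
In code form, the rule could look something like this sketch (a
hypothetical helper, fed by pg_stat_replication counts like the ones
above): switch only when the standby evidence singles out exactly one
candidate, otherwise keep the primary we already had.

/* per-node inputs: is_primary[i] is 1 if node i reports itself as
 * primary, standby_count[i] is its number of connected standbys */
static int choose_primary(int current_primary,
                          const int *is_primary,
                          const int *standby_count,
                          int num_nodes)
{
    int with_standbys = -1;
    int matches = 0;
    int i;

    for (i = 0; i < num_nodes; i++)
        if (is_primary[i] && standby_count[i] > 0)
        {
            with_standbys = i;
            matches++;
        }

    if (matches == 1)
        return with_standbys;   /* reliable evidence: switch */

    return current_primary;     /* ambiguous: don't change our mind */
}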

What do you think?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

