[pgpool-hackers: 2326] Re: Corner case bug in primary failover

Tatsuo Ishii ishii at sraoss.co.jp
Wed May 10 08:39:41 JST 2017


> On Tue, May 9, 2017 at 5:53 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > On Tue, May 9, 2017 at 6:16 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >
>> >> Usama,
>> >>
>> >> While adding a new test case to 003.failover regression test, I found
>> >> a corner case bug in primary failover.
>> >>
>> >> Suppose Pgpool-II starts but is yet finding primary node. If primary
>> >> failover happens, it skips finding primary node and let the initial
>> >> value of it (Req_info->primary_node_id == -1) to be used as the new
>> >> primary node id. As a result, no primary node id exists until next
>> >> failover happens.
>> >>
>> >> Initialy I thought The problem is in the code of
>> >> pgpool_main.c:failover() which tries to optimize finding primary node
>> >> process.
>> >>
>> >>                 /*
>> >>                  * If the down node was a standby node in streaming
>> >> replication
>> >>                  * mode, we can avoid calling
>> >> find_primary_node_repeatedly() and
>> >>                  * recognize the former primary as the new primary node,
>> >> which
>> >>                  * will reduce the time to process standby down.
>> >>                  */
>> >>                 else if (MASTER_SLAVE && pool_config->master_slave_sub_
>> mode
>> >> == STREAM_MODE &&
>> >>                                  reqkind == NODE_DOWN_REQUEST)
>> >>                 {
>> >>                         if (Req_info->primary_node_id != node_id)
>> >>                                 new_primary = Req_info->primary_node_id;
>> >>                         else
>> >>                                 new_primary =
>> >> find_primary_node_repeatedly();
>> >>
>> >> I was attempting to fix it by checking Req_info->primary_node_id to
>> >> see if it's initial value (-1) or not. If it's -1,
>> >> find_primary_node_repeatedly() need to be called.
>> >>
>> >> But looking into pgpool_main() closely, I suspect there's a
>> >> fundamental problem:
>> >>
>> >> 1) It processes failover in CHECK_REQUEST *before* setting
>> >>    Req_info->primary_node_id.
>> >>
>> >>         /*
>> >>          * check for child signals to ensure child startup before
>> >> reporting successfull start
>> >>          */
>> >>         CHECK_REQUEST;
>> >>
>> >>         ereport(LOG,
>> >>                         (errmsg("%s successfully started. version %s
>> >> (%s)", PACKAGE, VERSION, PGPOOLVERSION)));
>> >>
>> >>         /*
>> >>          * if the primary node id is not loaded by watchdog, search for
>> it
>> >>          */
>> >>         if (Req_info->primary_node_id < 0)
>> >>         {
>> >>                 /* Save primary node id */
>> >>                 Req_info->primary_node_id = find_primary_node();
>> >>         }
>> >>
>> >> 2) It uses find_primary_node(), rather than
>> >>    find_primary_node_repeatedly(). So if by some reasons (for example
>> >>    the backend does not come up yet), find_primary_node() will fail
>> >>    and Req_info->primary_node_id is set to -1.
>> >>
>> >> I think proper fix will be moving the CHECK_REQUEST call above inside
>> >> main loop, and change the find_primary_node() call to
>> >> find_primary_node_repeatedly().
>> >>
>> >> Attached is the patch to do that (plus change the
>> >> search_primary_node_timeout to smaller value in 055.backend_all_down
>> >> test. Otherwise, regression timeout is triggered) against master
>> >> branch.
>> >>
>> >> What do you think?
>> >>
>> >
>> >
>> > Waoo thanks for catching this, it is a really annoying issue, I think
>> your
>> > patch does solve the problem and is the right approach,
>> > But I was thinking what if we move search for the primary node before
>> > starting the child processes. So that we spawn the child processes after
>> > finishing all the
>> > startup rituals?
>>
>> Oh I think that will make even things safer.
>>
>> > Do you think it will cause some issues?
>>
>> So far there's nothing I can think of. Let me try it. I will report
>> back tomorrow.
>>
> 
> Many thanks

It seems your proposed change works well. So I pushed it to the master
branch. However, I have not back patched for the moment because
tomorrow Pen is going to release minor versions and I do not want to
risk to make a regression in the release.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list