[pgpool-hackers: 2324] Re: Corner case bug in primary failover

Tue May 9 21:53:38 JST 2017

> On Tue, May 9, 2017 at 6:16 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Usama,
>>
>> While adding a new test case to 003.failover regression test, I found
>> a corner case bug in primary failover.
>>
>> Suppose Pgpool-II starts but is yet finding primary node. If primary
>> failover happens, it skips finding primary node and let the initial
>> value of it (Req_info->primary_node_id == -1) to be used as the new
>> primary node id. As a result, no primary node id exists until next
>> failover happens.
>>
>> Initialy I thought The problem is in the code of
>> pgpool_main.c:failover() which tries to optimize finding primary node
>> process.
>>
>>                 /*
>>                  * If the down node was a standby node in streaming
>> replication
>>                  * mode, we can avoid calling
>> find_primary_node_repeatedly() and
>>                  * recognize the former primary as the new primary node,
>> which
>>                  * will reduce the time to process standby down.
>>                  */
>>                 else if (MASTER_SLAVE && pool_config->master_slave_sub_mode
>> == STREAM_MODE &&
>>                                  reqkind == NODE_DOWN_REQUEST)
>>                 {
>>                         if (Req_info->primary_node_id != node_id)
>>                                 new_primary = Req_info->primary_node_id;
>>                         else
>>                                 new_primary =
>> find_primary_node_repeatedly();
>>
>> I was attempting to fix it by checking Req_info->primary_node_id to
>> see if it's initial value (-1) or not. If it's -1,
>> find_primary_node_repeatedly() need to be called.
>>
>> But looking into pgpool_main() closely, I suspect there's a
>> fundamental problem:
>>
>> 1) It processes failover in CHECK_REQUEST *before* setting
>>    Req_info->primary_node_id.
>>
>>         /*
>>          * check for child signals to ensure child startup before
>> reporting successfull start
>>          */
>>         CHECK_REQUEST;
>>
>>         ereport(LOG,
>>                         (errmsg("%s successfully started. version %s
>> (%s)", PACKAGE, VERSION, PGPOOLVERSION)));
>>
>>         /*
>>          * if the primary node id is not loaded by watchdog, search for it
>>          */
>>         if (Req_info->primary_node_id < 0)
>>         {
>>                 /* Save primary node id */
>>                 Req_info->primary_node_id = find_primary_node();
>>         }
>>
>> 2) It uses find_primary_node(), rather than
>>    find_primary_node_repeatedly(). So if by some reasons (for example
>>    the backend does not come up yet), find_primary_node() will fail
>>    and Req_info->primary_node_id is set to -1.
>>
>> I think proper fix will be moving the CHECK_REQUEST call above inside
>> main loop, and change the find_primary_node() call to
>> find_primary_node_repeatedly().
>>
>> Attached is the patch to do that (plus change the
>> search_primary_node_timeout to smaller value in 055.backend_all_down
>> test. Otherwise, regression timeout is triggered) against master
>> branch.
>>
>> What do you think?
>>
> 
> 
> Waoo thanks for catching this, it is a really annoying issue, I think your
> patch does solve the problem and is the right approach,
> But I was thinking what if we move search for the primary node before
> starting the child processes. So that we spawn the child processes after
> finishing all the
> startup rituals?

Oh I think that will make even things safer.

> Do you think it will cause some issues?

So far there's nothing I can think of. Let me try it. I will report
back tomorrow.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp