[Pgpool-hackers] Pgpool II 2.3.2.2 Patch Contribution

Thu Mar 11 11:35:13 UTC 2010

Thanks! I will look into this. Also any comments are welcome from
other pgpool hackers!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Tatsuo,
> 
> I modified the source file for Pgpool -II 2.3.2.2.
> I attached the patch(diff).
> 
> Regards,
> Rumman
> ________________________________________
> From: Tatsuo Ishii [ishii at sraoss.co.jp]
> Sent: Tuesday, February 23, 2010 9:14 AM
> To: Brian Maguire
> Cc: t-ishii at sra.co.jp; Gazi Rahman; Ahmed Iftekhar; Arthur Vossberg; pgpool-hackers at pgfoundry.org
> Subject: Re: Pgpool II 2.3.1 Patch Contribution
> 
> [Cced to pgpool-hackers]
> 
> Brian,
> 
> Thanks for your contribution. I will look into the code. Also I would
> like to hear from pgpool hackers's opinion if any.
> 
> If possible, it would be great if you could generate patches (diff)
> against CVS HEAD. Also you should change pool_config.l, rather than
> pool_config.c. Another request is, please provide documentation. You
> can modify doc/pgpool-en.html.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
> > Tatsuo
> >
> >
> >
> > I would like to contribibution of a patch of a code change to support a config change for the killing of all child processes rather than waiting for disconnect.  The description of the rational and the code chnage are below and attached.  Please let me know if this can be contributed or even better avoided all together some how.
> >
> >
> >
> >
> >
> > Cheers,
> >
> >
> >
> > Brian
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > We changed the source files of Pgpool II 2.3.1 for the following two purposes:
> >     1. To avoid the wait for the disconnection of all sessions at 2nd stage of pgpool recovery.
> >     2. To avoid the disconnection of active sessions at Primary node when the secondary node stop communication with the primary pgpool.
> >
> > 1. To avoid the wait for the disconnection of all sessions at 2nd stage of pgpool recovery
> >
> > With default implementation of Pgpool II 2.3.1, during recovery, at 2nd stage pgpool waits for disconnection of all sessions.  But in 24x7 production database, without stopping the application, it is difficult to ensure that no connection will exist in the database and if there is a session connected in the database, the recovery process will wait for its disconnection and if it does not happen within a specific period then the recovery will fail.
> > To avoid this recovery failure we added a parameter 'kill_all_child_process_before_2nd_stage"
> >
> > if kill_all_child_process_before_2nd_stage = true/1 then
> >     kill all child process before the start of 2nd stage so that no need to wait for disconnections of active sessions
> > else
> >     wait for the disconnections of all active sessions; that is same as the Pgpool II 2.3.1
> > end if
> >
> > Default value for kill_all_child_process_before_2nd_stage is set to false/0.
> >
> > The following 3 files have been modified:
> >
> > pool.h
> > pool_config.c
> > recovery.c
> >
> >
> >
> > File : pool.h   //added the parameter
> >
> >
> >
> > typedef struct
> > {
> > ....
> > int kill_all_child_process_during_before_2nd_stage;
> > } POOL_CONFIG;
> >
> >
> > File: pool_config.c  //read the pgpool.conf value set by the pgpool administrator
> >
> >
> >
> > int pool_init_config()
> > {
> > ...
> > pool_config->kill_all_child_process_before_2nd_stage = 0;
> >
> > } //int pool_init_config()
> >
> >
> > int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> > {
> > ....
> >     //kill_all_child_process_before_2nd_stage
> >     else if (!strcmp(key, "kill_all_child_process_before_2nd_stage") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
> >         {
> >             int v = eval_logical(yytext);
> >
> >             if (v < 0)
> >             {
> >                 pool_error("pool_config: invalid value %s for %s", yytext, key);
> >                 return(-1);
> >             }
> >             pool_config->kill_all_child_process_before_2nd_stage = v;
> >     } //kill_all_child_process_before_2nd_stage
> > ....
> > }//int pool_get_config
> >
> >
> > File: recovery.c //added the parameterized functionality in the block "modified here"
> >
> >
> >
> > int start_recovery(int recovery_node)
> > {
> >     ...
> >     pool_log("1st stage is done");
> >
> >     --------------- modified here -------------------------
> >         if (pool_config->kill_all_child_process_before_2nd_stage == 1)
> >                 {       /* kill all children processes */
> >
> >                         pool_log("Disconnection of all sessions before 2nd stage started");
> >                         pool_debug("pool_config =%d",pool_config->num_init_children);
> >                         int i;
> >                         for (i = 0; i < pool_config->num_init_children; i++)
> >                         {
> >                                 pid_t pid = pids[i].pid;
> >                                 if (pid)
> >                                 {
> >                                         kill(pid, SIGQUIT);
> >                                         pool_debug("Disconnect session before 2nd stage: kill %d", pid);
> >                                 }//if (pids[i].pid)
> >                         }//for (i = 0; i < pool_config->num_init_children; i++)
> >                         pool_log("Killed all active sessions");
> >                 }// if (pool_config->kill_all_child_process_before_2nd_stage == 1)
> >
> >     ------------ end of modification ---------------------------------
> >     pool_log("starting 2nd stage");
> >
> >     /* 2nd stage */
> >     *InRecovery = 1;
> >         -------- modified here ----------------
> >         pool_log("---->Waiting for all connection closed <---");
> >         --------- end of modification ----------
> >     if (wait_connection_closed() != 0)
> >     {
> >         PQfinish(conn);
> >         pool_error("start_recovery: timeover for waiting connection closed");
> >         return 1;
> >     }
> >     pool_log("all connections from clients have been closed");
> >
> >     ...
> > } //int start_recovery
> > 2. To avoid the disconnection of active sessions at Primary node when the secondary node stop communication with the primary pgpool
> >
> > With default implementation of Pgpool II 2.3.1, if either node is down pgpool kills all child processes to stop TCP/IP retrying. But this may cause problem in our environment.
> > Let a situation where secondary node somehow disconnected and stopped communication with the primary node. With default implementation, pgpool kills all active calls and other activities of our application.
> > It is not desirable. Activities at primary node must not be hampered if secondary node stops communication.
> > To avoid this, we added a parameter "restart_all_process_for_same_master_node_in_failover".
> >
> > If restart_all_process_for_same_master_node_in_failover = true/1 then
> >    restart all child processes that is same as the default with Pgpool II 2.3.1
> > else
> >    don't restart the child processes for same master node
> > end if
> >
> > Default value for restart_all_process_for_same_master_node_in_failover set to 1/true;
> >
> > Actually, here we have only enabled the old functionaluty of do not restart child process for same master node with the parameter restart_all_process_for_same_master_node_in_failover.
> >
> >
> >
> > The following 3 files have been changed:
> > pool.h
> > pool_config.c
> > main.c
> >
> >
> >
> > File: pool.h    //added the parameter
> >
> >
> >
> > typedef struct
> > {
> > ....
> > int restart_all_process_for_same_master_node_in_failover;
> > } POOL_CONFIG;
> >
> >
> > File: pool_config.c  //read the pgpool.conf value set by the pgpool administrator
> >
> >
> >
> > int pool_init_config()
> > {
> > ...
> > pool_config->restart_all_process_for_same_master_node_in_failover = 1;
> > } //int pool_init_config()
> >
> > int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> > {
> > ....
> > else if (!strcmp(key, "restart_all_process_for_same_master_node_in_failover") &&
> >                  CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
> >         {
> >             int v = eval_logical(yytext);
> >
> >             if (v < 0)
> >             {
> >                 pool_error("pool_config: invalid value %s for %s", yytext, key);
> >                 fclose(fd);
> >                 return(-1);
> >             }
> >             pool_debug("restart_all_process_for_same_master_node_in_failover: %d", v);
> >             pool_config->restart_all_process_for_same_master_node_in_failover = v;
> >         }
> > ....
> >
> > } //int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> >
> >
> >
> >
> > File: main.c // added the parameterized functionality; the block "modified here" indicates our modification
> >
> >
> >
> > ...
> > static void failover(void)
> > {
> >
> >     ...
> >     if (new_master == pool_config->backend_desc->num_backends)
> >     {
> >         pool_error("failover_handler: no valid DB node found");
> >     }
> >
> > /*
> >  * Before we tried to minimize restarting pgpool to protect existing
> >  * connections from clients to pgpool children. What we did here was,
> >  * if children other than master went down, we did not fail over.
> >  * This is wrong. Think about following scenario. If someone
> >  * accidentally plugs out the network cable, the TCP/IP stack keeps
> >  * retrying for long time (typically 2 hours). The only way to stop
> >  * the retry is restarting the process.  Bottom line is, we need to
> >  * restart all children in any case.  See pgpool-general list posting
> >  * "TCP connections are *not* closed when a backend timeout" on Jul 13
> >  * 2008 for more details.
> >  * It has been parameterized as people may want behavior depending on their system.
> >  */
> >
> > ----------- modified here ------------------------
> > //#ifdef NOT_USED //
> >     else
> >     {
> >         if ( pool_config->restart_all_process_for_same_master_node_in_failover == 0 &&   Req_info->master_node_id == new_master && *InRecovery == 0)
> >         {
> >             pool_log("failover_handler: do not restart pgpool. same master node %d was selected", new_master);
> >             if (Req_info->kind == NODE_UP_REQUEST)
> >             {
> >                 pool_log("failback done. reconnect host %s(%d)",
> >                          BACKEND_INFO(node_id).backend_hostname,
> >                          BACKEND_INFO(node_id).backend_port);
> >             }
> >             else
> >             {
> >                 pool_log("failover done. shutdown host %s(%d)",
> >                          BACKEND_INFO(node_id).backend_hostname,
> >                          BACKEND_INFO(node_id).backend_port);
> >             }
> >
> >             /* exec failover_command */
> >             for (i = 0; i < pool_config->backend_desc->num_backends; i++)
> >             {
> >                 if (nodes[i])
> >                     trigger_failover_command(i, pool_config->failover_command);
> >             }
> >
> >             pool_semaphore_unlock(REQUEST_INFO_SEM);
> >             switching = 0;
> >             kill(pcp_pid, SIGUSR2);
> >             switching = 0;
> >             return;
> >         }
> >     }
> > //#endif //
> > ----------- end of modification ------------------------
> >
> >     /* kill all children */
> >     for (i = 0; i < pool_config->num_init_children; i++)
> >     {
> >         pid_t pid = pids[i].pid;
> >         if (pid)
> >         {
> >             kill(pid, SIGQUIT);
> >             pool_debug("failover_handler: kill %d", pid);
> >         }
> >     }
> >
> >     ...
> > } // end of failover()
> > ...