[pgpool-hackers: 1536] Re: Allow to access to pgpool while doing health checking

Wed May 4 16:27:46 JST 2016

On Wed, May 4, 2016 at 5:53 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Usama,
>
> Thank you for testing the patch. Shall I commit/push it after you fix
> the regression test issue? Or shall I commit/push it now? Either is
> fine for me.
>

I have already pushed the fix for the regression failure and was waiting for
the buildfarm results to confirm it. Today's results verified the
fix, so you can go ahead and commit the patch.

Kind regards
Muhammad Usama

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Ishii-San
> >
> > I have tested the patch. It successfully takes care of this very annoying
> > problem and works as expected.
> >
> > Best regards
> > Muhammad Usama
> >
> >
> > On Tue, May 3, 2016 at 5:25 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> >> Currently, any attempt to connect to pgpool fails while pgpool is
> >> running a health check against a failed node, even if
> >> fail_over_on_backend_error is off. This happens because a pgpool child
> >> first tries to connect to all backends, including the failed one, and
> >> exits if it fails to connect to any of them (which of course it does).
> >> The situation is temporary and lasts only until pgpool executes
> >> failover. However, if the health check is retrying, it lasts longer,
> >> depending on the settings of health_check_max_retries and
> >> health_check_retry_delay. This is not good. The attached patch tries
> >> to mitigate the problem:
> >>
> >> - When an attempt to connect to a backend fails, give up on the
> >>   failed node and move on to the next node rather than exiting the
> >>   process, provided that pgpool is operating in streaming replication
> >>   mode and the node is not the primary node.
> >>
> >> - Set the local status of the failed node to "down".
> >>
> >> - This lets the primary node be selected as the load balance node,
> >>   so all queries are sent to the primary node. If there are other
> >>   healthy standby nodes, one of them will be chosen as the load
> >>   balance node.
> >>
> >> - After the session is over, the child process exits so that it does
> >>   not retain the local status.
> >>
> >> Comments?
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> diff --git a/src/include/pool.h b/src/include/pool.h
> >> index 4c6e82f..1f43efd 100644
> >> --- a/src/include/pool.h
> >> +++ b/src/include/pool.h
> >> @@ -323,6 +323,7 @@ extern int my_master_node_id;
> >>   */
> >>  #define PRIMARY_NODE_ID (Req_info->primary_node_id >=0?\
> >>                           Req_info->primary_node_id:REAL_MASTER_NODE_ID)
> >> +#define IS_PRIMARY_NODE_ID(node_id)    (node_id == PRIMARY_NODE_ID)
> >>
> >>  /*
> >>   * Real primary node id. If not in the mode or there's no primary
> >> diff --git a/src/protocol/pool_connection_pool.c b/src/protocol/pool_connection_pool.c
> >> index b7cc946..7c33366 100644
> >> --- a/src/protocol/pool_connection_pool.c
> >> +++ b/src/protocol/pool_connection_pool.c
> >> @@ -812,8 +812,8 @@ static POOL_CONNECTION_POOL_SLOT *create_cp(POOL_CONNECTION_POOL_SLOT *cp, int s
> >>  }
> >>
> >>  /*
> >> - * create actual connections to backends
> >> - * new connection resides in TopMemoryContext
> >> + * Create actual connections to backends.
> >> + * New connection resides in TopMemoryContext.
> >>   */
> >>  static POOL_CONNECTION_POOL *new_connection(POOL_CONNECTION_POOL *p)
> >>  {
> >> @@ -851,12 +851,34 @@ static POOL_CONNECTION_POOL *new_connection(POOL_CONNECTION_POOL *p)
> >>                                 ereport(FATAL,
> >>                                         (errmsg("failed to create a backend connection"),
> >>                                                  errdetail("executing failover on backend")));
> >> -                       }
> >> +                       }
> >>                         else
> >>                         {
> >> -                               ereport(FATAL,
> >> -                                       (errmsg("failed to create a backend connection"),
> >> -                                                errdetail("not executing failover because fail_over_on_backend_error is off")));
> >> +                               /*
> >> +                                * If we are in streaming replication mode and the node is a
> >> +                                * standby node, then we skip this node to avoid fail over.
> >> +                                */
> >> +                               if (STREAM && !IS_PRIMARY_NODE_ID(i))
> >> +                               {
> >> +                                       ereport(LOG,
> >> +                                                       (errmsg("failed to create a backend %d connection", i),
> >> +                                                        errdetail("skip this backend because because fail_over_on_backend_error is off and we are in streaming replication mode and node is standby node")));
> >> +
> >> +                                       /* set down status to local status area */
> >> +                                       *(my_backend_status[i]) = CON_DOWN;
> >> +
> >> +                                       /* make sure that we need to restart the process after
> >> +                                        * finishing this session
> >> +                                        */
> >> +                                       pool_get_my_process_info()->need_to_restart = 1;
> >> +                                       continue;
> >> +                               }
> >> +                               else
> >> +                               {
> >> +                                       ereport(FATAL,
> >> +                                                       (errmsg("failed to create a backend %d connection", i),
> >> +                                                        errdetail("not executing failover because fail_over_on_backend_error is off")));
> >> +                               }
> >>                         }
> >>                         child_exit(POOL_EXIT_AND_RESTART);
> >>                 }
> >> diff --git a/src/utils/pool_process_reporting.c b/src/utils/pool_process_reporting.c
> >> index 9b190c7..6cfd860 100644
> >> --- a/src/utils/pool_process_reporting.c
> >> +++ b/src/utils/pool_process_reporting.c
> >> @@ -5,7 +5,7 @@
> >>   * pgpool: a language independent connection pool server for PostgreSQL
> >>   * written by Tatsuo Ishii
> >>   *
> >> - * Copyright (c) 2003-2015     PgPool Global Development Group
> >> + * Copyright (c) 2003-2016     PgPool Global Development Group
> >>   *
> >>   * Permission to use, copy, modify, and distribute this software and
> >>   * its documentation for any purpose and without fee is hereby
> >>
>
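For readers who want to follow the control flow without the pgpool source tree, the connection loop described in the quoted proposal can be sketched as a small standalone C model. This is only an illustration of the fail_over_on_backend_error = off branch, not the pgpool implementation: child_state, node_status, connect_backends and the reachable[] array are hypothetical names invented for this sketch, and the return value of -1 stands in for the FATAL exit in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BACKENDS 3          /* assumption: a three-node cluster */

typedef enum { NODE_UP, NODE_DOWN } node_status;

/*
 * Hypothetical per-child state. It loosely mirrors the local backend
 * status and the need_to_restart flag touched by the patch.
 */
typedef struct
{
    node_status local_status[NUM_BACKENDS];
    bool        need_to_restart;
} child_state;

/*
 * Model of the connection loop when fail_over_on_backend_error is off.
 * reachable[i] stands in for the outcome of a real connection attempt.
 * Returns the number of connected backends, or -1 where the real child
 * process would give up and exit.
 */
static int
connect_backends(child_state *st, const bool reachable[NUM_BACKENDS],
                 bool streaming, int primary_id)
{
    int connected = 0;

    assert(st != NULL);

    for (int i = 0; i < NUM_BACKENDS; i++)
    {
        if (reachable[i])
        {
            connected++;
            continue;
        }

        if (streaming && i != primary_id)
        {
            /* Standby in streaming replication mode: skip it. */
            st->local_status[i] = NODE_DOWN;  /* local view only */
            st->need_to_restart = true;       /* exit after the session */
            continue;
        }

        /* Primary node, or not streaming replication mode: give up. */
        return -1;
    }

    return connected;
}
```

With node 0 as primary and node 1 unreachable, a call in streaming mode connects to the two remaining nodes, marks node 1 down locally, and sets need_to_restart. The same call outside streaming mode, or with the primary unreachable, returns -1.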