[pgpool-committers: 1142] pgpool: Fix bug with health check when used with child_life_time report

Tatsuo Ishii ishii at postgresql.org
Sat Jul 20 13:26:56 JST 2013


Fix bug with health check when used with child_life_time reported in [pgpool-general: 1892].

Here is the explanation of why the problem occurs:

--------------------------------------------------------------------------------
Ok. I think I finally understand what's going on here.

Pgpool main process (14317) started health checking at Jul 12 09:17:04.

Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking

Pgpool main process set the timer to fire at 09:17:14 because you set
health_check_timeout to 10.  This time the health check completed
successfully, and the timer for 09:17:14 was blocked by calling
signal(SIGALRM, SIG_IGN), which ignores the signal but does not cancel
the pending alarm.

Unfortunately child life time expired at 09:17:14, and the pgpool main
process was busy handling it at that time.

Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds expired
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called

Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking

Pgpool main re-enabled the timer and reset the timer variable
(health_check_timer_expired = 0). But when the timer was re-enabled,
the signal handler for the old timer set health_check_timer_expired
to 1.  As a result pgpool thought that the health check timer had
expired.

Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health check timer has been already expired before attempting to connect to 0 th backend

Thus failover happened even though the backend was running fine.
--------------------------------------------------------------------------------
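
The sequence above can be reproduced in isolation. The following is a
minimal sketch, not pgpool code: the function name stale_alarm_demo and
the handler name are hypothetical, and the 10 second timeout is
shortened to 1 second. It assumes only the standard POSIX calls
signal(), alarm() and pause().

```c
#include <signal.h>
#include <unistd.h>

/* Same name as the pgpool variable; everything else here is illustrative. */
static volatile sig_atomic_t health_check_timer_expired = 0;

static void health_check_timer_handler(int sig)
{
    (void) sig;
    health_check_timer_expired = 1;
}

/* Returns the flag value seen by the "second" health check round. */
static int stale_alarm_demo(void)
{
    /* Round 1: arm the timer (1 second stands in for health_check_timeout). */
    signal(SIGALRM, health_check_timer_handler);
    alarm(1);

    /* The check finishes early. The signal is ignored, but the alarm
     * itself is still pending because alarm(0) was never called. */
    signal(SIGALRM, SIG_IGN);

    /* Round 2 starts before the old alarm fires: re-enable the handler
     * and reset the flag, as described in the explanation. */
    signal(SIGALRM, health_check_timer_handler);
    health_check_timer_expired = 0;

    /* The stale alarm from round 1 is now delivered to the handler. */
    while (!health_check_timer_expired)
        pause();

    signal(SIGALRM, SIG_DFL);
    return health_check_timer_expired;
}
```

Calling stale_alarm_demo() returns 1 after about one second, even
though no timer was armed in the second round.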

To fix the problem, a new macro CLEAR_ALARM, which calls alarm(0) until
all pending alarms are cleared, is defined and used wherever the health
check timer needs to be cancelled. Also, health_check_timer_expired is
explicitly cleared before forking off a child process.
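
As an illustration of the idea, here is a minimal sketch of such a
macro. The macro body below is an assumption based on the description
in this message, not a copy of the committed code, and
finish_health_check is a hypothetical helper used only to show the
effect.

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;

/* Assumed shape of the macro: keep calling alarm(0) until it reports
 * that no alarm was pending (alarm() returns the seconds remaining on
 * the previously scheduled alarm, or 0 if there was none). */
#define CLEAR_ALARM do { } while (alarm(0) > 0)

static void health_check_timer_handler(int sig)
{
    (void) sig;
    health_check_timer_expired = 1;
}

/* Arms the timer, "completes" the check, then cancels the timer.
 * Returns the seconds still pending afterwards, which should be 0. */
static unsigned int finish_health_check(void)
{
    signal(SIGALRM, health_check_timer_handler);
    alarm(10);                          /* health_check_timeout = 10 */

    /* ... health check completes successfully here ... */

    CLEAR_ALARM;                        /* cancel instead of only ignoring */
    health_check_timer_expired = 0;     /* also clear the flag explicitly */

    return alarm(0);                    /* nothing left to fire */
}
```

With the alarm cancelled rather than ignored, no stale SIGALRM can
reach the handler after it is re-installed for the next round.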

The same bug also causes the following error message:

Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connect_inet_domain_socket_by_port: health check timer expired

Process 11465 is a child process and is not supposed to run into this
situation. It happens because the global variable
"health_check_timer_expired" is set to 1 before the new child is
forked off after child_life_time expired, so the child inherits the
value 1. This could happen if a SIGCHLD signal is received at the
moment when the bug described above happens. To make sure this never
happens, connect_inet_domain_socket_by_port now checks
health_check_timer_expired only if it is running in the main process.
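
The guard can be sketched as follows. This is only an illustration:
main_pid, is_main_process and the other helper names are hypothetical,
and the way pgpool actually identifies the main process is not shown
in this message.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;
static pid_t main_pid;                  /* hypothetical: pid of the main process */

static int is_main_process(void)
{
    return getpid() == main_pid;
}

/* Only the main process runs health checks, so only it honors the flag. */
static int timer_expired_for_this_process(void)
{
    return is_main_process() && health_check_timer_expired;
}

/* Forks a child while the flag is stale and returns what the child sees
 * (0 or 1), or -1 on error. */
static int child_view_of_stale_flag(void)
{
    pid_t pid;
    int status;

    main_pid = getpid();
    health_check_timer_expired = 1;     /* stale value, inherited by fork() */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(timer_expired_for_this_process());

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this sketch the child inherits health_check_timer_expired == 1 but
ignores it, while the main process still sees the flag.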

Branch
------
V3_2_STABLE

Details
-------
http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=7572d08fcc59c3c16fe93b373660671d2f86c41d

Modified Files
--------------
main.c                 |   11 +++++++++++
pool_connection_pool.c |    2 +-
2 files changed, 12 insertions(+), 1 deletion(-)


