View Issue Details

ID: 0000371
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-12-19 10:08
Reporter: m.oyamata
Assigned To: t-ishii
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: open
Product Version: 3.7.0
Target Version: 3.7.1
Fixed in Version:
Summary: 0000371: Per-node health check parameter settings in pgpool-II 3.7.0 do not work
Description: The health check parameters set for an individual node are not reflected in the health check behavior.
Steps To Reproduce:
Environment information

PostgreSQL Version: 10.0
PostgreSQL nodes: 2 (Streaming Replication)
pgpool-II Version: 3.7.0
pgpool-II mode : master slave

PostgreSQL and pgpool-II are all running on the same server (Red Hat Enterprise Linux Server release 7.3 (Maipo)).


1. Set health_check_max_retries for node 1 (health_check_max_retries1) to 100
postgres=# pgpool show health_check;
           item | value | description
---------------------------+----------+------------------------------------------------------------------------------------------------------
 health_check_period | 1 | Time interval in seconds between the health checks.
 health_check_timeout | 20 | Backend node health check timeout value in seconds.
 health_check_user | postgres | User name for PostgreSQL backend health check.
 health_check_password | ***** | Password for PostgreSQL backend health check database user.
 health_check_database | postgres | The database name to be used to perform PostgreSQL backend health check.
 health_check_max_retries | 5 | The maximum number of times to retry a failed health check before giving up and initiating failover.
 health_check_retry_delay | 1 | The amount of time in seconds to wait between failed health check retries.
 connect_timeout | 10000 | Timeout in milliseconds before giving up connecting to backend.
 health_check_period0 | 1 | Time interval in seconds between the health checks.
 health_check_timeout0 | 20 | Backend node health check timeout value in seconds.
 health_check_user0 | postgres | User name for PostgreSQL backend health check.
 health_check_password0 | ***** | Password for PostgreSQL backend health check database user.
 health_check_database0 | postgres | The database name to be used to perform PostgreSQL backend health check.
 health_check_max_retries0 | 5 | The maximum number of times to retry a failed health check before giving up and initiating failover.
 health_check_retry_delay0 | 1 | The amount of time in seconds to wait between failed health check retries.
 connect_timeout0 | 10000 | Timeout in milliseconds before giving up connecting to backend.
 health_check_period1 | 1 | Time interval in seconds between the health checks.
 health_check_timeout1 | 20 | Backend node health check timeout value in seconds.
 health_check_user1 | postgres | User name for PostgreSQL backend health check.
 health_check_password1 | ***** | Password for PostgreSQL backend health check database user.
 health_check_database1 | postgres | The database name to be used to perform PostgreSQL backend health check.
 health_check_max_retries1 | 100 | The maximum number of times to retry a failed health check before giving up and initiating failover.
 health_check_retry_delay1 | 1 | The amount of time in seconds to wait between failed health check retries.
 connect_timeout1 | 10000 | Timeout in milliseconds before giving up connecting to backend.
(24 rows)

2. Stop node 1
$ pg_ctl stop -m i -D data2
waiting for server to shut down.... done
server stopped

3. pgpool-II logfile
The health check on node 1 is retried only 5 times (the global health_check_max_retries value) instead of the 100 times set for that node.

2017-12-17 23:11:35: pid 9266: LOG: pgpool-II successfully started. version 3.7.0 (amefuriboshi)
2017-12-17 23:12:30: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:30: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:30: pid 9303: ERROR: failed to make persistent db connection
2017-12-17 23:12:30: pid 9303: DETAIL: connection to host:"localhost:5411" failed
2017-12-17 23:12:30: pid 9303: LOG: health check retrying on DB node: 1 (round:1)
2017-12-17 23:12:31: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:31: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:31: pid 9303: ERROR: failed to make persistent db connection
2017-12-17 23:12:31: pid 9303: DETAIL: connection to host:"localhost:5411" failed
2017-12-17 23:12:31: pid 9303: LOG: health check retrying on DB node: 1 (round:2)
2017-12-17 23:12:32: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:32: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:32: pid 9303: ERROR: failed to make persistent db connection
2017-12-17 23:12:32: pid 9303: DETAIL: connection to host:"localhost:5411" failed
2017-12-17 23:12:32: pid 9303: LOG: health check retrying on DB node: 1 (round:3)
2017-12-17 23:12:33: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:33: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:33: pid 9303: ERROR: failed to make persistent db connection
2017-12-17 23:12:33: pid 9303: DETAIL: connection to host:"localhost:5411" failed
2017-12-17 23:12:33: pid 9303: LOG: health check retrying on DB node: 1 (round:4)
2017-12-17 23:12:34: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:34: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:34: pid 9303: ERROR: failed to make persistent db connection
2017-12-17 23:12:34: pid 9303: DETAIL: connection to host:"localhost:5411" failed
2017-12-17 23:12:34: pid 9303: LOG: health check retrying on DB node: 1 (round:5)
2017-12-17 23:12:35: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:35: pid 9303: LOG: failed to connect to PostgreSQL server on "localhost:5411", getsockopt() detected error "Connection refused"
2017-12-17 23:12:35: pid 9303: ERROR: failed to make persistent db connection
2017-12-17 23:12:35: pid 9303: DETAIL: connection to host:"localhost:5411" failed
2017-12-17 23:12:35: pid 9303: LOG: health check failed on node 1 (timeout:0)
2017-12-17 23:12:35: pid 9303: LOG: received degenerate backend request for node_id: 1 from pid [9303]
2017-12-17 23:12:35: pid 9266: LOG: Pgpool-II parent process has received failover request
2017-12-17 23:12:35: pid 9266: LOG: starting degeneration. shutdown host localhost(5411)
2017-12-17 23:12:35: pid 9266: LOG: Do not restart children because we are switching over node id 1 host: localhost port: 5411 and we are in streaming replication mode
2017-12-17 23:12:35: pid 9266: LOG: execute command: echo test
test

Tags: No tags attached.

Activities

m.oyamata

2017-12-18 09:41

reporter  

pgpool.conf (35,636 bytes)
pgpool_3.7.log (5,095 bytes)

t-ishii

2017-12-19 08:36

developer   ~0001870

It seems the per-node health check parameters in Pgpool-II 3.7 are broken. Can you please try the attached patch?

health_check.diff (4,628 bytes)
diff --git a/src/main/health_check.c b/src/main/health_check.c
index fcdb59b..de3eda8 100644
--- a/src/main/health_check.c
+++ b/src/main/health_check.c
@@ -165,7 +165,7 @@ void do_health_check_child(int *node_id)
 
 		CHECK_REQUEST;
 
-		if (pool_config->health_check_period <= 0)
+		if (pool_config->health_check_params[*node_id].health_check_period <= 0)
 		{
 			sleep(30);
 		}
@@ -174,7 +174,7 @@ void do_health_check_child(int *node_id)
 		 * If health checking is enabled and the node is not in down status,
 		 * do health check.
 		 */
-		else if (pool_config->health_check_period > 0)
+		else if (pool_config->health_check_params[*node_id].health_check_period > 0)
 		{
 			bool result;
 
@@ -207,7 +207,7 @@ void do_health_check_child(int *node_id)
 
 			/* Discard persistent connections */
 			discard_persistent_connection(*node_id);
-			sleep(pool_config->health_check_period);
+			sleep(pool_config->health_check_params[*node_id].health_check_period);
 		}
 	}
 	exit(0);
@@ -235,15 +235,15 @@ static bool establish_persistent_connection(int node)
 	/*
 	 * If database is not specified, "postgres" database is assumed.
 	 */
-	if (*pool_config->health_check_database == '\0')
-		pool_config->health_check_database = "postgres";
+	if (*pool_config->health_check_params[node].health_check_database == '\0')
+		pool_config->health_check_params[node].health_check_database = "postgres";
 
 	/*
 	 * Try to connect to the database.
 	 */
 	if (slot == NULL)
 	{
-		retry_cnt = pool_config->health_check_max_retries;
+		retry_cnt = pool_config->health_check_params[node].health_check_max_retries;
 
 		do
 		{
@@ -252,22 +252,22 @@ static bool establish_persistent_connection(int node)
 			 * communication path failure much earlier before
 			 * TCP/IP stack detects it.
 			 */
-			if (pool_config->health_check_timeout > 0)
+			if (pool_config->health_check_params[node].health_check_timeout > 0)
 			{
 				CLEAR_ALARM;
 				pool_signal(SIGALRM, health_check_timer_handler);
-				alarm(pool_config->health_check_timeout);
+				alarm(pool_config->health_check_params[node].health_check_timeout);
 				errno = 0;
 				health_check_timer_expired = 0;
 			}
 
 			slot = make_persistent_db_connection_noerror(node, bkinfo->backend_hostname,
 														 bkinfo->backend_port,
-														 pool_config->health_check_database,
-														 pool_config->health_check_user,
-														 pool_config->health_check_password, false);
+														 pool_config->health_check_params[node].health_check_database,
+														 pool_config->health_check_params[node].health_check_user,
+														 pool_config->health_check_params[node].health_check_password, false);
 
-			if (pool_config->health_check_timeout > 0)
+			if (pool_config->health_check_params[node].health_check_timeout > 0)
 			{
 				/* cancel health check timer */
 				pool_signal(SIGALRM, SIG_IGN);
@@ -276,7 +276,7 @@ static bool establish_persistent_connection(int node)
 
 			if (slot)
 			{
-				if (retry_cnt != pool_config->health_check_max_retries)
+				if (retry_cnt != pool_config->health_check_params[node].health_check_max_retries)
 				{
 					ereport(LOG,
 							(errmsg("health check retrying on DB node: %d succeeded",
@@ -292,9 +292,9 @@ static bool establish_persistent_connection(int node)
 				ereport(LOG,
 						(errmsg("health check retrying on DB node: %d (round:%d)",
 								node,
-								pool_config->health_check_max_retries - retry_cnt)));
+								pool_config->health_check_params[node].health_check_max_retries - retry_cnt)));
 
-				sleep(pool_config->health_check_retry_delay);
+				sleep(pool_config->health_check_params[node].health_check_retry_delay);
 			}
 		} while (retry_cnt >= 0);
 	}
diff --git a/src/test/pgpool_setup b/src/test/pgpool_setup
index 6ce8063..d4e06e4 100755
--- a/src/test/pgpool_setup
+++ b/src/test/pgpool_setup
@@ -505,8 +505,18 @@ function set_pgpool_conf {
 		echo "recovery_2nd_stage_command = 'pgpool_recovery_pitr'" >> $CONF
 	fi
 
-	echo "health_check_period = 10" >> $CONF
-	echo "health_check_user = '$WHOAMI'" >> $CONF
+	n=0
+	while [ $n -lt $NUMCLUSTERS ]
+	do
+	    echo "health_check_period$n = 10" >> $CONF
+	    echo "health_check_user$n = '$WHOAMI'" >> $CONF
+	    echo "health_check_password$n = ''" >> $CONF
+	    echo "health_check_database$n = 'postgres'" >> $CONF
+	    echo "health_check_max_retries$n = '3'" >> $CONF
+	    echo "health_check_retry_delay$n = '1'" >> $CONF
+	    echo "connect_timeout$n = '1000'" >> $CONF
+	    n=`expr $n + 1`
+	done
 	OIDDIR=$BASEDIR/log/pgpool/oiddir
 	mkdir -p $OIDDIR
 	echo "memqcache_oiddir = '$OIDDIR'" >> $CONF
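
For readers who want to see the effect of the patch in isolation, here is a minimal sketch, not Pgpool-II code. It assumes a simplified model of the retry loop in establish_persistent_connection() in which every connection attempt fails. The PoolConfig and HealthCheckParams structs and the functions count_retry_rounds(), retry_rounds_for_node(), and report_config() are hypothetical stand-ins invented for this illustration; only the field names health_check_max_retries and health_check_params are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_NODES 2

/* Hypothetical, simplified stand-in for the per-node parameter block. */
typedef struct
{
	int health_check_max_retries;
} HealthCheckParams;

/* Hypothetical, simplified stand-in for pool_config. */
typedef struct
{
	int health_check_max_retries;                      /* global value */
	HealthCheckParams health_check_params[NUM_NODES];  /* per-node values */
} PoolConfig;

/*
 * Count the "retrying" rounds that would be logged for a backend that
 * never accepts a connection. The loop has the same shape as the one in
 * the patch: decrement after each failed attempt, retry while the
 * counter is still >= 0.
 */
static int count_retry_rounds(int max_retries)
{
	int retry_cnt = max_retries;
	int rounds = 0;

	do
	{
		/* a connection attempt would happen here; assume it fails */
		retry_cnt--;
		if (retry_cnt >= 0)
			rounds++;	/* corresponds to one "(round:N)" log line */
	} while (retry_cnt >= 0);

	return rounds;
}

/*
 * use_per_node == false models the 3.7.0 behavior (global lookup),
 * use_per_node == true models the patched behavior (per-node lookup).
 */
static int retry_rounds_for_node(const PoolConfig *cfg, int node, bool use_per_node)
{
	int max_retries;

	assert(node >= 0 && node < NUM_NODES);

	max_retries = use_per_node
		? cfg->health_check_params[node].health_check_max_retries
		: cfg->health_check_max_retries;

	return count_retry_rounds(max_retries);
}

/* Values from the report: global 5, node 0 = 5, node 1 = 100. */
static PoolConfig report_config(void)
{
	PoolConfig cfg = {5, {{5}, {100}}};

	return cfg;
}
```

With the values from the report, retry_rounds_for_node(&cfg, 1, false) gives 5 rounds, which matches the log above, while retry_rounds_for_node(&cfg, 1, true) gives the 100 rounds the reporter configured.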

m.oyamata

2017-12-19 10:03

reporter   ~0001871

Applying the provided patch solved the problem.
Thank you very much!!

pgpool_3.7-2.log (74,457 bytes)

t-ishii

2017-12-19 10:08

developer   ~0001872

Great! We are going to release Pgpool-II 3.7.1 with the fix on January 9th, 2018.

Issue History

Date Modified | Username | Field | Change
2017-12-18 09:41 | m.oyamata | New Issue |
2017-12-18 09:41 | m.oyamata | File Added: pgpool_3.7.log |
2017-12-18 09:41 | m.oyamata | File Added: pgpool.conf |
2017-12-19 08:34 | t-ishii | Assigned To | => t-ishii
2017-12-19 08:34 | t-ishii | Status | new => feedback
2017-12-19 08:34 | t-ishii | Target Version | => 3.7.1
2017-12-19 08:36 | t-ishii | File Added: health_check.diff |
2017-12-19 08:36 | t-ishii | Note Added: 0001870 |
2017-12-19 10:03 | m.oyamata | File Added: pgpool_3.7-2.log |
2017-12-19 10:03 | m.oyamata | Note Added: 0001871 |
2017-12-19 10:03 | m.oyamata | Status | feedback => assigned
2017-12-19 10:08 | t-ishii | Note Added: 0001872 |
2017-12-19 10:08 | t-ishii | Status | assigned => resolved