[pgpool-general: 1476] Re: health check timeout in pgpool-II-3.2.3

Tatsuo Ishii ishii at postgresql.org
Sun Mar 10 08:49:15 JST 2013


OK, I think I have found the cause of the problem (hopefully).

When pgpool tries to connect to a backend during health checking, the
connection attempt hangs because the network cable is pulled out. In
this case select(2), which is waiting for completion of connect(2), is
interrupted by the signal that is raised when health_check_timeout
expires. pgpool should have treated this as an error, but what it
actually did was retry, which makes it take longer before failover
occurs.

Can you please try the included patch? Please note that it is for the
original pgpool-II 3.2.3 (or git master); it is not cumulative with my
previous one.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I was requesting a system call trace, not pgpool log.
> 
>> Dear Sir,
>> 
>> My log file is attached.
>> 
>> It looks as though pgpool still considers the database active (status=1)
>> even though it cannot connect to it.
>> 
>> thank you for your help.
>> 
>> regards
>> 
>> 
>> 
>> 
>> 
>> 
>> 2013/3/7 Tatsuo Ishii <ishii at postgresql.org>
>> 
>>> Thank you for testing.
>>> Can you please send me a system call trace log of the pgpool main
>>> (parent) process?
>>>
>>> > Dear Sir,
>>> >
>>> > Thanks for your kind response.
>>> >
>>> > After applying your pool_connection_pool.patch file, pgpool still cannot
>>> > detect the unplugged network cable of the database within the timeout
>>> > set for the health check.
>>> >
>>> > The settings of my pgpool.conf are listed below:
>>> >
>>> > health_check_period=2
>>> > health_check_timeout=1
>>> > health_check_max_retries=3
>>> > health_check_retry_delay=0
>>> >
>>> > Theoretically, pgpool should detect an unplugged network cable within
>>> > 6 seconds by health check.
>>> > But it takes more than 30 seconds in pgpool-II-3.2.3.
>>> >
>>> > But in pgpool-II-3.2.1, pgpool can detect this situation in 6 seconds.
>>> >
>>> > Thanks for helping deal with my problem.
>>> >
>>> > best regards,
>>> >
>>> >
>>> >
>>> > manphis chen
>>> >
>>> >
>>> >
>>> >
>>> > 2013/3/6 Tatsuo Ishii <ishii at postgresql.org>
>>> >
>>> >> > Dear all,
>>> >> >
>>> >> > I am now using pgpool-II-3.2.3 for my experiment.
>>> >> >
>>> >> > I found that the parameter health_check_timeout does not work.
>>> >> >
>>> >> > In my experiment, I unplug the network cable.
>>> >> >
>>> >> > pgpool found that the status of the unplugged database becomes 3 only
>>> >> > after a long time, not within the health_check_timeout.
>>> >> >
>>> >> > I also found that this situation does not happen in pgpool-II-3.2.1.
>>> >> >
>>> >> > May I ask whether there are any changes in pgpool-II-3.2.3 that cause
>>> >> > the parameter health_check_timeout not to work well?
>>> >>
>>> >> I found a possible cause of the problem. Can you please try out the
>>> >> attached patch?
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Wireless Communication and Mobile Computing Research Group Laboratory
>>> > TEL:(886)-2-23625336-448
>>> > Dept. of Computer Science and Information Engineering
>>> > National Taiwan University, Taipei, Taiwan
>>>
-------------- next part --------------
diff --git a/pool_connection_pool.c b/pool_connection_pool.c
index 4a1048b..bb25658 100644
--- a/pool_connection_pool.c
+++ b/pool_connection_pool.c
@@ -611,9 +611,15 @@ int connect_inet_domain_socket_by_port(char *host, int port, bool retry)
 				/* select timeout */
 				if (retry)
 				{
-					pool_log("connect_inet_domain_socket: select() timedout. retrying...");
+					pool_log("connect_inet_domain_socket: select() timed out. retrying...");
 					continue;
 				}
+				else
+				{
+					pool_error("connect_inet_domain_socket: select() timed out");
+					close(fd);
+					return -1;
+				}
 			}
 			else if (sts > 0)
 			{
@@ -659,6 +665,9 @@ int connect_inet_domain_socket_by_port(char *host, int port, bool retry)
 					pool_log("connect_inet_domain_socket: select() interrupted. retrying...");
 					continue;
 				}
+				pool_log("connect_inet_domain_socket: select() interrupted");
+				close(fd);
+				return -1;
 			}
 		}
 		break;

