[pgpool-general: 1480] Re: health check timeout in pgpool-II-3.2.3

Tatsuo Ishii ishii at postgresql.org
Mon Mar 11 11:25:49 JST 2013


Glad to hear that! Thanks for the report!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Dear Sir,
> 
> It works as we expected after patching the code.
> 
> Thank you very much for the help.
> 
> 
> 
> 
> Best Regards,
> 
> manphis Chen
> 
> 
> 2013/3/10 Tatsuo Ishii <ishii at postgresql.org>
> 
>> OK, I think I found the cause of the problem (hopefully).
>>
>> When pgpool tries to connect to the backend during health checking, it
>> hangs because the network cable is pulled out. In this case select(2),
>> which is waiting for completion of connect(2), is interrupted by the
>> signal generated when health_check_timeout expires. pgpool should have
>> treated this as an error, but what it actually did was retry, which
>> delays failover.
>>
>> Can you please try the included patch? Please note that it is for the
>> original pgpool-II 3.2.3 (or git master); it is not meant to be applied
>> on top of my previous patch.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > I was requesting a system call trace, not the pgpool log.
>> > --
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> > English: http://www.sraoss.co.jp/index_en.php
>> > Japanese: http://www.sraoss.co.jp
>> >
>> >> Dear Sir,
>> >>
>> >> My log file is attached.
>> >>
>> >> It looks like although pgpool cannot connect to the database, it still
>> >> considers the database active (status=1).
>> >>
>> >> thank you for your help.
>> >>
>> >> regards
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> 2013/3/7 Tatsuo Ishii <ishii at postgresql.org>
>> >>
>> >>> Thank you for testing.
>> >>> Can you please send me a system call trace log of the pgpool main
>> >>> (parent) process?
>> >>> --
>> >>> Tatsuo Ishii
>> >>> SRA OSS, Inc. Japan
>> >>> English: http://www.sraoss.co.jp/index_en.php
>> >>> Japanese: http://www.sraoss.co.jp
>> >>>
>> >>> > Dear Sir,
>> >>> >
>> >>> > Thanks for your kind response.
>> >>> >
>> >>> > After applying your pool_connection_pool.patch file, pgpool still
>> >>> > cannot detect the unplugged network cable of the database within the
>> >>> > timeout set for the health check.
>> >>> >
>> >>> > The settings of my pgpool.conf are listed below:
>> >>> >
>> >>> > health_check_period=2
>> >>> > health_check_timeout=1
>> >>> > health_check_max_retries=3
>> >>> > health_check_retry_delay=0
>> >>> >
>> >>> > In theory, the health check should let pgpool detect an unplugged
>> >>> > network cable within 6 seconds.
>> >>> > But it takes more than 30 seconds in pgpool-II-3.2.3.
>> >>> >
>> >>> > In pgpool-II-3.2.1, pgpool detects this situation in 6 seconds.
>> >>> >
>> >>> > Thanks for helping deal with my problem.
>> >>> >
>> >>> > best regards,
>> >>> >
>> >>> >
>> >>> >
>> >>> > manphis chen
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > 2013/3/6 Tatsuo Ishii <ishii at postgresql.org>
>> >>> >
>> >>> >> > Dear all,
>> >>> >> >
>> >>> >> > I am now using pgpool-II-3.2.3 for my experiment.
>> >>> >> >
>> >>> >> > I found that the parameter health_check_timeout does not work.
>> >>> >> >
>> >>> >> > In my experiment, I unplug the network cable.
>> >>> >> >
>> >>> >> > pgpool takes a long time to change the status of the unplugged
>> >>> >> > database to 3, rather than doing so within health_check_timeout.
>> >>> >> >
>> >>> >> > I also found that this does not happen in pgpool-II-3.2.1.
>> >>> >> >
>> >>> >> > May I ask whether anything changed in pgpool-II-3.2.3 such that the
>> >>> >> > parameter health_check_timeout no longer works well?
>> >>> >>
>> >>> >> I found a possible cause of the problem. Can you please try out the
>> >>> >> attached patch?
>> >>> >> --
>> >>> >> Tatsuo Ishii
>> >>> >> SRA OSS, Inc. Japan
>> >>> >> English: http://www.sraoss.co.jp/index_en.php
>> >>> >> Japanese: http://www.sraoss.co.jp
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Wireless Communication and Mobile Computing Research Group Laboratory
>> >>> > TEL:(886)-2-23625336-448
>> >>> > Dept. of Computer Science and Information Engineering
>> >>> > National Taiwan University, Taipei, Taiwan
>> >>>
>>
>> diff --git a/pool_connection_pool.c b/pool_connection_pool.c
>> index 4a1048b..bb25658 100644
>> --- a/pool_connection_pool.c
>> +++ b/pool_connection_pool.c
>> @@ -611,9 +611,15 @@ int connect_inet_domain_socket_by_port(char *host, int port, bool retry)
>>                                 /* select timeout */
>>                                 if (retry)
>>                                 {
>> -                                       pool_log("connect_inet_domain_socket: select() timedout. retrying...");
>> +                                       pool_log("connect_inet_domain_socket: select() timed out. retrying...");
>>                                         continue;
>>                                 }
>> +                               else
>> +                               {
>> +                                       pool_error("connect_inet_domain_socket: select() timed out");
>> +                                       close(fd);
>> +                                       return -1;
>> +                               }
>>                         }
>>                         else if (sts > 0)
>>                         {
>> @@ -659,6 +665,9 @@ int connect_inet_domain_socket_by_port(char *host, int port, bool retry)
>>                                         pool_log("connect_inet_domain_socket: select() interrupted. retrying...");
>>                                         continue;
>>                                 }
>> +                               pool_log("connect_inet_domain_socket: select() interrupted");
>> +                               close(fd);
>> +                               return -1;
>>                         }
>>                 }
>>                 break;
>>
>>


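The 6 second figure quoted in the thread can be reproduced with simple
arithmetic. The sketch below assumes a model that is not stated in the thread:
up to one health_check_period of waiting before the next check starts, then
health_check_max_retries + 1 attempts that each run into health_check_timeout,
with health_check_retry_delay between attempts. The struct and function names
are hypothetical, and actual pgpool-II timing may differ.

```c
#include <assert.h>

/* Hypothetical container for the four pgpool.conf settings quoted in the thread. */
struct health_check_conf {
    int period;       /* health_check_period, seconds */
    int timeout;      /* health_check_timeout, seconds */
    int max_retries;  /* health_check_max_retries */
    int retry_delay;  /* health_check_retry_delay, seconds */
};

/*
 * Worst-case seconds from cable pull to failover under the ASSUMED model:
 *   period + (max_retries + 1) * timeout + max_retries * retry_delay
 */
int worst_case_detection_seconds(const struct health_check_conf *c)
{
    assert(c->period >= 0 && c->timeout >= 0);
    assert(c->max_retries >= 0 && c->retry_delay >= 0);

    int attempts = c->max_retries + 1;  /* first try plus the retries */
    return c->period + attempts * c->timeout + c->max_retries * c->retry_delay;
}
```

With the reported settings (period 2, timeout 1, max retries 3, retry delay 0)
this gives 2 + 4 * 1 + 3 * 0 = 6 seconds, which matches the reporter's
expectation and makes the observed 30 seconds or more stand out.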