[pgpool-general: 6064] Re: "health check timer expired" on local machine

Sat Apr 28 09:10:09 JST 2018

I noticed you set health_check_max_retries = 0. If the error is a
transient one, setting health_check_max_retries to some positive number
might help.
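
For example, a sketch along these lines, keeping your other values as
they are (the retry count of 3 is only an illustrative assumption, not a
tested recommendation for your workload):

```
health_check_period = 2
health_check_timeout = 6
health_check_max_retries = 3     # illustrative value; was 0
health_check_retry_delay = 1     # seconds to wait between retries
```

With something like this, a single failed health check would be retried
before the node is degenerated. The trade-off is slower detection of a
real outage: as a rough worst-case estimate, up to (3 + 1) x 6 seconds of
timeouts plus 3 x 1 seconds of retry delay, about 27 seconds.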

Also, I would be interested in an strace log taken when the failover occurs.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Oh, I forgot the configuration. Here it is:
> 
> health_check_period = 2
> health_check_timeout = 6
> health_check_max_retries = 0
> health_check_retry_delay = 1
> connect_timeout = 10000
> 
> No individual health check settings.
> 
> So of course I could increase connect_timeout, but 10 seconds is already a
> long time to wait before triggering the failover process on a production
> server receiving ~10 inserts / second.
> 
> 2018-04-26 21:23 GMT+02:00 Bud Curly <psyckow.prod at gmail.com>:
> 
>> Hi and thanks for your work.
>>
>> I use pgpool2 3.7.2 (latest git) with 2 backends in master-slave mode with
>> native streaming replication.
>>
>> I think I have an issue concerning the health check process.
>>
>> In the last two days I have had two "health check timer expired" errors,
>> one yesterday around 9 am and one today around 8 pm.
>>
>> The weird thing is... Pgpool and the backend in question are on the same
>> machine. This backend is the master. Here is the log:
>>
>> 2018-04-26 20:59:29: pid 2153:LOG:  failed to connect to PostgreSQL server
>> on "x.x.x.x:xxx" using INET socket
>> 2018-04-26 20:59:29: pid 2153:DETAIL:  health check timer expired
>> 2018-04-26 20:59:29: pid 2153:ERROR:  failed to make persistent db
>> connection
>> 2018-04-26 20:59:29: pid 2153:DETAIL:  connection to host:" x.x.x.x:xxx"
>> failed
>> 2018-04-26 20:59:29: pid 2153:LOG:  health check failed on node 0
>> (timeout:1)
>> 2018-04-26 20:59:29: pid 2153:LOG:  received degenerate backend request
>> for node_id: 0 from pid [2153]
>> 2018-04-26 20:59:29: pid 2104:LOG:  Pgpool-II parent process has received
>> failover request
>> 2018-04-26 20:59:29: pid 2104:LOG:  starting degeneration. shutdown host
>> x.x.x.x:xxx
>> 2018-04-26 20:59:29: pid 2104:LOG:  Restart all children
>>
>> Even though they are on the same machine, I use the public IP for
>> backend0 rather than 127.0.0.1, because the failover process requires
>> this IP.
>>
>> Do you think this could be a problem with network conditions on the server
>> itself, or an actual issue?
>>
>> Thanks
>>