[pgpool-general: 6066] Re: "health check timer expired" on local machine

Mon Apr 30 22:23:55 JST 2018

Hi

So on the 28th I set health_check_max_retries = 1 and had no problem for a
whole day. Yesterday (the 29th) I set health_check_max_retries = 0 again to
try to reproduce the problem, and it hasn't shown up since then...

So I suspect it was a network connection reset in my server's
datacenter... The problem first appeared after one week of running with
health_check_max_retries = 0 without any issue.

I'm sorry, I don't have the log anymore. I will wait until tomorrow with
health_check_max_retries = 0, but then I will set
health_check_max_retries = 1 to start pre-prod testing.
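
To put rough numbers on that trade-off, here is a small Python sketch of the
worst-case time between the first failing health check attempt and the
failover request. It assumes each attempt is bounded by the smaller of
health_check_timeout and connect_timeout, and that Pgpool-II sleeps
health_check_retry_delay between attempts. The function name and this
simplified model are my own assumptions, not taken from the Pgpool-II source.

```python
def worst_case_detection_seconds(health_check_timeout, max_retries,
                                 retry_delay, connect_timeout_ms):
    """Upper bound (seconds) before failover is requested, under the
    simplified model described above. Hypothetical helper, not Pgpool-II code."""
    # Assumption: a single attempt gives up at whichever timeout fires first.
    per_attempt = min(health_check_timeout, connect_timeout_ms / 1000.0)
    # One initial attempt plus max_retries further attempts.
    attempts = max_retries + 1
    # A retry delay is slept between consecutive attempts only.
    return attempts * per_attempt + max_retries * retry_delay


# Settings from the configuration quoted in this thread.
no_retry = worst_case_detection_seconds(6, 0, 1, 10000)
one_retry = worst_case_detection_seconds(6, 1, 1, 10000)
print(no_retry, one_retry)
```

Under this model, going from 0 to 1 retry with the settings quoted below
moves the bound from about 6 seconds to about 13 seconds.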

Thanks for your help, have a nice day!

2018-04-28 2:10 GMT+02:00 Tatsuo Ishii <ishii at sraoss.co.jp>:

> I noticed you set health_check_max_retries = 0. If the error were a
> transient one, setting health_check_max_retries to some positive number
> might help.
>
> Also I am interested in a strace log when the failover occurs.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Oh I forgot the configuration, here it is :
> >
> > health_check_period = 2
> > health_check_timeout = 6
> > health_check_max_retries = 0
> > health_check_retry_delay = 1
> > connect_timeout = 10000
> >
> > No individual health check settings.
> >
> > So of course I could increase connect_timeout, but 10 seconds is already
> > a long time to wait before triggering the failover process on a
> > production server receiving ~10 inserts / second.
> >
> > 2018-04-26 21:23 GMT+02:00 Bud Curly <psyckow.prod at gmail.com>:
> >
> >> Hi and thanks for your work.
> >>
> >> I use pgpool2 3.7.2 (latest git) with 2 backends in master-slave mode
> >> with native streaming replication.
> >>
> >> I think I have an issue concerning the health check process.
> >>
> >> Over the last two days I have had two "health check timer expired"
> >> errors, one yesterday around 9 am and one today around 8 pm.
> >>
> >> The weird thing is... Pgpool and the backend in question are on the same
> >> machine. This backend is the master. Here is the log :
> >>
> >> 2018-04-26 20:59:29: pid 2153:LOG:  failed to connect to PostgreSQL server on "x.x.x.x:xxx" using INET socket
> >> 2018-04-26 20:59:29: pid 2153:DETAIL:  health check timer expired
> >> 2018-04-26 20:59:29: pid 2153:ERROR:  failed to make persistent db connection
> >> 2018-04-26 20:59:29: pid 2153:DETAIL:  connection to host:" x.x.x.x:xxx" failed
> >> 2018-04-26 20:59:29: pid 2153:LOG:  health check failed on node 0 (timeout:1)
> >> 2018-04-26 20:59:29: pid 2153:LOG:  received degenerate backend request for node_id: 0 from pid [2153]
> >> 2018-04-26 20:59:29: pid 2104:LOG:  Pgpool-II parent process has received failover request
> >> 2018-04-26 20:59:29: pid 2104:LOG:  starting degeneration. shutdown host x.x.x.x:xxx
> >> 2018-04-26 20:59:29: pid 2104:LOG:  Restart all children
> >>
> >> Even though both are on the same machine, I use the public IP for
> >> backend0 and not 127.0.0.1, because the failover process requires
> >> this IP.
> >>
> >> Do you think this could be caused by network conditions on the server
> >> itself, or is it an actual issue?
> >>
> >> Thanks
> >>
>