[pgpool-general: 6055] Re: "health check timer expired" on local machine

Bud Curly psyckow.prod at gmail.com
Fri Apr 27 08:23:49 JST 2018


Hi and thanks

Here is the Postgres log from that time:

2018-04-26 20:38:10.225 CEST [23537] [unknown]@[unknown] LOG:  could not
accept SSL connection: EOF detected
2018-04-26 20:59:34.856 CEST [27744] LOG:  trigger file found:
/var/lib/postgresql/9.6/main/trigger
2018-04-26 20:59:34.856 CEST [27746] FATAL:  terminating walreceiver
process due to administrator command
2018-04-26 20:59:34.857 CEST [27744] LOG:  invalid record length at
3/2133FD18: wanted 24, got 0
2018-04-26 20:59:34.857 CEST [27744] LOG:  redo done at 3/2133FCF0
2018-04-26 20:59:34.857 CEST [27744] LOG:  last completed transaction was
at log time 2018-04-26 20:59:29.852716+02
2018-04-26 20:59:34.873 CEST [27744] LOG:  selected new timeline ID: 94
2018-04-26 20:59:34.994 CEST [27744] LOG:  archive recovery complete
2018-04-26 20:59:35.025 CEST [27744] LOG:  MultiXact member wraparound
protections are now enabled
2018-04-26 20:59:35.034 CEST [25506] LOG:  autovacuum launcher started
2018-04-26 20:59:35.034 CEST [27743] LOG:  database system is ready to
accept connections

> 2018-04-26 20:59:34.856 CEST [27744] LOG:  trigger file found:
/var/lib/postgresql/9.6/main/trigger
-> I assume this line comes from the standby, because there is no
/var/lib/postgresql/9.6/main directory on the master; its data is mounted
somewhere else. The failover process started at 20:59:29 on pgpool, and the
standby got promoted (a minimal sketch of such a failover script is below).
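
For reference, here is a rough sketch of what a failover script in this kind
of setup typically does (the node placeholders and SSH details below are
assumptions, not my actual configuration; only the trigger file path is taken
from the log above):

#!/bin/bash
# Minimal sketch of a pgpool failover_command for streaming replication
# with a trigger file, e.g. configured in pgpool.conf as:
#   failover_command = '/etc/pgpool2/failover.sh %d %H'
FAILED_NODE_ID="$1"      # %d: id of the backend that went down
NEW_MASTER_HOST="$2"     # %H: host of the new master candidate
TRIGGER_FILE=/var/lib/postgresql/9.6/main/trigger   # path seen in the log

# Promote the standby by creating the trigger file it is watching
# (trigger_file must point to the same path in recovery.conf on the standby).
ssh -T postgres@"$NEW_MASTER_HOST" "touch $TRIGGER_FILE"

This is presumably also why the backends are registered with addresses that
are reachable from the other node rather than 127.0.0.1.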

> 2018-04-26 20:38:10.225 CEST [23537] [unknown]@[unknown] LOG:  could not
accept SSL connection: EOF detected
This could be the odd one out, but it happened 20 minutes before the issue
and does not have much to do with the health check process.

Nothing else relevant in the Postgres logs.

> there's no reason for the health check process to not accept 127.0.0.1.

Like I said, the health check process reaches PostgreSQL through the public
IP, so it goes through a different interface than 127.0.0.1 would.
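
One way to reproduce by hand what the health check does is to probe the
backend over both addresses (just a sketch; the public IP, port and user
below are placeholders for my real values):

# Probe the backend the way the health check does, once over the
# public IP and once over loopback.
pg_isready -h 203.0.113.10 -p 5432 -U pgpool -t 30
pg_isready -h 127.0.0.1    -p 5432 -U pgpool -t 30

If the loopback probe always answers while the public-IP one occasionally
hangs or times out, that would point at the interface rather than at pgpool.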

At that time PostgreSQL was receiving ~5 inserts/second and that's all.
No errors were detected on the applications.

So the only cause I can find is a problem on the public interface of this
server, but that would be really unusual, since it is a machine from a
dedicated server provider.
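
To catch that kind of intermittent drop I could leave a small probe running
in the background, something like this (a rough sketch; IP, port and user are
placeholders again):

# Log a timestamp whenever a connection over the public interface
# fails or takes longer than 30 seconds.
while true; do
    if ! timeout 30 psql -h 203.0.113.10 -p 5432 -U pgpool -d postgres \
            -tAc 'SELECT 1' >/dev/null 2>&1; then
        echo "$(date '+%F %T') connection over public IP failed or hung"
    fi
    sleep 5
done >> /tmp/public_if_probe.log

In the meantime, raising health_check_max_retries / health_check_retry_delay
in pgpool.conf should at least keep a short network hiccup from escalating
into a failover.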

2018-04-27 0:55 GMT+02:00 Tatsuo Ishii <ishii at sraoss.co.jp>:

> > Hi and thanks for your work.
> >
> > I use pgpool2 3.7.2 (latest git) with 2 backends in master-slave mode with
> > native streaming replication.
> >
> > I think I have an issue concerning the health check process.
> >
> > In the last two days I have had two "health check timer expired" errors: one
> > yesterday around 9 am and one today around 8 pm.
> >
> > The weird thing is... Pgpool and the backend in question are on the same
> > machine. This backend is the master. Here is the log:
>
> > Despite the fact that these are on the same machine, I use the public IP
> > for backend0 and not 127.0.0.1, because the failover process requires
> > this IP.
>
> Can you elaborate more? As far as I know, there's no reason for the
> health check process to not accept 127.0.0.1.
>
> > Do you think this could be a problem caused by network conditions on the
> > server itself, or an actual issue?
>
> Yes. Was PostgreSQL busy at that time?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>

