[pgpool-general: 6060] Re: pgpool-general Digest, Vol 78, Issue 19

Fri Apr 27 17:47:36 JST 2018

> Thanks for your support :)

You are welcome:-)

>> Still I don't understand. Pgpool-II and PostgreSQL master are on thesame
> machine, that means you could set like "backend_hostname0 = "127.0.0.1".
> 
> Because I need the public address for pgpool_recovery() method to permit
> online recovery from remote nodes. And pgPool like health_check
> process use backend_hostname0
> to do so.

Oh that makes sense.

> The setting health_check_hostname0 doesn't exist but trough, this is not a
> workaround.

So if we had health_check_hostname0, does it help you?

> So according to the log, is the timeout error triggered by this
> "health_check_timeout = 6" or this "connect_timeout = 10000" ?

I believe "health_check_timeout = 6". connect system call waits up to
10 seconds but before it expires health_check_timeout comes.

> I downed timeout to 2 seconds each and monitoring net paquets to find some
> details... Keep you in touch

Thanks.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> 2018-04-27 5:00 GMT+02:00 <pgpool-general-request at pgpool.net>:
> 
>> Send pgpool-general mailing list submissions to
>>         pgpool-general at pgpool.net
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         http://www.sraoss.jp/mailman/listinfo/pgpool-general
>> or, via email, send a message with subject or body 'help' to
>>         pgpool-general-request at pgpool.net
>>
>> You can reach the person managing the list at
>>         pgpool-general-owner at pgpool.net
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of pgpool-general digest..."
>>
>>
>> Today's Topics:
>>
>>    1. [pgpool-general: 6056] Re: "health check timer expired" on
>>       local machine (Tatsuo Ishii)
>>    2. [pgpool-general: 6057] Re: "health check timer expired" on
>>       local machine (Tatsuo Ishii)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 27 Apr 2018 09:40:09 +0900 (JST)
>> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>> To: psyckow.prod at gmail.com
>> Cc: pgpool-general at pgpool.net
>> Subject: [pgpool-general: 6056] Re: "health check timer expired" on
>>         local machine
>> Message-ID: <20180427.094009.1280111065989297836.t-ishii at sraoss.co.jp>
>> Content-Type: Text/Plain; charset=us-ascii
>>
>> > 2018-04-26 20:38:10.225 CEST [23537] [unknown]@[unknown] LOG:  could not
>> > accept SSL connection: EOF detected
>> > 2018-04-26 20:59:34.856 CEST [27744] LOG:  trigger file found:
>> > /var/lib/postgresql/9.6/main/trigger
>> > 2018-04-26 20:59:34.856 CEST [27746] FATAL:  terminating walreceiver
>> > process due to administrator command
>> > 2018-04-26 20:59:34.857 CEST [27744] LOG:  invalid record length at
>> > 3/2133FD18: wanted 24, got 0
>> > 2018-04-26 20:59:34.857 CEST [27744] LOG:  redo done at 3/2133FCF0
>> > 2018-04-26 20:59:34.857 CEST [27744] LOG:  last completed transaction was
>> > at log time 2018-04-26 20:59:29.852716+02
>> > 2018-04-26 20:59:34.873 CEST [27744] LOG:  selected new timeline ID: 94
>> > 2018-04-26 20:59:34.994 CEST [27744] LOG:  archive recovery complete
>> > 2018-04-26 20:59:35.025 CEST [27744] LOG:  MultiXact member wraparound
>> > protections are now enabled
>> > 2018-04-26 20:59:35.034 CEST [25506] LOG:  autovacuum launcher started
>> > 2018-04-26 20:59:35.034 CEST [27743] LOG:  database system is ready to
>> > accept connections
>> >
>> >> 2018-04-26 20:59:34.856 CEST [27744] LOG:  trigger file found:
>> > /var/lib/postgresql/9.6/main/trigger
>> > -> On this line I assume this is the standby who is talking, because
>> there
>> > is no /var/lib/postgresql/9.6/main directory on the master, data are
>> mount
>> > somewhere else. The failover process start at  20:59:29 on pgpool, and
>> the
>> > standby get promoted.
>>
>> Yes, that's my understanding too. So there's no emmitted log on the
>> master around 2018-04-26 20:59:34.856 CEST, I assume.
>>
>> >> 2018-04-26 20:38:10.225 CEST [23537] [unknown]@[unknown] LOG:  could not
>> > accept SSL connection: EOF detected
>> > This could be the weird boy. But it happened 20 minutes before the bug
>> and
>> > this have not much to do with the healtcheck process.
>>
>> No idea for this part.
>>
>> > No more revelant things on Postgres logs
>>
>> Ok.
>>
>> >> there's no reason for the heath check process to not accept 127.0.0.1.
>> >
>> > Like I said, the health process fetch PostgreSQL trough public ip. So it
>> > get trough a different interface.
>>
>> Still I don't understand. Pgpool-II and PostgreSQL master are on the
>> same machine, that means you could set like "backend_hostname0 =
>> "127.0.0.1". But actually you did not prefer the way. The heath check
>> process just uses the same hostname/ip using backend_hostname0.
>>
>> > At this time PostgreSQL was receiving ~5 inserts / second and that's all.
>> > No error detected on the apps.
>>
>> Yeah, no big load.
>>
>> > So the only reason I could find is a problem on the public interface of
>> > this server, but this is really really unsual as it come from a dedicated
>> > server provider.
>>
>> >From the error message of heath check process:
>> > 2018-04-26 20:59:29: pid 2153:LOG:  failed to connect to PostgreSQL
>> server
>> > on "x.x.x.x:xxx" using INET socket
>> > 2018-04-26 20:59:29: pid 2153:DETAIL:  health check timer expired
>> > 2018-04-26 20:59:29: pid 2153:ERROR:  failed to make persistent db
>>
>> Pgpool-II health check process uses non-blocking socket for connecting
>> to PostgreSQL. After issuing connect system call it waits for its
>> completion using select system call with timeout: connect_timeout in
>> pgpool.conf (in your case 10 seconds). On the other hand health_check
>> timeout is 6 seconds. So after 6 seconds, an alarm interrupted the
>> select system call and it returned with errno == EINTR, then the log
>> emitted. Not sure why the connect system call did not respond for 6
>> seconds.
>>
>> That's all what I know from the log.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 27 Apr 2018 10:15:54 +0900 (JST)
>> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>> To: psyckow.prod at gmail.com
>> Cc: pgpool-general at pgpool.net
>> Subject: [pgpool-general: 6057] Re: "health check timer expired" on
>>         local machine
>> Message-ID: <20180427.101554.1927607666615743161.t-ishii at sraoss.co.jp>
>> Content-Type: Text/Plain; charset=us-ascii
>>
>> > Pgpool-II health check process uses non-blocking socket for connecting
>> > to PostgreSQL. After issuing connect system call it waits for its
>> > completion using select system call with timeout: connect_timeout in
>> > pgpool.conf (in your case 10 seconds). On the other hand health_check
>> > timeout is 6 seconds. So after 6 seconds, an alarm interrupted the
>> > select system call and it returned with errno == EINTR, then the log
>> > emitted. Not sure why the connect system call did not respond for 6
>> > seconds.
>> >
>> > That's all what I know from the log.
>>
>> If you want to make research on this, packet dump is required.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>>
>> End of pgpool-general Digest, Vol 78, Issue 19
>> **********************************************
>>