[pgpool-general: 6086] Re: primary suddenly is down only in pgpool

Wed May 16 10:38:12 JST 2018

Your attachment was too large for the mailing list and was not
distributed. So I replied back to the email Cc:ed to me.

> 11 18:44:03 - [No Connection] [18906]LOG:  failed to connect to PostgreSQL
> server on "ptkpl-psgsqldb2:5432", getsockopt() detected error "No route to
> host"

This says it all. Pgpool-II does not generate this error message
itself. It definitely came from the underlying OS or network layer. A
web search for "no route to host" suggests that it might be caused by
wrong iptables settings.

https://www.maketecheasier.com/fix-no-route-to-host-error-linux/
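
One way to confirm that the error comes from the OS or network layer
rather than from Pgpool-II is to attempt a plain TCP connection from the
pgpool host to the backend and look at the errno. The following is only a
minimal sketch and not part of Pgpool-II; the function names
classify_connect_error and probe_backend are made up for illustration.
On Linux, a firewall rule that rejects traffic with an ICMP
"host prohibited" reply typically shows up as EHOSTUNREACH, which is the
"No route to host" message seen in the log.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map an OSError raised by a TCP connect to a short description."""
    # socket.timeout is raised when the connect attempt exceeds the timeout.
    if isinstance(exc, socket.timeout):
        return "timed out"
    # EHOSTUNREACH is what strerror() renders as "No route to host".
    if exc.errno == errno.EHOSTUNREACH:
        return "no route to host"
    if exc.errno == errno.ECONNREFUSED:
        return "connection refused"
    return "other error: %s" % exc


def probe_backend(host, port, timeout=5.0):
    """Try a plain TCP connect; return "ok" or a classified failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_connect_error(exc)
```

Calling probe_backend("ptkpl-psgsqldb2", 5432) from each pgpool host while
the problem is occurring would show whether the failure can be reproduced
outside of Pgpool-II.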

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Mariel Cherkassky <mariel.cherkassky at gmail.com>
Subject: Re: [pgpool-general: 6070] primary suddenly is down only in pgpool
Date: Sun, 13 May 2018 11:49:11 +0300
Message-ID: <CA+t6e1nv+nU_VtgD88+O3b-uJBYC4+WAPs5_7jt48Q4YmnRWUg at mail.gmail.com>

> Hi Tatsuo.
> It suddenly happened again during the weekend. This time I got errors in my
> log :
> -11 18:43:33 - [No Connection] [20902]LOG:  trying connecting to PostgreSQL
> server on "ptkpl-psgsqldb2:5432" by INET socket
> [[No Connection]]([No Connection]) - 2018-05-11 18:43:33 - [No Connection]
> [20902]DETAIL:  timed out. retrying...
> 11 18:44:03 - [No Connection] [18906]LOG:  failed to connect to PostgreSQL
> server on "ptkpl-psgsqldb2:5432", getsockopt() detected error "No route to
> host"
> [[No Connection]]([No Connection]) - 2018-05-11 18:44:03 - [No Connection]
> [18906]LOG:  received degenerate backend request for node_id: 1 from pid
> [18906]
> 
> and the pool kept looking for the primary ("find_primary_node: checking
> backend no 0/1/2") for 6 minutes. During all this time the primary was up
> and working fine. What do you recommend doing? Only after I attached
> the primary again did everything work. Why didn't the pool recognize the
> primary? I'm checking with my networking team whether there was a network
> problem, but I don't think it is related.
> 
> 
> Thanks , MARIEL.
> 
> 2018-05-06 17:22 GMT+03:00 Tatsuo Ishii <ishii at sraoss.co.jp>:
> 
>> Both "show pool_nodes" and pcp_node_info ultimately check the status
>> in the shared memory area. However, the implementations are completely
>> different: "show pool_nodes" is simpler and is just a wrapper that
>> shows the status via SQL. pcp_node_info is a client/server
>> program: the status is retrieved by the pcp server and then sent to the
>> pcp client (pcp_node_info) via the pcp protocol.
>>
>> Also, next time you had better check the status file to verify whether
>> pcp_node_info tells the truth.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > No, I didn't check the status via "show pool_nodes". To be honest, it
>> > isn't the first time this has happened. Is there a difference between
>> > "show pool_nodes" and pcp_node_info at a deeper level? I mean, I know
>> > that "show pool_nodes" queries a view or a table; what about
>> > pcp_node_info? I don't think that it is related to repmgr..
>> >
>> > 2018-05-06 16:49 GMT+03:00 Tatsuo Ishii <ishii at sraoss.co.jp>:
>> >
>> >> > Hi,
>> >> > I have 3 postgres servers (one primary + 2 standbys) that have
>> >> > replication configured with repmgr:
>> >> > pg1 - standby
>> >> > pg2 - primary
>> >> > pg3 - standby
>> >> >
>> >> > I also have 2 pgpool servers (v3.7.2), and on each one there is one
>> >> > pool instance. There isn't any watchdog; instead I have a VIP address
>> >> > that directs the requests to the available pgpool instance. I
>> >> > configured my own metrics that check the status of the database nodes
>> >> > via the pcp interface.
>> >> >
>> >> > Today at 11:25 I suddenly got an alert that both my pgpools saw that
>> >> > the primary node is down (via pcp). I connected and checked, and
>> >> > indeed the primary was down:
>> >> > [postgres at pool2 log]$ pcp_node_info -h localhost -U postgres -p 9898 1 -w
>> >> > pg2 5432 2 0.333333 down standby
>> >> >
>> >> > I checked it in both pools and got the same result. I immediately
>> >> > attached them and it worked. I wanted to understand why it happened,
>> >> > but I don't see any error in the logs. I attach the logs of both my
>> >> > pools. Can you help me identify the problem?
>> >>
>> >> No idea. I have never seen PostgreSQL detached without any trace in the
>> >> pgpool log. Have you checked the node status using "show pool_nodes"? If
>> >> not, I suspect there's a bug in pcp_node_info. If you tried "show
>> >> pool_nodes" and saw the same status as pcp_node_info, then I am
>> >> completely out of ideas.
>> >>
>> >> There may be an interaction with repmgr, but I am not familiar with
>> >> repmgr and this is just a wild guess.
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>>
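
For monitoring scripts like the pcp-based metrics described at the start
of this thread, the pcp_node_info output line can be split into named
fields before alerting on it. The following is a rough sketch only. It
assumes the six whitespace-separated fields in the sample line
"pg2 5432 2 0.333333 down standby" are hostname, port, numeric status,
load balance weight, status name, and role; that layout is inferred from
the sample and should be checked against the documentation for the
Pgpool-II version in use. NodeInfo, parse_node_info and is_down are
hypothetical names.

```python
from collections import namedtuple

# Field layout is an assumption inferred from the sample line in this thread.
NodeInfo = namedtuple(
    "NodeInfo", "hostname port status weight status_name role"
)


def parse_node_info(line):
    """Split one line of pcp_node_info output into named fields."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 fields, got %d" % len(fields))
    host, port, status, weight, status_name, role = fields[:6]
    return NodeInfo(host, int(port), int(status), float(weight),
                    status_name, role)


def is_down(info):
    """True if the node is reported as down by its status name."""
    return info.status_name == "down"
```

Logging the whole parsed record, not just the up/down flag, makes it
easier to compare later against "show pool_nodes" and the status file.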