[pgpool-general: 5108] Re: pgpool incorrectly thinks backend cluster is down

Tue Nov 8 09:12:43 JST 2016

>> Whenever Pgpool-II thinks a backend is down, there should be a
>> log entry in the Pgpool-II log file. Please check.
> 
> This is the error in the log file when this happens
> 
> 2016-11-02 00:00:07: pid 9217: DETAIL:  postmaster on DB node 0 was shutdown by administrative command
> 2016-11-02 00:00:07: pid 9217: LOG:  received degenerate backend request for node_id: 0 from pid [9217]
> 2016-11-02 00:00:07: pid 9188: LOG:  starting degeneration. shutdown host prod1.amazonaws.com(5439)
> 2016-11-02 00:00:07: pid 9188: LOG:  Restart all children
> 
> What does "postmaster on DB node 0 was shutdown by administrative command" mean?
> I haven't sent any shutdown commands to pgpool.

Someone shut down PostgreSQL (or used pg_terminate_backend).
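
To see when and for which node this happened, the log can be scanned for
the "received degenerate backend request" lines. Below is a minimal Python
sketch, assuming each log entry is on a single line in the format quoted
above; the function name find_degenerate_requests and the regular
expression are only illustrations, not part of Pgpool-II.

```python
import re

# Matches lines such as:
# 2016-11-02 00:00:07: pid 9217: LOG:  received degenerate backend request for node_id: 0 from pid [9217]
DEGENERATE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (?P<pid>\d+): LOG:\s+"
    r"received degenerate backend request for node_id: (?P<node>\d+)"
)


def find_degenerate_requests(lines):
    """Return (timestamp, pid, node_id) for every degenerate backend request."""
    events = []
    for line in lines:
        m = DEGENERATE_RE.match(line)
        if m:
            events.append((m["ts"], int(m["pid"]), int(m["node"])))
    return events


# Sample lines taken from the log excerpt in this thread (unwrapped).
sample = [
    "2016-11-02 00:00:07: pid 9217: DETAIL:  postmaster on DB node 0 was shutdown by administrative command",
    "2016-11-02 00:00:07: pid 9217: LOG:  received degenerate backend request for node_id: 0 from pid [9217]",
    "2016-11-02 00:00:07: pid 9188: LOG:  starting degeneration. shutdown host prod1.amazonaws.com(5439)",
]

for ts, pid, node in find_degenerate_requests(sample):
    print(f"{ts}: pid {pid} requested degeneration of node {node}")
```

The DETAIL line logged by the same pid just before each match usually
states the reason for the request.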

> I verify connectivity to the
> cluster whenever this happens and it is always fine. Why does the health
> check that I configured to run every 30 secs not sense that the cluster is
> back up again and update the pgpool_status file?

See the FAQ.
http://www.pgpool.net/mediawiki/index.php/FAQ#Why_does_not_Pgpool-II_automatically_recognize_a_database_comes_back_online.3F
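
In short, a node that has been detached stays detached until it is
attached again by hand, for example with pcp_attach_node. The Python sketch
below only builds that command line. It assumes the option-style syntax of
Pgpool-II 3.5 and later (-h, -p, -U, -n); older releases take positional
arguments instead. The host, port 9898 and user name "pcpadmin" are
placeholders for your own PCP settings.

```python
import shlex


def build_attach_command(node_id, host="localhost", port=9898, user="pcpadmin"):
    """Build the argv list for pcp_attach_node (Pgpool-II 3.5+ option syntax).

    host, port and user are placeholder values; use your own PCP settings.
    """
    if node_id < 0:
        raise ValueError("node_id must be 0 or greater")
    return [
        "pcp_attach_node",
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-n", str(node_id),
    ]


# Print the command that would re-attach DB node 0.
print(shlex.join(build_attach_command(0)))
# To actually run it: subprocess.run(build_attach_command(0), check=True)
```

Run it only after you have confirmed that the backend really accepts
connections again.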

> Health check details from
> the log are below
> 
> 2016-11-01 23:59:54: pid 9188: LOG:  notice_backend_error: called from pgpool main. ignored.
> 2016-11-01 23:59:54: pid 9188: WARNING:  child_exit: called from invalid process. ignored.

No need to worry about this part. There was a race condition inside
Pgpool-II, but it has been resolved.

> 2016-11-01 23:59:54: pid 9188: ERROR:  unable to read data from DB node 0
> 2016-11-01 23:59:54: pid 9188: DETAIL:  socket read failed with an error "Success"
> 
> What does the above log indicate?

DB node 0 disconnected the socket to Pgpool-II.

>> Yes, it routes randomly to the backends. You can control the probability
>> of the routing.
> 
> Is it possible to control routing using a round-robin approach or by
> least-used cluster? If so, where do I configure this?

No.
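
What you can set is the relative probability of each backend being chosen,
through backend_weight0 and backend_weight1 in your config. The Python
sketch below shows weighted random selection in general, to illustrate what
the weights mean. It is not Pgpool-II's actual code, and pick_backend is a
made-up name.

```python
import random


def pick_backend(weights, rng=random):
    """Pick a node index at random, with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    r = rng.random() * total  # uniform in [0, total)
    upper = 0.0
    for node_id, weight in enumerate(weights):
        upper += weight
        if r < upper:
            return node_id
    return len(weights) - 1  # guard against floating-point rounding


# With weights 1 and 1 (as in the posted config) each node should receive
# roughly half of the picks; with 3 and 1, node 0 should receive about 75%.
rng = random.Random(0)
for weights in ([1, 1], [3, 1]):
    counts = [0] * len(weights)
    for _ in range(10000):
        counts[pick_backend(weights, rng)] += 1
    print(weights, counts)
```

A weight of 0 means the node is never picked in this sketch.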

> Thanks,
> - Manoj
> 
> On Mon, Nov 7, 2016 at 12:08 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > I have pgpool configured against two Redshift backend clusters to do
>> > parallel writes. Seemingly at random, pgpool determines that one or both
>> > of the clusters are down and stops accepting connections even when they
>> > are not down. I have a health check configured every 30 seconds, but that
>> > does not help, as it checks health and still determines they are down in
>> > the pgpool_status file. How is health status determined and written to
>> > the file /var/log/pgpool/pgpool_status and why does pgpool think the
>> > clusters are down when they are not?
>>
>> Whenever Pgpool-II thinks a backend is down, there should be a
>> log entry in the Pgpool-II log file. Please check.
>>
>> > I also tested read query routing and noticed they were being routed
>> > randomly to the backend clusters. Is there a specific algorithm that
>> > pgpool uses for read query routing?
>>
>> Yes, it routes randomly to the backends. You can control the probability
>> of the routing.
>>
>> >
>> > My config parameters are below
>> >
>> > backend_hostname0 = 'cluster1'
>> > backend_port0 = 5439
>> > backend_weight0 = 1
>> > backend_data_directory0 = '/data1'
>> > backend_flag0 = 'ALLOW_TO_FAILOVER'
>> >
>> > backend_hostname1 = 'cluster2'
>> > backend_port1 = 5439
>> > backend_weight1 = 1
>> > backend_data_directory1 = '/data1'
>> > backend_flag1 = 'ALLOW_TO_FAILOVER'
>> >
>> > #------------------------------------------------------------------------------
>> > # HEALTH CHECK
>> > #------------------------------------------------------------------------------
>> >
>> > health_check_period = 30
>> >                                    # Health check period
>> >                                    # Disabled (0) by default
>> > health_check_timeout = 20
>> >                                    # Health check timeout
>> >                                    # 0 means no timeout
>> > health_check_user = 'username'
>> >                                    # Health check user
>> > health_check_password = 'password'
>> >                                    # Password for health check user
>> > health_check_max_retries = 10
>> >                                    # Maximum number of times to retry a failed health check before giving up.
>> > health_check_retry_delay = 1
>> >                                    # Amount of time to wait (in seconds) between retries.
>>