[pgpool-general: 5111] Re: pgpool incorrectly thinks backend cluster is down

Wed Nov 9 16:46:32 JST 2016

>>> Whenever Pgpool-II thinks a backend is being down, there should be a
>>>log entry in the Pgpool-II log file. Please check.
>> 
>> This is the error in the log file when this happens
>> 
>> 2016-11-02 00:00:07: pid 9217: DETAIL:  postmaster on DB node 0 was
>> shutdown by administrative command
>> 2016-11-02 00:00:07: pid 9217: LOG:  received degenerate backend request
>> for node_id: 0 from pid [9217]
>> 2016-11-02 00:00:07: pid 9188: LOG:  starting degeneration. shutdown host
>> prod1.amazonaws.com(5439)
>> 2016-11-02 00:00:07: pid 9188: LOG:  Restart all children
>> 
>> What does "postmaster on DB node 0 was shutdown by administrative command".
>> I havent sent any shutdown commands to pgpool.
> 
> Someone shutdown PostgreSQL (or used pg_cancel_backend).

Correction. I meant pg_terminate_backend, rather than
pg_cancel_backend (it just cancel current query if any and harmless
for Pgpool-II).

>> I verify connectivity to the
>> cluster whenever this happens and it is always fine. Why does the health
>> check that I configured to run every 30 secs not sense that the cluster is
>> back up again and update the pgpool_status file?
> 
> See the FAQ.
> http://www.pgpool.net/mediawiki/index.php/FAQ#Why_does_not_Pgpool-II_automatically_recognize_a_database_comes_back_online.3F
> 
>> Health check details from
>> the log are below
>> 
>> 2016-11-01 23:59:54: pid 9188: LOG:  notice_backend_error: called from
>> pgpool main. ignored.
>> 2016-11-01 23:59:54: pid 9188: WARNING:  child_exit: called from invalid
>> process. ignored.
> 
> No worry for this part. There was a race condition inside Pgpool-II
> but was resolved.
> 
>> 2016-11-01 23:59:54: pid 9188: ERROR:  unable to read data from DB node 0
>> 2016-11-01 23:59:54: pid 9188: DETAIL:  socket read failed with an error
>> "Success"
>> 
>> What dos the above log indicate?
> 
> DB node 0 disconnected the socket to Pgpool-II.
> 
>>>Yes, it randomly routes to backends. You can control the possibility
>>>of the routing.
>> 
>> Is it possible to control routing using round robin approach or least used
>> cluster? If so, where do I configure this?
> 
> No.
> 
>> Thanks,
>> - Manoj
>> 
>> On Mon, Nov 7, 2016 at 12:08 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> > I have pgpool configured against two redshift backend clusters to do
>>> > parallel writes. Seemingly at random, pgpool determines that one or both
>>> > the clusters are down and stops accepting connections even when they are
>>> > not down. I have health check configured every 30 seconds but that does
>>> not
>>> > help as it checks heath and still determines they are down in
>>> pgpool_status
>>> > file. How is health status determined and written to the file
>>> > /var/log/pgpool/pgpool_status and why does pgpool think the clusters are
>>> > down when they are not?
>>>
>>> Whenever Pgpool-II thinks a backend is being down, there should be a
>>> log entry in the Pgpool-II log file. Please check.
>>>
>>> > I also tested read query routing and noticed they were being routed
>>> > randomly to the backend clusters. Is there a specific algorithm that
>>> pgpool
>>> > uses for read query routing?
>>>
>>> Yes, it randomly routes to backends. You can control the possibility
>>> of the routing.
>>>
>>> >
>>> >
>>> >
>>> >
>>> > My config parameters are below
>>> >
>>> >
>>> >
>>> > backend_hostname0 = 'cluster1'
>>> >
>>> > backend_port0 = 5439
>>> >
>>> > backend_weight0 = 1
>>> >
>>> > backend_data_directory0 = '/data1'
>>> >
>>> > backend_flag0 = 'ALLOW_TO_FAILOVER'
>>> >
>>> >
>>> >
>>> > backend_hostname1 = 'cluster2'
>>> >
>>> > backend_port1 = 5439
>>> >
>>> > backend_weight1 = 1
>>> >
>>> > backend_data_directory1 = '/data1'
>>> >
>>> > backend_flag1 = 'ALLOW_TO_FAILOVER'
>>> >
>>> >
>>> >
>>> > #-----------------------------------------------------------
>>> > -------------------
>>> >
>>> > # HEALTH CHECK
>>> >
>>> > #-----------------------------------------------------------
>>> > -------------------
>>> >
>>> >
>>> >
>>> > health_check_period = 30
>>> >
>>> >                                    # Health check period
>>> >
>>> >                                    # Disabled (0) by default
>>> >
>>> > health_check_timeout = 20
>>> >
>>> >                                    # Health check timeout
>>> >
>>> >                                    # 0 means no timeout
>>> >
>>> > health_check_user = 'username'
>>> >
>>> >                                    # Health check user
>>> >
>>> > health_check_password = 'password'
>>> >
>>> >                                    # Password for health check user
>>> >
>>> > health_check_max_retries = 10
>>> >
>>> >                                    # Maximum number of times to retry a
>>> > failed health check before giving up.
>>> >
>>> > health_check_retry_delay = 1
>>> >
>>> >                                    # Amount of time to wait (in seconds)
>>> > between retries.
>>>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general