[pgpool-hackers: 3349] Re: [proposal] New feature: auto failback of standby node

Thu Jul 11 16:25:56 JST 2019

Hi all,

I have improved the auto_failback patch and added documentation and a regression test.

The improvements are:

* use the health check process
The previous patch used only the sr check process. If the network
between the primary and standby nodes is normal but the network between
pgpool and the standby node has trouble, auto_failback was executed
and a failover was then likely to be executed right after it.
With this patch, pgpool now runs a health check on the standby node before auto failback.

* add the auto_failback_interval parameter
This parameter specifies the minimum interval between executions
of auto failback. This avoids repeated failover and failback
caused by, for example, a network error.
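
As a rough illustration of the two points above (this is not the code in the
patch, which is written in C), the decision could be sketched in Python as
follows. The Node class, its field names, and should_auto_failback are
hypothetical names used only for this sketch, and measuring the interval from
the previous auto failback of the same node is my assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # All field names are hypothetical, chosen for this sketch only.
    backend_status: str            # status as seen by pgpool: "up" or "down"
    replication_state: str         # e.g. "streaming" or "catchup"
    health_check_ok: bool          # result of a health check from pgpool to the node
    last_failback_at: Optional[float] = None  # time of the previous auto failback


def should_auto_failback(node: Node, now: float,
                         auto_failback: bool,
                         auto_failback_interval: float) -> bool:
    """Return True if the node may be reattached automatically."""
    if not auto_failback:
        return False
    # Only a node that pgpool has marked down is a candidate.
    if node.backend_status != "down":
        return False
    # Replication between primary and standby must be healthy.
    if node.replication_state != "streaming":
        return False
    # New in this patch: pgpool itself must be able to reach the standby,
    # otherwise a failover would probably follow the failback right away.
    if not node.health_check_ok:
        return False
    # Assumption: the interval is counted from the previous auto failback.
    if (node.last_failback_at is not None
            and now - node.last_failback_at < auto_failback_interval):
        return False
    return True
```

With auto_failback_interval set to 60, a node that was failed back 30 seconds
ago is skipped even if it is 'down' and 'streaming' again, which is what stops
the failover/failback loop.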

Comments and suggestions are welcome.

On Thu, 23 May 2019 08:15:18 +0900 (JST)
Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Great! This should solve one of our long-standing TODO items:
> https://pgpool.net/mediawiki/index.php/TODO#Automatically_reattach_a_node_in_streaming_master.2Fslave_configuration
> 
> With this feature enabled, Pgpool-II will automatically bring back a
> "healthy" standby node (that means the standby server is not only up
> and running but properly connected to the primary server).
> 
> One question is whether it should check the replication delay of the
> standby server in question, i.e. if the delay is too large, do not
> automatically fail back the server. I think the check is not necessary
> since we can avoid using such a server by setting the delay_threshold parameter.
> 
> Also note that if the server is in "catchup" replication state (that
> could happen if the server had been stopped for a while and the
> primary server had performed lots of modifications to the database),
> the server will not be automatically failed back because the state is
> not "streaming".
> 
> BTW, the feature will work if the PostgreSQL version is 9.1 or higher (it
> does not work with 9.0 because there's no pg_stat_replication view, which
> the feature relies on).
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
> > Hi all,
> > 
> > I suggest a new auto failback feature for Pgpool-II 4.1.
> > 
> > Currently, pgpool degenerates a backend on a temporary network error, an error response to a query, and so on.
> > So a standby node of streaming replication can be degenerated by pgpool even if replication
> > between the primary and standby nodes has no problem. In this case, pgpool sets the status to 'down',
> > but PostgreSQL's replication continues normally.
> > However, the user needs to attach the node to pgpool manually before pgpool load balances to the standby node again.
> > 
> > I attached a patch for 'auto failback'. This feature uses the "replication_state" added to pool_worker_process in 4.1,
> > and is active only if auto_failback is on. If the worker process finds a node whose replication_state is
> > 'streaming' and whose backend_status is 'down', it requests a failback in the same way as pcp_attach_node.
> > 
> > Comments and suggestions are welcome.
> > 
> > Best regards,
> > -- 
> > Takuma Hoshiai <hoshiai at sraoss.co.jp>
> 

Best regards,

-- 
Takuma Hoshiai <hoshiai at sraoss.co.jp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: auto_failback_v2.patch
Type: application/octet-stream
Size: 20583 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190711/c340970b/attachment-0001.obj>