[pgpool-hackers: 879] Re: Making Failover more robust.

Sun Apr 19 22:09:12 JST 2015

> Currently pgpool-II does not discriminate between the types and nature of
> backend failures, especially when performing the backend health check, and
> it triggers node failover as soon as the health check fails to connect
> to the backend PostgreSQL server (after the retries are exhausted, of
> course). This is a big problem for transient failures. For example, if
> max_connections is reached on the backend node and the health check
> connection is denied, pgpool-II still considers it a backend node failure
> and goes on to trigger a failover, despite the fact that the node is
> actually working fine and pgpool-II child processes are successfully
> connected to it.
> 
> So I think the pgpool-II health check should consider the cause and type
> of the error that happened on the backend and, depending on the type of
> error, either register the failover request, ignore the error, or maybe
> just change the backend node status. We could introduce a new node status
> to identify these types of situations (e.g. NODE_TEMP_DOWN) and have a new
> configuration parameter to control the behavior of this state. Then,
> instead of initiating the failover on a node straight away, the health
> check keeps probing the node with this new NODE_TEMP_DOWN status and
> automatically makes the node available again when the health check
> succeeds on it.
> 
> Thoughts, suggestions and design ideas are most welcome

Sounds like an excellent idea!
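
To make the proposal concrete, here is a rough sketch of the state
handling it describes. It is written in Python for brevity and is not
pgpool-II code. NODE_TEMP_DOWN is the name from the proposal; everything
else (NodeStatus, TRANSIENT_SQLSTATES, on_health_check) is a hypothetical
name made up for illustration. It assumes the health check can see the
SQLSTATE of a rejected connection and that each call represents a result
after the retries are already exhausted. SQLSTATE 53300 is PostgreSQL's
too_many_connections, which matches the max_connections example.

```python
from enum import Enum


class NodeStatus(Enum):
    UP = "up"
    TEMP_DOWN = "temp_down"  # NODE_TEMP_DOWN from the proposal
    DOWN = "down"


# Hypothetical classification table: errors treated as transient.
# 53300 is PostgreSQL's too_many_connections SQLSTATE.
TRANSIENT_SQLSTATES = {"53300"}


def on_health_check(status, ok, sqlstate=None):
    """Return (new_status, request_failover) for one health check result.

    Assumption: the caller invokes this only after its retries are
    exhausted, so a failure here is a final result for this round.
    """
    if status is NodeStatus.DOWN:
        # A node that was already failed over stays down in this sketch;
        # reattaching it is outside the scope of the health check.
        return NodeStatus.DOWN, False
    if ok:
        # Covers automatic recovery from TEMP_DOWN back to UP.
        return NodeStatus.UP, False
    if sqlstate in TRANSIENT_SQLSTATES:
        # Transient error: mark the node, keep probing, no failover.
        return NodeStatus.TEMP_DOWN, False
    # Anything else is treated as a real backend failure.
    return NodeStatus.DOWN, True
```

The new configuration parameter mentioned in the proposal could, for
example, control which errors end up in the transient set or how long a
node may stay in NODE_TEMP_DOWN before a failover is requested anyway.
That part is left out of the sketch.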

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp