[pgpool-hackers: 1985] Re: Proposal to make backend node failover mechanism quorum aware

Mon Jan 16 16:10:07 JST 2017

Hi Usama,

If my understanding is correct, by using the quorum, Pgpool-B and
Pgpool-C decide that B1 is healthy. What happens when Pgpool-A tries
to connect to B1 if the network failure between Pgpool-A and B1
continues? I guess clients connecting to Pgpool-A would get an error
and fail to connect to the database?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi Hackers,
> 
> This is a proposal to make the failover of backend PostgreSQL nodes
> quorum aware, so that it is more robust and fault tolerant.
> 
> Currently Pgpool-II fails over a backend node as soon as the health check
> detects a failure, or when an error occurs on the backend connection (when
> fail_over_on_backend_error is set). This is good enough for a standalone
> Pgpool-II server.
> 
> But consider the scenario where we have more than one Pgpool-II (say,
> Pgpool-A, Pgpool-B and Pgpool-C) in the cluster, connected through the
> watchdog, and each Pgpool-II node is configured with two PostgreSQL
> backends (B1 and B2).
> 
> Now if, due to a network glitch or some other issue, Pgpool-A fails to
> reach backend B1 or loses its network connection to it, Pgpool-A will
> detect the failure, detach (fail over) the B1 backend, and pass this
> information to the other Pgpool-II nodes (Pgpool-B and Pgpool-C). Backend
> B1 was perfectly healthy and was still reachable from Pgpool-B and
> Pgpool-C, but because of a network glitch between Pgpool-A and B1 it gets
> detached from the cluster. The worst part is that if B1 was the master
> PostgreSQL node (in a master-standby configuration), the Pgpool-II
> failover would also promote the B2 PostgreSQL node as the new master,
> hence opening the way for split-brain and/or data corruption.
> 
> So my proposal is that when the watchdog is configured in Pgpool-II, the
> backend health check should consult the other attached Pgpool-II nodes
> over the watchdog to decide whether the backend node has actually failed
> or whether it is just a localized glitch/false alarm. The failover of the
> node should only be performed when the majority of cluster members agree
> that the node has failed.
> 
> This quorum-aware failover architecture will prevent false failovers and
> split-brain scenarios in the backend nodes.
> 
> What are your thoughts and suggestions on this?
> 
> Thanks
> Best regards
> Muhammad Usama
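
The majority rule proposed in the quoted message can be sketched roughly as
follows. This is only an illustration under stated assumptions: the function
name quorum_agrees_on_failure and the votes mapping are hypothetical and are
not part of Pgpool-II or its watchdog protocol. It shows that a report from a
single node (Pgpool-A losing B1) would not be enough to trigger failover when
a strict majority is required.

```python
from typing import Dict


def quorum_agrees_on_failure(votes: Dict[str, bool]) -> bool:
    """Return True only if a strict majority of members report a failure.

    `votes` (hypothetical structure) maps a Pgpool-II node name to True
    when that node cannot reach the backend, and False otherwise.
    """
    if not votes:
        # No members reporting: never fail over on an empty vote.
        return False
    failed = sum(1 for cannot_reach in votes.values() if cannot_reach)
    # Strict majority: more than half of all members must report failure.
    return failed * 2 > len(votes)


# Scenario from the thread: only Pgpool-A has lost its link to B1.
b1_votes = {"Pgpool-A": True, "Pgpool-B": False, "Pgpool-C": False}
print(quorum_agrees_on_failure(b1_votes))  # False: B1 stays attached
```

The sketch says nothing about what clients of Pgpool-A see while its link to
B1 stays down; that is the open question raised in the reply above.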