Difference between revisions of "watchdog feature enhancement"

From pgpool Wiki

Revision as of 15:30, 15 June 2015

What's driving this enhancement

Watchdog is a very important feature of pgpool-II, as it is used to eliminate the single point of failure and provide HA. However, there are a few feature requests and bugs in the existing watchdog that require more than a simple code fix; they call for a complete revisit of its core architecture. This enhancement therefore aims to bring stability and robustness to the existing pgpool-II watchdog, along with some new features.

Some pgpool-II general mailing list threads related to watchdog

-- [pgpool-general: 3724] delegate ip lost
-- [pgpool-II 0000135]: Delegate IP does not get up on Standby upon Active gets disconnected (same in pgpool-general: 3736)
-- Split-brain scenario due to network partitioning
-- [pgpool-general: 3595] Watchdog issue.
-- [pgpool-general: 3443] watchdog on cloud
-- [pgpool-general: 3126] watchdog voting
-- [pgpool-general: 2985] Re: Connections stuck in CLOSE_WAIT, again
-- [pgpool-general: 2949] Re: pgpool 3.3.3 watchdog problem
-- [pgpool-general: 2797] pcp_watchdog_info parameters
-- [pgpool-general: 2768] timeout Watchdog
-- [pgpool-general: 2427] watchdog quorum
-- [pgpool-general: 2418] Re: watchdog: different statuses on different pgpool nodes.
-- [pgpool-general: 3772] Race condition for VIP assignment
-- Many questions about whether suid or root privileges are required ([pgpool-general: 3323] Re: Watchdog - ifconfig up failed)
-- Users want an ACTIVE-ACTIVE pgpool-II configuration, plus miscellaneous comments on how difficult the watchdog is to configure

Summary

Analyzing the pgpool-II community threads above, it comes down to four main areas where the current pgpool-II watchdog requires enhancement.
1-- Virtual IP (VIP) assignment and handling the loss of the VIP
2-- The split-brain scenario, recovery from it, and watchdog quorum
3-- Users run into misconfigured watchdog situations very often
4-- Users want watchdog on cloud and active-active watchdog configurations

Still open issues
-- [pgpool-general: 3724] delegate ip lost
-- [pgpool-general: 3772] Race condition for VIP assignment
-- [pgpool-general: 3228] Split brain or using 3 nodes ?
-- [pgpool-general: 3728] Re: pgpool-general Digest, Vol 43, Issue 17

What is required by the watchdog?


The main purpose of the watchdog in pgpool-II is to provide high availability. For this purpose, the watchdog is required to ensure the following.

-- Ensure only healthy nodes are part of the cluster
-- Ensure only authorized nodes can become members of the cluster
-- Ensure only one pgpool-II node is the designated master node at any time
-- Provide an automatic recovery mechanism, where possible, when a problem occurs
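One common way to uphold the "only one master at any time" requirement under network partitioning is a strict-majority quorum check. The following is only a minimal sketch of that general idea, not the pgpool-II watchdog implementation; the function names `has_quorum` and `may_act_as_master` and their parameters are hypothetical and chosen purely for illustration.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True when a strict majority of the cluster is reachable.

    reachable_nodes counts the local node itself plus every peer that
    answered the most recent health check (hypothetical bookkeeping).
    """
    if total_nodes <= 0 or not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("node counts out of range")
    # Strict majority: more than half of the configured nodes.
    return reachable_nodes > total_nodes // 2


def may_act_as_master(reachable_nodes: int, total_nodes: int,
                      is_elected: bool) -> bool:
    """A node may hold the master role only if elected and in quorum."""
    return is_elected and has_quorum(reachable_nodes, total_nodes)


if __name__ == "__main__":
    # Three-node cluster split 2/1: only the two-node side keeps quorum.
    print(has_quorum(2, 3), has_quorum(1, 3))
    # Two-node cluster split 1/1: neither side has a strict majority.
    print(has_quorum(1, 2))
```

Under this rule, two partitions can never both hold a strict majority, so at most one side can keep a master. A two-node cluster that splits evenly leaves neither side with quorum, which is one reason the "Split brain or using 3 nodes ?" question keeps coming up.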