[pgpool-hackers: 929] Re: Proposal to make watchdog more robust.

Muhammad Usama m.usama at gmail.com
Fri Jun 12 04:34:47 JST 2015


I have been working further on the above proposal for enhancing the
pgpool-II watchdog. Please read below for a detailed design overview of
the watchdog.

Terminology used below
Cluster: the logical entity which contains all the pgpool-II server nodes
connected by the pgpool-II watchdog.

What is required by the watchdog?
The main purpose of the watchdog in pgpool-II is to provide high
availability. For this purpose the watchdog is required to ensure the
following.

-- Ensure only healthy nodes are part of the cluster
-- Ensure only authorized nodes can become members of the cluster
-- Ensure only one pgpool-II node is the designated master node at any time
-- Provide an automatic recovery mechanism, where possible, when some
problem occurs

The watchdog should provide a guard against the following types of failures
-- pgpool-II service failure
-- complete or partial network failures.

High level responsibilities of the watchdog
-- Health checking of all participating pgpool-II nodes in the cluster,
including health checking of the local pgpool-II server.
-- Ensure that the delegate-IP is always available on exactly one node at
all times.
-- Provide a mechanism to add and remove pgpool-II nodes from the cluster.
-- Perform the leader election to select the master node when the cluster
is initialized or in case of master node failure.
-- Perform an automatic recovery if, due to some issue, the cluster state
is broken or a split-brain scenario happens.
-- Generate alarms for failures where administrator intervention is
required to rectify the problem.
-- Manage the pgpool-II configurations to make sure all the nodes in the
cluster have similar configurations.
-- Provide an effective way of health checking other nodes (heartbeat)
and of messaging between participating nodes.
-- Ensure security so that only intended nodes can become cluster members.
-- Provide a mechanism so that the administrator can check the status of
the cluster and the alarms generated by the cluster.
-- Be able to remove a node's membership from the cluster (node fencing) if
a problematic node is detected or when requested by administrator command.

Watchdog on Amazon Cloud and other cloud flavours
A much-requested feature is that the pgpool-II watchdog should work
seamlessly on AWS. So the enhanced watchdog will work on the Amazon cloud,
where a simple virtual IP cannot be used by the pgpool-II watchdog. For
this the enhanced watchdog will implement two new features.

1 -- Active-Active watchdog configuration:
This will be a big improvement to the pgpool-II watchdog. It would
effectively mean that multiple pgpool-II servers can be installed and that
an external load-balancer and HA system can be used with pgpool-II.

2 -- The new watchdog will be flexible enough to allow utilities other than
ifconfig (e.g. ec2-assign-private-ip-addresses for an AWS virtual IP) to be
used to bring up the virtual IP.
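
To illustrate the second point, here is a minimal sketch of how a
configurable bring-up command could be expanded. The template syntax
({ip}, {interface}), the function name and the wrapper script path are all
hypothetical and are not existing pgpool-II settings. The command is only
built here, not executed.

```python
import shlex
from typing import List


def build_vip_command(template: str, ip: str, interface: str) -> List[str]:
    """Expand a command template into an argv list without using a shell."""
    # Split first and substitute afterwards, so that a substituted value
    # can never be re-parsed into additional arguments.
    return [part.format(ip=ip, interface=interface)
            for part in shlex.split(template)]


# Classic setup: bring the address up as an interface alias with ifconfig.
IFCONFIG_UP = "ifconfig {interface}:0 inet {ip} netmask 255.255.255.0"

# Cloud setup: a hypothetical site-specific wrapper script that would call
# the cloud provider's own tool to assign the address.
CLOUD_UP = "/usr/local/bin/assign-vip.sh {ip} {interface}"

if __name__ == "__main__":
    print(build_vip_command(IFCONFIG_UP, "10.0.0.50", "eth0"))
    print(build_vip_command(CLOUD_UP, "10.0.0.50", "eth0"))
```

With such a template the watchdog itself would not need to know whether
the address is brought up by ifconfig or by a cloud-specific utility.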

Logical Components of watchdog
The pgpool-II watchdog system will consist of the following discrete
logical components:

-- Heartbeat, to monitor the health and availability of cluster member
nodes.
-- Messaging system, to share status and configurations between cluster
member nodes.
     ---- All the messaging will be in an XML or text-based extensible
     protocol to ensure easy debugging and future extensions
     ---- Will provide a communication mechanism for unicast as well as
     broadcast messaging
-- Local resource manager, which will be responsible for monitoring the
health of local resources. It will consist of two sub-components:
     ---- Delegate-IP monitoring and control
     ---- Local pgpool-II server monitoring
-- Information database, which will store and manage all the cluster-wide
runtime information and pgpool-II configurations.
-- IPC listener, to enable administrator control by PCP commands.
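
As an illustration of the heartbeat component, the following is a minimal
sketch of the bookkeeping side only: remembering when each member node was
last heard from and reporting the nodes that have been silent for too
long. The class name, the method names and the timeout value are
assumptions made for this example. How the heartbeat packets are actually
sent and received is left out.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last time a heartbeat was received from each node."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        """Remember that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def silent_nodes(self, now: float) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=3.0)
    monitor.record("pgpool-a", now=0.0)
    monitor.record("pgpool-b", now=2.0)
    # At t=4.0 only pgpool-a has been silent for more than 3 seconds.
    print(monitor.silent_nodes(now=4.0))
```

Timestamps are passed in explicitly so that the logic can be exercised
without real clocks or sockets.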

Working overview of watchdog system
The new watchdog system will be a finite state machine which will
transition between different states. Some prominent system states will be

IDLE             -- nothing is happening
STARTING         -- starting up
STOPPING         -- stopping
ELECTION         -- taking part in the election
JOINING CLUSTER  -- we are initialised and joining the cluster
ELECTED MASTER   -- the node has just been elected as the master node
NORMAL NODE      -- we are not the master and have joined the cluster as a
                    slave node
RECOVERY         -- some event occurred and we are recovering from it
The basic working of the watchdog will be as follows:
-- At startup, do basic sanity checks and go into the normal member node
state, then wait for instructions from the master node or start the
election.
-- If the election algorithm is started, participate in the election and
become either the master node or a normal node, depending on the election
results.
-- Once the election is complete, if we are the master node, move to the
master waiting state and construct the complete view of the member nodes
and the cluster state.
-- Construct the information database and propagate it to all member nodes.
-- Start the health checking of local resources and remote nodes and stay
in this state until some failure occurs. Depending upon the type of failure
or event, take the appropriate action.

The action could be one of the following:
     -- Kill itself
     -- Start the leader election
     -- Restart a local resource (pgpool-II server or delegate-IP)
     -- Inform the master node about some event or failure (if this node is
     not the master node)
     -- Replicate the configuration or information to the member nodes
     (master node only)
     -- Perform fencing of a member node (master node only)
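
The choice of action can be sketched as a small decision function. The
actions are the ones listed above. The event names and the mapping from
events to actions are hypothetical and only meant to show how the "master
node only" restrictions could be enforced in one place.

```python
from enum import Enum


class Action(Enum):
    KILL_SELF = "kill itself"
    START_ELECTION = "start leader election"
    RESTART_LOCAL_RESOURCE = "restart a local resource"
    INFORM_MASTER = "inform the master node"
    REPLICATE_CONFIG = "replicate configuration to member nodes"
    FENCE_NODE = "fence a member node"


def choose_action(event: str, is_master: bool) -> Action:
    """Map a (hypothetical) event name to one of the listed actions."""
    if event == "unrecoverable_local_error":
        return Action.KILL_SELF
    if event == "local_resource_down":
        return Action.RESTART_LOCAL_RESOURCE
    if event == "master_unreachable" and not is_master:
        return Action.START_ELECTION
    if event == "remote_node_unhealthy":
        # Fencing is reserved for the master; other nodes only report.
        return Action.FENCE_NODE if is_master else Action.INFORM_MASTER
    if event == "config_changed":
        # Only the master replicates; other nodes only report.
        return Action.REPLICATE_CONFIG if is_master else Action.INFORM_MASTER
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    print(choose_action("remote_node_unhealthy", is_master=False).value)
    print(choose_action("remote_node_unhealthy", is_master=True).value)
```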

Responsibilities of the master watchdog node
-- Maintaining the up-to-date configuration of pgpool-II and replicating
it to all participating nodes in the cluster.
-- Health checking of the pgpool-II backend nodes. If the configuration
requires all members to do backend health checking, or if a backend error
is detected by some other member of the cluster, then ensuring that the
failover of the backend node is executed by only a single node.
-- Managing the fencing, joining and leaving of members of the cluster.
-- Keeping hold of the delegate-IP and making sure that it is recovered if
for some reason it is dropped.
-- Handing over the responsibility to some other cluster member if, because
of some issue, it is not able to continue as the master node, or if
instructed to do so by administrator command.

Leader election algorithm
Selecting the best algorithm for electing the master pgpool-II node, at
start-up or in case of master node failure, is still a TODO. One suggestion
is to use the algorithm from Leader Election in Asynchronous Distributed
Systems http://www.cs.indiana.edu/pub/techreports/TR521.pdf (Also used by
Other leader election algorithm suggestions are most welcome.

Thoughts, suggestions, comments?

Best regards
Muhammad Usama

On Mon, Mar 2, 2015 at 4:08 PM, Muhammad Usama <m.usama at gmail.com> wrote:

> Hi pgpool-II hackers,
> pgpool-II's watchdog is used to eliminate the single point of failure
> and provide HA. Although the current watchdog serves its purpose, I
> think there is a need to enhance this feature and make it more robust
> and adaptable, so that it can work seamlessly in a variety of scenarios
> and with different system and cloud flavours.
> Below are a few points on which I think enhancements can be made
> to make pgpool-II more robust for high availability scenarios.
> 1-) Provide multiple options for heartbeat to check the availability
> of other pgpool-II servers.
>      a-) UDP uni-cast (Already present)
>      b-) UDP multicast, which will be helpful in reducing network traffic.
>      c-) TCP heartbeat.
> 2-)  pgpool-II servers running in one group should also sync their
> configurations. I think it would be good if multiple pgpool-II servers
> running in one group (connected to each other by the watchdog) had the
> same configuration parameter values and a consistent view of the backend
> nodes. Doing this will also help in cases where some external IP based
> load-balancer is used to load-balance between two or more pgpool-II
> servers.
> 3-)  It may be good to offload the burden of PG backend node health
> checking from the secondary pgpool-II servers and delegate it solely to
> the master pgpool-II, which would then perform the backend node health
> checking. This could help in improving the performance a little.
> 4-)  If somehow a split-brain scenario happens because of network
> partitioning or a temporary network outage, pgpool-II should be able
> to recover by itself after detecting the scenario.
> 5-)  Add some way in pgpool-II to allow configurable quorum settings
> to decide how and when the pgpool-II can be escalated to master
> pgpool-II
> 6-) pgpool-II should have some configuration parameter to wait for a
> configured amount of time before starting to elect a new master node in
> case of master pgpool-II node failure. This could help to guard against
> failover in case of temporary network glitches.
> 7-) Allow the watchdog to be used in a configuration where the watchdog
>    master and secondary cannot share the same virtual IP address (for
>    example, different regions in AWS).
> Thoughts, comments and suggestions are most welcome.
> Thanks and regards!
> Muhammad Usama