[pgpool-committers: 2771] pgpool: Mega patch for watchdog feature enhancements

Muhammad Usama m.usama at gmail.com
Fri Oct 30 05:52:18 JST 2015


Mega patch for watchdog feature enhancements

The goal of this enhancement is to fix the shortcomings and problems in the
pgpool-II watchdog and make the watchdog system more robust and adaptable.
The patch addresses the following shortcomings in the watchdog:
-- The watchdog should consider the quorum and only elect the master/leader node
   if the quorum exists in the watchdog cluster.
-- All the participating pgpool-II nodes in the watchdog cluster should have
   similar pgpool-II configurations.
-- Watchdog nodes should have a configurable node priority, to give users
   more control over which node becomes the leader node.
-- More options for node health checking; in particular, the watchdog should
   allow external/3rd-party node health checking systems to integrate with it.
-- The watchdog should keep watching for problems like split-brain syndrome
   and should automatically recover from them.
-- Allow users to provide scripts to be executed at the time of escalation to,
   and de-escalation from, the master/leader node.

Some notes about the new architecture.
======================================
The new watchdog process implements a state machine: every watchdog node starts
in the "WD_LOADING" state and transitions to either the "WD_STANDBY" or
"WD_COORDINATOR" state. The node stays in the standby or coordinator state until
some event causes the loss of quorum. When that happens, the node goes to the
"WD_WAITING_FOR_QUORUM" state and stays there until the quorum is complete again.
The coordinator/master watchdog node periodically sends a beacon message to all
connected nodes, which helps to detect split-brain scenarios.

Another state in the state machine is "WD_IN_NW_TROUBLE" (in network trouble).
The node goes into this state when the watchdog detects a network blackout on
the local machine. It stays in this state until the network on the system is
back. When the network comes back, the node leaves the WD_IN_NW_TROUBLE state,
goes back to WD_LOADING, and starts the joining process again.
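
To make the lifecycle concrete, here is a minimal C sketch of the transitions
described above. It is illustrative only: the state names follow this
description, but the real enum and transition logic live in
src/include/watchdog/watchdog.h and src/watchdog/watchdog.c and are more
involved, and wd_next_state is a hypothetical helper.

#include <stdbool.h>

/* Illustrative subset of the watchdog states described above */
typedef enum
{
    WD_LOADING,            /* starting up and joining the cluster */
    WD_STANDBY,            /* follower of the current coordinator */
    WD_COORDINATOR,        /* elected master/leader node */
    WD_WAITING_FOR_QUORUM, /* quorum lost, waiting for it to be complete again */
    WD_IN_NW_TROUBLE       /* network blackout on the local machine */
} WdNodeState;

/* One simplified transition step of the watchdog state machine */
static WdNodeState
wd_next_state(WdNodeState cur, bool network_up, bool have_quorum,
              bool won_election)
{
    if (!network_up)
        return WD_IN_NW_TROUBLE;
    if (cur == WD_IN_NW_TROUBLE)
        return WD_LOADING;      /* network is back: restart the joining process */
    if (!have_quorum)
        return WD_WAITING_FOR_QUORUM;
    if (cur == WD_LOADING || cur == WD_WAITING_FOR_QUORUM)
        return won_election ? WD_COORDINATOR : WD_STANDBY;
    return cur;                 /* remain standby or coordinator */
}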

Communication with nodes in the cluster
=======================================
The watchdog uses TCP/IP sockets for all communication with other nodes.
Each watchdog node can have two sockets open with every other node: one is the
outgoing socket, which this node creates to initiate the connection to the
other node, and the second is the inbound connection, initiated by the remote
watchdog node. Each inbound connection remains in the unidentified socket list
until the initiator node sends the WD_ADD_NODE_MESSAGE on that socket and the
information in the message is verified against the configured watchdog nodes.
At any time, a single open socket to a particular node is enough to carry out
the watchdog operations successfully.
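
As an illustration of the outgoing-connection side, here is a minimal sketch
assuming a plain IPv4 peer address; wd_connect_to_peer is a hypothetical
helper, and the real connection handling in src/watchdog/watchdog.c is more
elaborate.

#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

static int
wd_connect_to_peer(const char *peer_addr, int peer_port)
{
    struct sockaddr_in addr;
    int     sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(peer_port);
    if (inet_pton(AF_INET, peer_addr, &addr.sin_addr) != 1 ||
        connect(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        close(sock);
        return -1;
    }

    /*
     * The first message sent here should be the WD_ADD_NODE_MESSAGE, so that
     * the remote node can verify this node against its configured watchdog
     * nodes and move the socket out of its unidentified socket list.
     */
    return sock;
}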

IPC and Integrating external Lifecheck with watchdog
====================================================
The watchdog uses a UNIX domain socket for all IPC communication. The BSD
socket file name for IPC is constructed by appending the pgpool-II wd_port to
the string "s.PGPOOLWD_CMD.", and the socket file is placed in the directory
given by the wd_ipc_socket_dir configuration parameter. This IPC socket can be
used by any external/3rd-party system to get the information about the
configured watchdog nodes for health checking, and on the same socket the
external system can inform the watchdog about changes in the health status of
nodes. The watchdog uses JSON data for all IPC communication, which makes it
easy to integrate external node health check systems.
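
As a sketch of how an external lifecheck system might open the IPC socket: the
path format follows the description above, wd_ipc_connect is a hypothetical
helper, and the JSON request/response format is not shown here.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int
wd_ipc_connect(const char *wd_ipc_socket_dir, int wd_port)
{
    struct sockaddr_un addr;
    int     sock = socket(AF_UNIX, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* e.g. "/tmp/s.PGPOOLWD_CMD.9000" for wd_ipc_socket_dir = '/tmp', wd_port = 9000 */
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "%s/s.PGPOOLWD_CMD.%d", wd_ipc_socket_dir, wd_port);

    if (connect(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        close(sock);
        return -1;
    }
    /* JSON requests and responses are then exchanged over this socket. */
    return sock;
}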

Pgpool-II configuration parameter changes introduced by the patch
=================================================================
wd_ipc_socket_dir:
Specifies the directory where the UNIX domain socket accepting pgpool-II
watchdog IPC connections will be created.

wd_priority:
This new parameter can be used to elevate the current watchdog node's priority
in leader elections. The node with the higher wd_priority value is selected as
the master/coordinator watchdog node when the cluster elects its master node,
either at cluster startup or in the event of an old master watchdog node failure.

wd_de_escalation_command:
This parameter holds the command that the watchdog will execute on the master
pgpool-II watchdog node when that node resigns from the master node
responsibilities.

A new lifecheck mode "external" is also added. This mode disables the internal
lifecheck of the pgpool-II watchdog and relies on an external system to inform
it about the health status of nodes.
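
Putting the new parameters together, a pgpool.conf fragment could look like
the following. The values and the script path are examples only, and
wd_lifecheck_method is assumed to be the existing parameter that accepts the
new "external" mode.

wd_ipc_socket_dir = '/tmp'
                                   # directory for the watchdog IPC socket file
wd_priority = 5
                                   # higher value wins the leader election
wd_de_escalation_command = '/usr/local/bin/on_deescalation.sh'
                                   # run on the master node when it resigns
wd_lifecheck_method = 'external'
                                   # disable the internal lifecheck; an external
                                   # system reports node health over IPC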

Branch
------
master

Details
-------
http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=e95c05b06283ec4c801f3ecd0f1d182ca10913cd

Modified Files
--------------
doc/pgpool-en.html                                 |  212 +-
src/Makefile.am                                    |    4 +-
src/Makefile.in                                    |   13 +-
src/config/pool_config.c                           |  146 +-
src/config/pool_config.l                           |   72 +-
src/include/pcp/pcp.h                              |   14 +-
src/include/pool.h                                 |    5 +
src/include/pool_config.h                          |   39 +-
src/include/utils/json.h                           |  310 ++
src/include/utils/json_writer.h                    |   64 +
src/include/utils/pool_stream.h                    |    5 +-
src/include/watchdog/watchdog.h                    |  230 +-
src/include/watchdog/wd_ext.h                      |  123 -
src/include/watchdog/wd_ipc_commands.h             |   80 +
src/include/watchdog/wd_ipc_defines.h              |   82 +
src/include/watchdog/wd_json_data.h                |   51 +
src/include/watchdog/wd_lifecheck.h                |   70 +
src/include/watchdog/wd_utils.h                    |   57 +
src/libs/pcp/pcp.c                                 |   15 +-
src/main/main.c                                    |    2 +-
src/main/pgpool_main.c                             |  300 +-
src/pcp_con/pcp_worker.c                           |   52 +-
src/pcp_con/recovery.c                             |    5 +-
src/sample/pgpool.conf.sample                      |   20 +-
src/sample/pgpool.conf.sample-master-slave         |   17 +-
src/sample/pgpool.conf.sample-replication          |   19 +-
src/sample/pgpool.conf.sample-stream               |   18 +-
src/test/regression/tests/004.watchdog/master.conf |   18 +-
.../regression/tests/004.watchdog/standby.conf     |   25 +-
src/test/regression/tests/004.watchdog/test.sh     |   86 +-
src/tools/pcp/pcp_frontend_client.c                |   25 +-
src/tools/pgmd5/pool_config.c                      |  154 +-
src/utils/json.c                                   | 1092 +++++
src/utils/json_writer.c                            |  260 +
src/utils/pool_process_reporting.c                 |   15 +
src/utils/pool_stream.c                            |   56 +
src/watchdog/Makefile.am                           |    9 +-
src/watchdog/Makefile.in                           |   27 +-
src/watchdog/watchdog.c                            | 5134 ++++++++++++++++++--
src/watchdog/wd_child.c                            |  508 --
src/watchdog/wd_commands.c                         |  759 +++
src/watchdog/wd_escalation.c                       |  228 +
src/watchdog/wd_heartbeat.c                        |   89 +-
src/watchdog/wd_if.c                               |  470 +-
src/watchdog/wd_init.c                             |  119 -
src/watchdog/wd_interlock.c                        |  317 +-
src/watchdog/wd_json_data.c                        |  430 ++
src/watchdog/wd_lifecheck.c                        |  847 +++-
src/watchdog/wd_list.c                             |   28 -
src/watchdog/wd_packet.c                           | 1234 -----
src/watchdog/wd_ping.c                             |   28 +-
src/watchdog/wd_utils.c                            |  201 +
52 files changed, 10744 insertions(+), 3440 deletions(-)


