View Issue Details

ID: 0000547     Project: Pgpool-II     Category: Bug     View Status: public     Last Update: 2019-10-31 18:41
Reporter: harukat     Assigned To: Muhammad Usama
Priority: normal     Severity: major     Reproducibility: always
Status: assigned     Resolution: open
Product Version: 3.7.11
Target Version: 3.7.12     Fixed in Version: 3.7.12
Summary: 0000547: We need to do arping again after recovering from split-brain.
Description: Pgpool-II should do arping again (or run wd_IP_down() and wd_IP_up() again)
on the master/coordinator node after recovering from split-brain.
In the following scenario, Pgpool-II doesn't work well.

scenario:

1. There are watchdog cluster nodes: n1, n2, and n3.
    n3 is master/coordinator. VIP is set on n3.

2. Network trouble occurs.
    n1 and n2 decide that n3 is down.
   n2 becomes the new master/coordinator; VIP is set on n2 with arping.

3. Network recovers.
    n1, n2, and n3 notice split-brain status.
    They decide n3 is the best master/coordinator.
    n2 resigns as master/coordinator. VIP is released on n2.

4. VIP has been set on n3, but the ARP tables still point to n2 as the VIP target.
Tags: No tags attached.

Activities

t-ishii

2019-09-13 15:57

developer   ~0002845

Last edited: 2019-09-13 17:07


According to Usama's explanation in a similar case:
https://www.pgpool.net/pipermail/pgpool-general/2019-August/006733.html
After "3. Network recovers."
> n1, n2, and n3 notice split-brain status.
> They decide n3 is the best master/coordinator.
> n2 resigns as master/coordinator. VIP is released on n2.
n2 should not have resigned as master in the first place. Since the other case upthread of the URL above looks quite similar, I guess there is something wrong in the watchdog code handling the case when the former master node comes back.

So,
> We need to do arping again after recovering from split-brain.
The point is not there.

Muhammad Usama

2019-09-13 21:57

developer   ~0002847

Last edited: 2019-09-13 21:58


In the above-explained scenario, as per the current design, the watchdog should behave as follows:

1. There are watchdog cluster nodes: n1, n2, and n3.
    n3 is master/coordinator. VIP is set on n3.

2. Network trouble occurs.
        n1 and n2 decide that n3 is down.
        n2 becomes the new master/coordinator; VIP is set on n2 with arping.

2-b. At the same time, n3 would perform de-escalation since it has lost the quorum.
         Even if n3 stays as master, it would remove the VIP because it has lost the quorum.

3. Network recovers.
       n1, n2, and n3 notice split-brain status.
       They decide n3 is the best master/coordinator.
       n2 resigns as master/coordinator. VIP is released on n2.

4. Even in case No. 4 of the original scenario, if n3 is chosen as the best master, n3 will perform
       the escalation again, because it had already performed the de-escalation when the
       quorum was lost. And when the escalation is performed again after recovering from the
       split-brain, the arping step is performed again as well (see the sketch below).
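
For reference, the following is a minimal C sketch (not actual Pgpool-II code) of the de-escalation/re-escalation behaviour described above. It reuses the wd_IP_down()/wd_IP_up() names from the bug report; the handle_quorum_or_master_change() helper and the escalated flag are hypothetical.

#include <stdbool.h>

/* declarations of the real Pgpool-II helpers named in this report */
extern int wd_IP_up(void);
extern int wd_IP_down(void);

/* hypothetical flag: has this node performed the escalation (acquired the VIP)? */
static bool escalated = false;

/* hypothetical hook, called whenever the quorum or master status changes */
static void
handle_quorum_or_master_change(bool have_quorum, bool i_am_master)
{
	if (escalated && (!have_quorum || !i_am_master))
	{
		/* step 2-b: the master lost the quorum (or resigned), so release the VIP */
		wd_IP_down();
		escalated = false;
	}
	else if (!escalated && have_quorum && i_am_master)
	{
		/*
		 * step 4: chosen as master again after recovering from the split-brain.
		 * Because the node de-escalated earlier, the full escalation runs
		 * again, and wd_IP_up() includes the arping step.
		 */
		wd_IP_up();
		escalated = true;
	}
}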

Now if step No. 4 of the original scenario (VIP has been set on n3, but the ARP tables still point to n2 as the VIP target)
is somehow happening, it means the watchdog had not performed the de-escalation when the quorum was lost.

If that is the case, can you please provide me the log files from when that happened?

harukat

2019-09-17 11:10

developer   ~0002855

This is the log for this issue.
IP addresses and host names have been changed from our customer's originals.
n3 had been the master/coordinator with the VIP for a long time.
When n1 and n2 realized n3 was lost, n3 didn't report any watchdog errors.

lost_arping_case.log (239,651 bytes)

harukat

2019-09-25 13:03

developer   ~0002879

Our customer says:
In our case, n3 did not seem to know quorum was lost...for most of that network partition,
n3 could *see* n1 and n2, but n2 and n1 could not *see* n3.
I'm not certain if n3 had a way to know quorum was lost during the network partition?
We do know with certainty that the end result was two production outages within 3 days.

harukat

2019-09-27 12:19

developer   ~0002885

This is a simple patch for V3_7_STABLE to do arping after recovering from split-brain.
It passed the regression test.

pgpool2_V3_7_STSBLE_arping_again.patch (2,390 bytes)
diff --git a/src/include/watchdog/wd_utils.h b/src/include/watchdog/wd_utils.h
index 69bdbd0..dcec758 100644
--- a/src/include/watchdog/wd_utils.h
+++ b/src/include/watchdog/wd_utils.h
@@ -57,5 +57,6 @@ extern char* wd_get_cmd(char* cmd);
 extern int create_monitoring_socket(void);
 extern bool read_interface_change_event(int sock, bool* link_event, bool* deleted);
 extern bool is_interface_up(struct ifaddrs *ifa);
+extern int wd_arping(void);
 
 #endif
diff --git a/src/watchdog/watchdog.c b/src/watchdog/watchdog.c
index 4a705e7..f47c424 100644
--- a/src/watchdog/watchdog.c
+++ b/src/watchdog/watchdog.c
@@ -5698,6 +5698,9 @@ static void handle_split_brain(WatchdogNode* otherMasterNode, WDPacketData* pkt)
 				(errmsg("We are in split brain, and I am the best candidate for master/coordinator"),
 				 errdetail("asking the remote node \"%s\" to step down",otherMasterNode->nodeName)));
 		send_cluster_service_message(otherMasterNode,pkt,CLUSTER_IAM_TRUE_MASTER);
+
+		/* do arping here again because arping could be done in another node */
+		wd_arping();
 	}
 
 }
diff --git a/src/watchdog/wd_if.c b/src/watchdog/wd_if.c
index e92f4a3..c3e3822 100644
--- a/src/watchdog/wd_if.c
+++ b/src/watchdog/wd_if.c
@@ -129,22 +129,7 @@ wd_IP_up(void)
 	}
 
 	if (rtn == WD_OK)
-	{
-		command = wd_get_cmd(pool_config->arping_cmd);
-		if (command)
-		{
-			snprintf(path,sizeof(path),"%s/%s",pool_config->arping_path,command);
-			rtn = exec_if_cmd(path,pool_config->arping_cmd);
-			pfree(command);
-		}
-		else
-		{
-			rtn = WD_NG;
-			ereport(LOG,
-				(errmsg("failed to acquire the delegate IP address"),
-					 errdetail("unable to parse the arping_cmd:\"%s\"",pool_config->arping_cmd)));
-		}
-	}
+		rtn = wd_arping();
 
 	if (rtn == WD_OK)
 	{
@@ -511,3 +496,31 @@ bool is_interface_up(struct ifaddrs *ifa)
 
 	return result;
 }
+
+
+/*
+ * Execute arping command
+ */
+int
+wd_arping()
+{
+	int rtn = WD_OK;
+	char path[WD_MAX_PATH_LEN];
+	char* command;
+	command = wd_get_cmd(pool_config->arping_cmd);
+	if (command)
+	{
+		snprintf(path,sizeof(path),"%s/%s",pool_config->arping_path,command);
+		rtn = exec_if_cmd(path,pool_config->arping_cmd);
+		pfree(command);
+	}
+	else
+	{
+		rtn = WD_NG;
+		ereport(LOG,
+			(errmsg("failed to acquire the delegate IP address"),
+				 errdetail("unable to parse the arping_cmd:\"%s\"",pool_config->arping_cmd)));
+	}
+	return rtn;
+}
+

t-ishii

2019-09-27 12:32

developer   ~0002886

The regression suite does not do any VIP/arp tests. That means passing the regression test says nothing about whether your patch is OK as far as the VIP/arp issue is concerned.

t-ishii

2019-09-27 13:14

developer   ~0002887

Have you actually tested the patch with the case described above?

harukat

2019-10-01 18:10

developer   ~0002896

No, I haven't recreated the reported situation.

t-ishii

2019-10-02 09:36

developer   ~0002898

We do not accept untested patches.

Muhammad Usama

2019-10-02 23:07

developer   ~0002899

Hi, Harukat,

First of all, sorry for the delayed response, and thanks a lot for providing the patch and log files.

The log file contained the messages from all three nodes and was quite huge, so it took me a while to understand
the issue.

Anyhow, after reviewing the log files I realized that there was confusion in the watchdog code about how to deal
with life-check failure scenarios, especially for the cases when the life-check reports a node failure while the
watchdog core is still able to communicate with the remote nodes, and also for the case when node A's life-check
reports node B as lost while B still thinks A is alive and healthy.

So I have reviewed the whole watchdog design around the life-check reports and made some fixes in that area
in the attached patch.

You can try the attached patch and test your scenario to see whether you still end up in the same situation as described
in the initial bug report.
The patch is generated against the current MASTER branch; I will commit it after a little more testing and then backport
it to all supported branches.

Finally, the original idea in your patch and bug report, doing arping again after recovering from the split-brain, seems reasonable,
but your patch needs a little more thought on the design, since executing the wd_arping function from the watchdog main process
is not the right thing to do.
But effectively, after my patch (attached) you should never end up in a situation where
multiple watchdog nodes have performed the escalation, provided you are using an odd number of Pgpool-II nodes.
The arping solution you suggested should only be needed in a situation where the watchdog cluster is configured with an
even total number of nodes and a network partition divides the network in such a way that both partitions get exactly
half of the nodes each.
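
As a rough illustration of that last point, below is a small standalone C example; the partition_holds_quorum() helper is hypothetical and not Pgpool-II code. With an odd number of nodes, at most one partition can ever consider the quorum held, while an even cluster split exactly in half can end up with both halves escalated when enable_consensus_with_half_votes allows exactly 50% of the votes.

#include <stdio.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not Pgpool-II code: does a partition containing
 * nodes_in_partition out of total_nodes watchdog nodes consider the
 * quorum held?  When half votes are allowed, exactly 50% is accepted;
 * otherwise a strict majority is required.
 */
static bool
partition_holds_quorum(int total_nodes, int nodes_in_partition,
                       bool consensus_with_half_votes)
{
	if (consensus_with_half_votes)
		return 2 * nodes_in_partition >= total_nodes;
	return 2 * nodes_in_partition > total_nodes;
}

int
main(void)
{
	/* 3-node cluster split 2|1: only the 2-node side can hold the quorum */
	printf("3 nodes, 2|1 split: %d %d\n",
	       (int) partition_holds_quorum(3, 2, true),
	       (int) partition_holds_quorum(3, 1, true));

	/*
	 * 4-node cluster split 2|2 with half votes allowed: both sides think
	 * they hold the quorum, so both may escalate; this is the only case
	 * where re-doing arping after split-brain recovery actually matters.
	 */
	printf("4 nodes, 2|2 split: %d %d\n",
	       (int) partition_holds_quorum(4, 2, true),
	       (int) partition_holds_quorum(4, 2, true));
	return 0;
}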

Thanks
Best regards
Muhammad Usama

watchdog_node_lost_fix.diff (54,091 bytes)
diff --git a/src/include/watchdog/watchdog.h b/src/include/watchdog/watchdog.h
index 1146a4ae..83f6651b 100644
--- a/src/include/watchdog/watchdog.h
+++ b/src/include/watchdog/watchdog.h
@@ -79,7 +79,8 @@ typedef enum
 	WD_IN_NW_TROUBLE,
 	/* the following states are only valid on remote nodes */
 	WD_SHUTDOWN,
-	WD_ADD_MESSAGE_SENT
+	WD_ADD_MESSAGE_SENT,
+	WD_NETWORK_ISOLATION
 }			WD_STATES;
 
 typedef enum
@@ -114,9 +115,21 @@ typedef enum
 	WD_EVENT_NODE_CON_LOST,
 	WD_EVENT_NODE_CON_FOUND,
 	WD_EVENT_CLUSTER_QUORUM_CHANGED,
-	WD_EVENT_WD_STATE_REQUIRE_RELOAD
+	WD_EVENT_WD_STATE_REQUIRE_RELOAD,
+	WD_EVENT_I_AM_APPEARING_LOST,
+	WD_EVENT_I_AM_APPEARING_FOUND
 }			WD_EVENTS;
 
+typedef enum {
+	NODE_LOST_UNKNOWN_REASON,
+	NODE_LOST_BY_LIFECHECK,
+	NODE_LOST_BY_SEND_FAILURE,
+	NODE_LOST_BY_MISSING_BEACON,
+	NODE_LOST_BY_RECEIVE_TIMEOUT,
+	NODE_LOST_BY_NOT_REACHABLE,
+	NODE_LOST_SHUTDOWN
+} WD_NODE_LOST_REASONS;
+
 typedef struct SocketConnection
 {
 	int			sock;			/* socket descriptor */
@@ -135,6 +148,20 @@ typedef struct WatchdogNode
 									 * from the node */
 	struct timeval last_sent_time;	/* timestamp when last packet was sent on
 									 * the node */
+	bool   has_lost_us;             /*
+									 * True when this remote node thinks
+									 * we are lost
+									 */
+	int    sending_failures_count;  /* number of times we have failed
+									 * to send message to the node.
+									 * Gets reset after successfull sent
+									 */
+	int    missed_beacon_count;     /* number of times the node has
+									 * failed to reply for beacon.
+									 * message
+									 */
+	WD_NODE_LOST_REASONS node_lost_reason;
+
 	char		pgp_version[MAX_VERSION_STR_LEN];		/* Pgpool-II version */
 	int			wd_data_major_version;	/* watchdog messaging version major*/
 	int			wd_data_minor_version;  /* watchdog messaging version minor*/
diff --git a/src/test/regression/regress.sh b/src/test/regression/regress.sh
index a891c2ae..37efee02 100755
--- a/src/test/regression/regress.sh
+++ b/src/test/regression/regress.sh
@@ -38,7 +38,7 @@ function install_pgpool
 
 	test -d $log || mkdir $log
         
-	make install HEALTHCHECK_DEBUG=1 -C $dir/../../ -e prefix=${PGPOOL_PATH} >& regression.log 2>&1
+	make install HEALTHCHECK_DEBUG=1 WATCHDOG_DEBUG=1 -C $dir/../../ -e prefix=${PGPOOL_PATH} >& regression.log 2>&1
 
 	if [ $? != 0 ];then
 	    echo "make install failed"
diff --git a/src/watchdog/Makefile.am b/src/watchdog/Makefile.am
index bb4c2204..0400459f 100644
--- a/src/watchdog/Makefile.am
+++ b/src/watchdog/Makefile.am
@@ -1,6 +1,6 @@
 top_builddir = ../..
 AM_CPPFLAGS = -D_GNU_SOURCE -I @PGSQL_INCLUDE_DIR@
-
+WATCHDOG_DEBUG=0
 noinst_LIBRARIES = lib-watchdog.a
 
 lib_watchdog_a_SOURCES = \
@@ -16,3 +16,4 @@ lib_watchdog_a_SOURCES = \
 	wd_utils.c \
 	wd_escalation.c
 
+DEFS = @DEFS@ -DWATCHDOG_DEBUG_OPTS=$(WATCHDOG_DEBUG)
diff --git a/src/watchdog/Makefile.in b/src/watchdog/Makefile.in
index 891d6058..103dd318 100644
--- a/src/watchdog/Makefile.in
+++ b/src/watchdog/Makefile.in
@@ -189,7 +189,7 @@ COLLATEINDEX = @COLLATEINDEX@
 CPP = @CPP@
 CPPFLAGS = @CPPFLAGS@
 CYGPATH_W = @CYGPATH_W@
-DEFS = @DEFS@
+DEFS = @DEFS@ -DWATCHDOG_DEBUG_OPTS=$(WATCHDOG_DEBUG)
 DLLTOOL = @DLLTOOL@
 DOCBOOKSTYLE = @DOCBOOKSTYLE@
 DSYMUTIL = @DSYMUTIL@
@@ -312,6 +312,7 @@ top_build_prefix = @top_build_prefix@
 top_builddir = ../..
 top_srcdir = @top_srcdir@
 AM_CPPFLAGS = -D_GNU_SOURCE -I @PGSQL_INCLUDE_DIR@
+WATCHDOG_DEBUG = 0
 noinst_LIBRARIES = lib-watchdog.a
 lib_watchdog_a_SOURCES = \
 	watchdog.c \
diff --git a/src/watchdog/watchdog.c b/src/watchdog/watchdog.c
index 0a0e81ef..5793a6bd 100644
--- a/src/watchdog/watchdog.c
+++ b/src/watchdog/watchdog.c
@@ -89,6 +89,15 @@ typedef enum IPC_CMD_PREOCESS_RES
 #define	MAX_SECS_WAIT_FOR_REPLY_FROM_NODE	5	/* time in seconds to wait for
 												 * the reply from remote
 												 * watchdog node */
+
+#define MAX_ALLOWED_SEND_FAILURES           3	/* number of times sending message failure
+                                                  * can be tolerated
+                                                  */
+#define MAX_ALLOWED_BEACON_REPLY_MISS       3	/* number of times missing beacon message reply
+                                                  * can be tolerated
+                                                  */
+
+
 #define	FAILOVER_COMMAND_FINISH_TIMEOUT		15	/* timeout in seconds to wait
 												 * for Pgpool-II to build
 												 * consensus for failover */
@@ -129,6 +138,8 @@ typedef enum IPC_CMD_PREOCESS_RES
 #define CLUSTER_IAM_RESIGNING_FROM_MASTER	'R'
 #define CLUSTER_NODE_INVALID_VERSION		'V'
 #define CLUSTER_NODE_REQUIRE_TO_RELOAD		'I'
+#define CLUSTER_NODE_APPEARING_LOST 		'Y'
+#define CLUSTER_NODE_APPEARING_FOUND 		'Z'
 
 #define WD_MASTER_NODE getMasterWatchdogNode()
 
@@ -199,7 +210,8 @@ char	   *wd_event_name[] =
 	"NODE CONNECTION LOST",
 	"NODE CONNECTION FOUND",
 	"CLUSTER QUORUM STATUS CHANGED",
-	"NODE REQUIRE TO RELOAD STATE"
+	"NODE REQUIRE TO RELOAD STATE",
+	"I AM APPEARING LOST"
 };
 
 char	   *wd_state_names[] = {
@@ -214,7 +226,18 @@ char	   *wd_state_names[] = {
 	"LOST",
 	"IN NETWORK TROUBLE",
 	"SHUTDOWN",
-	"ADD MESSAGE SENT"
+	"ADD MESSAGE SENT",
+	"NETWORK ISOLATION"
+};
+
+char *wd_node_lost_reasons[] = {
+	"UNKNOWN REASON",
+	"REPORTED BY LIFECHECK",
+	"SEND MESSAGE FAILURES",
+	"MISSING BEACON REPLIES",
+	"RECEIVE TIMEOUT",
+	"NOT REACHABLE",
+	"SHUTDOWN"
 };
 
 typedef struct WDPacketData
@@ -353,6 +376,22 @@ typedef struct WDFailoverObject
 	int			state;
 }			WDFailoverObject;
 
+#ifdef WATCHDOG_DEBUG_OPTS
+#if WATCHDOG_DEBUG_OPTS > 0
+#define WATCHDOG_DEBUG
+#endif
+#endif
+
+static bool check_debug_request_do_not_send_beacon(void);
+static bool check_debug_request_do_not_reply_beacon(void);
+static bool check_debug_request_kill_all_communication(void);
+static bool check_debug_request_kill_all_receivers(void);
+static bool check_debug_request_kill_all_senders(void);
+
+
+#ifdef WATCHDOG_DEBUG
+static void load_watchdog_debug_test_option(void);
+#endif
 
 static void process_remote_failover_command_on_coordinator(WatchdogNode * wdNode, WDPacketData * pkt);
 static WDFailoverObject * get_failover_object(POOL_REQUEST_KIND reqKind, int nodesCount, int *nodeList);
@@ -422,7 +461,7 @@ static void service_internal_command(void);
 static unsigned int get_next_commandID(void);
 static WatchdogNode * parse_node_info_message(WDPacketData * pkt, char **authkey);
 static void update_quorum_status(void);
-static int	get_mimimum_remote_nodes_required_for_quorum(void);
+static int	get_minimum_remote_nodes_required_for_quorum(void);
 static int	get_minimum_votes_to_resolve_consensus(void);
 
 static bool write_packet_to_socket(int sock, WDPacketData * pkt, bool ipcPacket);
@@ -462,6 +501,7 @@ static int	watchdog_state_machine_joining(WD_EVENTS event, WatchdogNode * wdNode
 static int	watchdog_state_machine_loading(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand);
 static int	watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand);
 static int	watchdog_state_machine_nw_error(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand);
+static int watchdog_state_machine_nw_isolation(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand);
 
 static int	I_am_master_and_cluser_in_split_brain(WatchdogNode * otherMasterNode);
 static void handle_split_brain(WatchdogNode * otherMasterNode, WDPacketData * pkt);
@@ -530,6 +570,7 @@ static void set_cluster_master_node(WatchdogNode * wdNode);
 static void clear_standby_nodes_list(void);
 static int	standby_node_left_cluster(WatchdogNode * wdNode);
 static int	standby_node_join_cluster(WatchdogNode * wdNode);
+static void update_missed_beacon_count(WDCommandData* ipcCommand, bool clear);
 
 /* global variables */
 wd_cluster	g_cluster;
@@ -1171,6 +1212,9 @@ watchdog_main(void)
 				g_timeout_sec = 0;
 			}
 		}
+#ifdef WATCHDOG_DEBUG
+		load_watchdog_debug_test_option();
+#endif
 		if (select_ret > 0)
 		{
 			int			processed_fds = 0;
@@ -1410,10 +1454,14 @@ read_sockets(fd_set *rmask, int pending_fds_count)
 
 				if (pkt)
 				{
-					watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
-					/* since a packet is received reset last sent time */
-					wdNode->last_sent_time.tv_sec = 0;
-					wdNode->last_sent_time.tv_usec = 0;
+					if (check_debug_request_kill_all_communication() == false &&
+						check_debug_request_kill_all_receivers() == false)
+					{
+						watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
+						/* since a packet is received reset last sent time */
+						wdNode->last_sent_time.tv_sec = 0;
+						wdNode->last_sent_time.tv_usec = 0;
+					}
 					free_packet(pkt);
 				}
 				else
@@ -1437,10 +1485,14 @@ read_sockets(fd_set *rmask, int pending_fds_count)
 
 				if (pkt)
 				{
-					watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
-					/* since a packet is received reset last sent time */
-					wdNode->last_sent_time.tv_sec = 0;
-					wdNode->last_sent_time.tv_usec = 0;
+					if (check_debug_request_kill_all_communication() == false &&
+						check_debug_request_kill_all_receivers() == false)
+					{
+						watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
+						/* since a packet is received reset last sent time */
+						wdNode->last_sent_time.tv_sec = 0;
+						wdNode->last_sent_time.tv_usec = 0;
+					}
 					free_packet(pkt);
 				}
 				else
@@ -1470,6 +1522,7 @@ read_sockets(fd_set *rmask, int pending_fds_count)
 			pkt = read_packet_of_type(conn, WD_ADD_NODE_MESSAGE);
 			if (pkt)
 			{
+				struct timeval 	previous_startup_time;
 				char	   *authkey = NULL;
 				WatchdogNode *tempNode = parse_node_info_message(pkt, &authkey);
 
@@ -1486,6 +1539,7 @@ read_sockets(fd_set *rmask, int pending_fds_count)
 					/* verify this node */
 					if (authenticated)
 					{
+						WD_STATES oldNodeState = WD_DEAD;
 						for (i = 0; i < g_cluster.remoteNodeCount; i++)
 						{
 							wdNode = &(g_cluster.remoteNodes[i]);
@@ -1495,6 +1549,9 @@ read_sockets(fd_set *rmask, int pending_fds_count)
 							{
 								/* We have found the match */
 								found = true;
+								previous_startup_time.tv_sec = wdNode->startup_time.tv_sec;
+								oldNodeState = wdNode->state;
+
 								close_socket_connection(&wdNode->server_socket);
 								strlcpy(wdNode->delegate_ip, tempNode->delegate_ip, WD_MAX_HOST_NAMELEN);
 								strlcpy(wdNode->nodeName, tempNode->nodeName, WD_MAX_HOST_NAMELEN);
@@ -1528,7 +1585,35 @@ read_sockets(fd_set *rmask, int pending_fds_count)
 											   wdNode->wd_data_major_version,
 											   wdNode->wd_data_minor_version)));
 
-							watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
+							if (oldNodeState == WD_SHUTDOWN)
+							{
+								ereport(LOG,
+										(errmsg("The newly joined node:\"%s\" had left the cluster because it was shutdown",wdNode->nodeName)));
+								watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
+
+							}
+							else if (oldNodeState == WD_LOST)
+							{
+								ereport(LOG,
+										(errmsg("The newly joined node:\"%s\" had left the cluster because it was lost",wdNode->nodeName),
+										 errdetail("lost reason was \"%s\" and startup time diff = %d",
+												   wd_node_lost_reasons[wdNode->node_lost_reason],
+												   abs((int)(previous_startup_time.tv_sec - wdNode->startup_time.tv_sec)))));
+
+								if (abs((int)(previous_startup_time.tv_sec - wdNode->startup_time.tv_sec)) <= 2 &&
+									wdNode->node_lost_reason == NODE_LOST_BY_LIFECHECK)
+								{
+									ereport(LOG,
+										(errmsg("node:\"%s\" was reported lost by the lifecheck process",wdNode->nodeName),
+											 errdetail("only lifecheck process can mark this node alive again")));
+									/* restore the node's lost state */
+									wdNode->state = oldNodeState;
+								}
+								else
+									watchdog_state_machine(WD_EVENT_PACKET_RCV, wdNode, pkt, NULL);
+
+							}
+
 						}
 						else
 							ereport(NOTICE,
@@ -1721,6 +1806,12 @@ write_ipc_command_with_result_data(WDCommandData * ipcCommand, char type, char *
 				(errmsg("not replying to IPC, Invalid IPC command.")));
 		return false;
 	}
+	/* DEBUG AID */
+	if (ipcCommand->commandSource == COMMAND_SOURCE_REMOTE &&
+		(check_debug_request_kill_all_senders() ||
+		check_debug_request_kill_all_communication()))
+		return false;
+
 	return write_packet_to_socket(ipcCommand->sourceIPCSocket, &pkt, true);
 }
 
@@ -1932,7 +2023,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_get_runtime_variable_value_request(WDCom
 		json_value_free(root);
 		ereport(NOTICE,
 				(errmsg("failed to process get local variable IPC command"),
-				 errdetail("unable to parse json data")));
+				 errdetail("unable to parse JSON data")));
 		return IPC_CMD_ERROR;
 	}
 
@@ -2031,7 +2122,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_nodeStatusChange_command(WDCommandData *
 	{
 		ereport(NOTICE,
 				(errmsg("failed to process NODE STATE CHANGE IPC command"),
-				 errdetail("unable to parse json data")));
+				 errdetail("unable to parse JSON data")));
 		return IPC_CMD_ERROR;
 	}
 
@@ -2086,7 +2177,10 @@ fire_node_status_event(int nodeID, int nodeStatus)
 		if (wdNode == g_cluster.localNode)
 			watchdog_state_machine(WD_EVENT_LOCAL_NODE_LOST, wdNode, NULL, NULL);
 		else
+		{
+			wdNode->node_lost_reason = NODE_LOST_BY_LIFECHECK;
 			watchdog_state_machine(WD_EVENT_REMOTE_NODE_LOST, wdNode, NULL, NULL);
+		}
 	}
 	else if (nodeStatus == WD_LIFECHECK_NODE_STATUS_ALIVE)
 	{
@@ -2188,7 +2282,7 @@ service_expired_failovers(void)
 										BACKEND_INFO(node_id).role == ROLE_PRIMARY)
 									{
 										ereport(LOG,
-												(errmsg("We are not able to build consensus for our primary node failover request, got %d votesonly for failover request ID:%d", failoverObj->request_count, failoverObj->failoverID),
+												(errmsg("We are not able to build consensus for our primary node failover request, got %d votes only for failover request ID:%d", failoverObj->request_count, failoverObj->failoverID),
 												 errdetail("resigning from the coordinator")));
 										need_to_resign = true;
 									}
@@ -2790,7 +2884,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_failover_indication(WDCommandData * ipcC
 			{
 				ereport(LOG,
 						(errmsg("unable to process failover indication"),
-						 errdetail("invalid json data in command packet")));
+						 errdetail("invalid JSON data in command packet")));
 				res = FAILOVER_RES_INVALID_FUNCTION;
 			}
 			if (root)
@@ -2801,7 +2895,7 @@ static IPC_CMD_PREOCESS_RES process_IPC_failover_indication(WDCommandData * ipcC
 		{
 			ereport(LOG,
 					(errmsg("unable to process failover indication"),
-					 errdetail("invalid json data in command packet")));
+					 errdetail("invalid JSON data in command packet")));
 			res = FAILOVER_RES_INVALID_FUNCTION;
 		}
 		else if (failoverState == 0)	/* start */
@@ -2844,7 +2938,7 @@ failover_start_indication(WDCommandData * ipcCommand)
 	}
 	else if (get_local_node_state() == WD_STANDBY)
 	{
-		/* The node might be performing the locl quarantine opetaion */
+		/* The node might be performing the local quarantine operation */
 		ereport(DEBUG1,
 				(errmsg("main process is starting the local quarantine operation")));
 		return FAILOVER_RES_PROCEED;
@@ -2871,7 +2965,7 @@ failover_end_indication(WDCommandData * ipcCommand)
 	}
 	else if (get_local_node_state() == WD_STANDBY)
 	{
-		/* The node might be performing the locl quarantine opetaion */
+		/* The node might be performing the local quarantine operation */
 		ereport(DEBUG1,
 				(errmsg("main process is ending the local quarantine operation")));
 		return FAILOVER_RES_PROCEED;
@@ -3250,7 +3344,7 @@ update_successful_outgoing_cons(fd_set *wmask, int pending_fds_count)
 				{
 					ereport(DEBUG1,
 							(errmsg("error in outbound connection to %s:%d ", wdNode->hostname, wdNode->wd_port),
-							 errdetail("getsockopt faile with error \"%s\"", strerror(errno))));
+							 errdetail("getsockopt failed with error \"%s\"", strerror(errno))));
 					close_socket_connection(&wdNode->client_socket);
 					wdNode->client_socket.sock_state = WD_SOCK_ERROR;
 
@@ -3775,13 +3869,34 @@ cluster_service_message_processor(WatchdogNode * wdNode, WDPacketData * pkt)
 		}
 			break;
 
+		case CLUSTER_NODE_APPEARING_LOST:
+		{
+			ereport(LOG,
+				(errmsg("remote node \"%s\" is reporting that it has lost us",
+							wdNode->nodeName)));
+			wdNode->has_lost_us = true;
+			watchdog_state_machine(WD_EVENT_I_AM_APPEARING_LOST, wdNode, NULL, NULL);
+		}
+			break;
+
+		case CLUSTER_NODE_APPEARING_FOUND:
+		{
+			ereport(LOG,
+				(errmsg("remote node \"%s\" is reporting that it has found us again",
+							wdNode->nodeName)));
+			wdNode->has_lost_us = false;
+			watchdog_state_machine(WD_EVENT_I_AM_APPEARING_FOUND, wdNode, NULL, NULL);
+		}
+			break;
+
 		case CLUSTER_NODE_INVALID_VERSION:
 			{
 				/*
 				 * this should never happen means something is seriously wrong
 				 */
 				ereport(FATAL,
-						(errmsg("\"%s\" node has found serious issues in our watchdog messages",
+						(return_code(POOL_EXIT_FATAL),
+						 errmsg("\"%s\" node has found serious issues in our watchdog messages",
 								wdNode->nodeName),
 						 errdetail("shutting down")));
 			}
@@ -3846,16 +3961,6 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt)
 			}
 			break;
 
-		case WD_EVENT_REMOTE_NODE_FOUND:
-			{
-				ereport(LOG,
-						(errmsg("remote node \"%s\" became reachable again", wdNode->nodeName),
-						 errdetail("requesting the node info")));
-				send_message_of_type(wdNode, WD_REQ_INFO_MESSAGE, NULL);
-				break;
-			}
-			break;
-
 		case WD_ADD_NODE_MESSAGE:
 		case WD_REQ_INFO_MESSAGE:
 			replyPkt = get_mynode_info_message(pkt);
@@ -3955,6 +4060,33 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt)
 				{
 					standby_node_left_cluster(wdNode);
 				}
+				if (oldNodeState == WD_LOST)
+				{
+					/*
+					 * We have received the message from lost node
+					 * add it back to cluster if it was not marked by
+					 * life-check
+					 * Node lost by life-check processes can only be
+					 * added back when we get alive notification for the
+					 * node from life-check
+					 */
+					ereport(LOG,
+						(errmsg("we have received the NODE INFO message from the node:\"%s\" that was lost",wdNode->nodeName),
+						 errdetail("we had lost this node because of \"%s\"",wd_node_lost_reasons[wdNode->node_lost_reason])));
+
+					if (wdNode->node_lost_reason == NODE_LOST_BY_LIFECHECK)
+					{
+						ereport(LOG,
+							(errmsg("node:\"%s\" was reported lost by the lifecheck process",wdNode->nodeName),
+								 errdetail("only life-check process can mark this node alive again")));
+						/* restore the node's lost state */
+						wdNode->state = oldNodeState;
+					}
+					else
+					{
+						watchdog_state_machine(WD_EVENT_REMOTE_NODE_FOUND, wdNode, NULL, NULL);
+					}
+				}
 			}
 			break;
 
@@ -3989,11 +4121,16 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt)
 
 					send_cluster_service_message(NULL, pkt, CLUSTER_IN_SPLIT_BRAIN);
 				}
-				else
+				else if (WD_MASTER_NODE != NULL)
 				{
 					replyPkt = get_mynode_info_message(pkt);
 					beacon_message_received_from_node(wdNode, pkt);
 				}
+				/*
+				 * if (WD_MASTER_NODE == NULL)
+				 * do not reply to beacon if we are not connected to
+				 * any master node
+				 */
 			}
 			break;
 
@@ -4014,6 +4151,10 @@ standard_packet_processor(WatchdogNode * wdNode, WDPacketData * pkt)
 static bool
 send_message_to_connection(SocketConnection * conn, WDPacketData * pkt)
 {
+	if (check_debug_request_kill_all_communication() == true ||
+		check_debug_request_kill_all_senders() == true)
+		return false;
+
 	if (conn->sock > 0 && conn->sock_state == WD_SOCK_CONNECTED)
 	{
 		if (write_packet_to_socket(conn->sock, pkt, false) == true)
@@ -4040,6 +4181,8 @@ send_message_to_node(WatchdogNode * wdNode, WDPacketData * pkt)
 	}
 	if (ret)
 	{
+		/* reset the sending error counter */
+		wdNode->sending_failures_count = 0;
 		/* we only update the last sent time if reply for packet is expected */
 		switch (pkt->type)
 		{
@@ -4054,6 +4197,7 @@ send_message_to_node(WatchdogNode * wdNode, WDPacketData * pkt)
 	}
 	else
 	{
+		wdNode->sending_failures_count++;
 		ereport(DEBUG1,
 				(errmsg("sending packet %c to node \"%s\" failed", pkt->type, wdNode->nodeName)));
 	}
@@ -4123,7 +4267,7 @@ static IPC_CMD_PREOCESS_RES wd_command_processor_for_node_lost_event(WDCommandDa
 				if (nodeResult->cmdState == COMMAND_STATE_SENT)
 				{
 					ereport(LOG,
-							(errmsg("remote node \"%s\" lost while ipc command was in progress ", wdLostNode->nodeName)));
+							(errmsg("remote node \"%s\" lost while IPC command was in progress ", wdLostNode->nodeName)));
 
 					/*
 					 * since the node is lost and will be removed from the
@@ -4342,15 +4486,36 @@ service_unreachable_nodes(void)
 							(errmsg("remote node \"%s\" is not replying..", wdNode->nodeName),
 							 errdetail("marking the node as lost")));
 					/* mark the node as lost */
+					wdNode->node_lost_reason = NODE_LOST_BY_RECEIVE_TIMEOUT;
 					watchdog_state_machine(WD_EVENT_REMOTE_NODE_LOST, wdNode, NULL, NULL);
 				}
 			}
+			else if (wdNode->sending_failures_count > MAX_ALLOWED_SEND_FAILURES)
+			{
+				ereport(LOG,
+						(errmsg("not able to send messages to remote node \"%s\"",wdNode->nodeName),
+						 errdetail("marking the node as lost")));
+				/* mark the node as lost */
+				wdNode->node_lost_reason = NODE_LOST_BY_SEND_FAILURE;
+				watchdog_state_machine(WD_EVENT_REMOTE_NODE_LOST, wdNode, NULL, NULL);
+			}
+			else if (wdNode->missed_beacon_count > MAX_ALLOWED_BEACON_REPLY_MISS)
+			{
+				ereport(LOG,
+						(errmsg("remote node \"%s\" is not responding to our beacon messages",wdNode->nodeName),
+						 errdetail("marking the node as lost")));
+				/* mark the node as lost */
+				wdNode->node_lost_reason = NODE_LOST_BY_MISSING_BEACON;
+				wdNode->missed_beacon_count = 0; /* Reset the counter */
+				watchdog_state_machine(WD_EVENT_REMOTE_NODE_LOST, wdNode, NULL, NULL);
+			}
 		}
 		else
 		{
 			ereport(LOG,
 					(errmsg("remote node \"%s\" is not reachable", wdNode->nodeName),
 					 errdetail("marking the node as lost")));
+			wdNode->node_lost_reason = NODE_LOST_BY_NOT_REACHABLE;
 			watchdog_state_machine(WD_EVENT_REMOTE_NODE_LOST, wdNode, NULL, NULL);
 		}
 	}
@@ -4379,8 +4544,6 @@ watchdog_internal_command_packet_processor(WatchdogNode * wdNode, WDPacketData *
 	for (i = 0; i < g_cluster.remoteNodeCount; i++)
 	{
 		WDCommandNodeResult *nodeRes = &clusterCommand->nodeResults[i];
-
-		clear_command_node_result(nodeRes);
 		if (nodeRes->wdNode == wdNode)
 		{
 			nodeResult = nodeRes;
@@ -4523,7 +4686,7 @@ issue_watchdog_internal_command(WatchdogNode * wdNode, WDPacketData * pkt, int t
 				if (send_message_to_node(nodeResult->wdNode, pkt) == false)
 				{
 					ereport(DEBUG1,
-							(errmsg("failed to send watchdog internla command packet %s", nodeResult->wdNode->nodeName),
+							(errmsg("failed to send watchdog internal command packet %s", nodeResult->wdNode->nodeName),
 							 errdetail("saving the packet. will try to resend it if connection recovers")));
 
 					/* failed to send. May be try again later */
@@ -4917,9 +5080,6 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 
 	if (event == WD_EVENT_REMOTE_NODE_LOST)
 	{
-		/* close all socket connections to the node */
-		close_socket_connection(&wdNode->client_socket);
-		close_socket_connection(&wdNode->server_socket);
 
 		if (wdNode->state == WD_SHUTDOWN)
 		{
@@ -4931,6 +5091,8 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 			wdNode->state = WD_LOST;
 			ereport(LOG,
 					(errmsg("remote node \"%s\" is lost", wdNode->nodeName)));
+			/* Inform the node, that it is lost for us */
+			 send_cluster_service_message(wdNode, pkt, CLUSTER_NODE_APPEARING_LOST);
 		}
 		if (wdNode == WD_MASTER_NODE)
 		{
@@ -4939,11 +5101,29 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 			set_cluster_master_node(NULL);
 		}
 
+		/* close all socket connections to the node */
+		close_socket_connection(&wdNode->client_socket);
+		close_socket_connection(&wdNode->server_socket);
+
 		/* clear the wait timer on the node */
 		wdNode->last_sent_time.tv_sec = 0;
 		wdNode->last_sent_time.tv_usec = 0;
+		wdNode->sending_failures_count = 0;
 		node_lost_while_ipc_command(wdNode);
 	}
+	else if (event == WD_EVENT_REMOTE_NODE_FOUND)
+	{
+		ereport(LOG,
+				(errmsg("remote node \"%s\" became reachable again", wdNode->nodeName),
+				 errdetail("requesting the node info")));
+		/*
+		 * remove the lost state from the node
+		 * and change it to joining for now
+		 */
+		wdNode->node_lost_reason = NODE_LOST_UNKNOWN_REASON;
+		wdNode->state = WD_LOADING;
+		send_cluster_service_message(wdNode, pkt, CLUSTER_NODE_APPEARING_FOUND);
+	}
 	else if (event == WD_EVENT_PACKET_RCV)
 	{
 		print_packet_node_info(pkt, wdNode, false);
@@ -4958,6 +5138,7 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 		if (pkt->type == WD_INFORM_I_AM_GOING_DOWN)
 		{
 			wdNode->state = WD_SHUTDOWN;
+			wdNode->node_lost_reason = NODE_LOST_SHUTDOWN;
 			return watchdog_state_machine(WD_EVENT_REMOTE_NODE_LOST, wdNode, NULL, NULL);
 		}
 
@@ -4982,7 +5163,7 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 		if (any_interface_available() == false)
 		{
 			ereport(WARNING,
-					(errmsg("network event has occured and all monitored interfaces are down"),
+					(errmsg("network event has occurred and all monitored interfaces are down"),
 					 errdetail("changing the state to in network trouble")));
 
 			set_state(WD_IN_NW_TROUBLE);
@@ -5021,7 +5202,7 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 	else if (event == WD_EVENT_LOCAL_NODE_LOST)
 	{
 		ereport(WARNING,
-				(errmsg("watchdog lifecheck reported, we are disconnected from the network"),
+				(errmsg("watchdog life-check reported, we are disconnected from the network"),
 				 errdetail("changing the state to LOST")));
 		set_state(WD_LOST);
 	}
@@ -5056,6 +5237,9 @@ watchdog_state_machine(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pk
 		case WD_IN_NW_TROUBLE:
 			watchdog_state_machine_nw_error(event, wdNode, pkt, clusterCommand);
 			break;
+		case WD_NETWORK_ISOLATION:
+			watchdog_state_machine_nw_isolation(event, wdNode, pkt, clusterCommand);
+			break;
 		default:
 			/* Should never ever happen */
 			ereport(WARNING,
@@ -5111,7 +5295,7 @@ watchdog_state_machine_loading(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 					case WD_STAND_FOR_COORDINATOR_MESSAGE:
 						{
 							/*
-							 * We are loading but a note is already contesting
+							 * We are loading but a node is already contesting
 							 * for coordinator node well we can ignore it but
 							 * then this could eventually mean a lower
 							 * priority node can became a coordinator node. So
@@ -5368,8 +5552,10 @@ watchdog_state_machine_standForCord(WD_EVENTS event, WatchdogNode * wdNode, WDPa
 							if (pkt->type == WD_ERROR_MESSAGE)
 							{
 								ereport(LOG,
-										(errmsg("our stand for coordinator request is rejected by node \"%s\"", wdNode->nodeName)));
-								set_state(WD_JOINING);
+										(errmsg("our stand for coordinator request is rejected by node \"%s\"",wdNode->nodeName),
+										 errdetail("we might be in partial network isolation and cluster already have a valid master"),
+										 errhint("please verify the watchdog life-check and network is working properly")));
+								set_state(WD_NETWORK_ISOLATION);
 							}
 							else if (pkt->type == WD_REJECT_MESSAGE)
 							{
@@ -5474,6 +5660,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
 
 				send_cluster_command(NULL, WD_DECLARE_COORDINATOR_MESSAGE, 4);
 				set_timeout(MAX_SECS_WAIT_FOR_REPLY_FROM_NODE);
+				update_missed_beacon_count(NULL,true);
 				ereport(LOG,
 						(errmsg("I am announcing my self as master/coordinator watchdog node")));
 
@@ -5546,6 +5733,8 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
 
 				else if (clusterCommand->commandPacket.type == WD_IAM_COORDINATOR_MESSAGE)
 				{
+					update_missed_beacon_count(clusterCommand,false);
+
 					if (clusterCommand->commandStatus == COMMAND_FINISHED_ALL_REPLIED)
 					{
 						ereport(DEBUG1,
@@ -5669,11 +5858,49 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
 
 		case WD_EVENT_TIMEOUT:
 			{
-				send_cluster_command(NULL, WD_IAM_COORDINATOR_MESSAGE, 5);
+				if (check_debug_request_do_not_send_beacon() == false)
+					send_cluster_command(NULL, WD_IAM_COORDINATOR_MESSAGE, 5);
 				set_timeout(BEACON_MESSAGE_INTERVAL_SECONDS);
 			}
 			break;
 
+		case WD_EVENT_I_AM_APPEARING_LOST:
+		{
+			/* The remote node has lost us, It would have already marked
+			 * us as lost, So remove it from standby*/
+			standby_node_left_cluster(wdNode);
+		}
+			break;
+
+		case WD_EVENT_I_AM_APPEARING_FOUND:
+		{
+			/* The remote node has found us again */
+			if (wdNode->wd_data_major_version >= 1 && wdNode->wd_data_minor_version >= 1)
+			{
+				/*
+				 * Since data version 1.1 we support CLUSTER_NODE_REQUIRE_TO_RELOAD
+				 * which makes the standby nodes to re-send the join master node
+				 */
+				ereport(DEBUG1,
+					(errmsg("asking remote node \"%s\" to rejoin master", wdNode->nodeName),
+						errdetail("watchdog data version %s",WD_MESSAGE_DATA_VERSION)));
+
+				send_cluster_service_message(wdNode, pkt, CLUSTER_NODE_REQUIRE_TO_RELOAD);
+			}
+			else
+			{
+				/*
+				 * The node is on older version
+				 * So ask it to re-join the cluster
+				 */
+				ereport(DEBUG1,
+					(errmsg("asking remote node \"%s\" to rejoin cluster", wdNode->nodeName),
+						errdetail("watchdog data version %s",WD_MESSAGE_DATA_VERSION)));
+				send_cluster_service_message(wdNode, pkt, CLUSTER_NEEDS_ELECTION);
+			}
+		}
+			break;
+
 		case WD_EVENT_REMOTE_NODE_LOST:
 			{
 				standby_node_left_cluster(wdNode);
@@ -5685,6 +5912,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
 			ereport(LOG,
 				(errmsg("remote node \"%s\" is reachable again", wdNode->nodeName),
 					errdetail("trying to add it back as a standby")));
+			wdNode->node_lost_reason = NODE_LOST_UNKNOWN_REASON;
 			/* If I am the cluster master. Ask for the node info and to re-send the join message */
 			send_message_of_type(wdNode, WD_REQ_INFO_MESSAGE, NULL);
 			if (wdNode->wd_data_major_version >= 1 && wdNode->wd_data_minor_version >= 1)
@@ -5717,12 +5945,21 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
 			{
 				switch (pkt->type)
 				{
+					case WD_ADD_NODE_MESSAGE:
+						/* In case we received the ADD node message from
+						 * one of our standby, Remove that standby from
+						 * the list
+						 */
+						standby_node_left_cluster(wdNode);
+						standard_packet_processor(wdNode, pkt);
+						break;
+
 					case WD_STAND_FOR_COORDINATOR_MESSAGE:
 						reply_with_minimal_message(wdNode, WD_REJECT_MESSAGE, pkt);
 						break;
 					case WD_DECLARE_COORDINATOR_MESSAGE:
 						ereport(NOTICE,
-								(errmsg("We are corrdinator and another node tried a coup")));
+								(errmsg("We are coordinator and another node tried a coup")));
 						reply_with_minimal_message(wdNode, WD_ERROR_MESSAGE, pkt);
 						break;
 
@@ -5757,14 +5994,24 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
 
 					case WD_JOIN_COORDINATOR_MESSAGE:
 						{
-							reply_with_minimal_message(wdNode, WD_ACCEPT_MESSAGE, pkt);
-
 							/*
-							 * Also get the configurations from the standby
-							 * node
+							 * If the node is marked as lost because of
+							 * life-check, Do not let it join the cluster
 							 */
-							send_message_of_type(wdNode, WD_ASK_FOR_POOL_CONFIG, NULL);
-							standby_node_join_cluster(wdNode);
+							if (wdNode->state == WD_LOST && wdNode->node_lost_reason == NODE_LOST_BY_LIFECHECK)
+							{
+								ereport(LOG,
+										(errmsg("lost remote node \"%s\" is requesting to join the cluster",wdNode->nodeName),
+										 errdetail("rejecting the request until life-check inform us that it is reachable again")));
+								reply_with_minimal_message(wdNode, WD_REJECT_MESSAGE, pkt);
+							}
+							else
+							{
+								reply_with_minimal_message(wdNode, WD_ACCEPT_MESSAGE, pkt);
+								/* Also get the configurations from the standby node */
+								send_message_of_type(wdNode,WD_ASK_FOR_POOL_CONFIG,NULL);
+								standby_node_join_cluster(wdNode);
+							}
 						}
 						break;
 
@@ -5795,7 +6042,7 @@ watchdog_state_machine_coordinator(WD_EVENTS event, WatchdogNode * wdNode, WDPac
  * watchdog node when the network becomes reachable, but there is a problem.
  *
  * Once the cable on the system is unplugged or when the node gets isolated from the
- * cluster there is every likelihood that the backend healthcheck of the isolated node
+ * cluster there is every likelihood that the backend health-check of the isolated node
  * start reporting the backend node failure and the pgpool-II proceeds to perform
  * the failover for all attached backend nodes. Since the pgpool-II is yet not
  * smart enough to figure out it is because of the network failure of its own
@@ -5872,6 +6119,42 @@ watchdog_state_machine_nw_error(WD_EVENTS event, WatchdogNode * wdNode, WDPacket
 	return 0;
 }
 
+/*
+ * we could end up in tis state if we were connected to the
+ * master node as standby and got lost on the master.
+ * Here we just wait for BEACON_MESSAGE_INTERVAL_SECONDS
+ * and retry to join the cluster.
+ */
+static int
+watchdog_state_machine_nw_isolation(WD_EVENTS event, WatchdogNode * wdNode, WDPacketData * pkt, WDCommandData * clusterCommand)
+{
+	switch (event)
+	{
+		case WD_EVENT_WD_STATE_CHANGED:
+			set_timeout(BEACON_MESSAGE_INTERVAL_SECONDS);
+			break;
+
+		case WD_EVENT_PACKET_RCV:
+			standard_packet_processor(wdNode, pkt);
+			break;
+
+		case WD_EVENT_REMOTE_NODE_FOUND:
+		case WD_EVENT_WD_STATE_REQUIRE_RELOAD:
+		case WD_EVENT_I_AM_APPEARING_FOUND:
+		case WD_EVENT_TIMEOUT:
+			/* fall through */
+		case WD_EVENT_NW_IP_IS_ASSIGNED:
+			ereport(LOG,
+				(errmsg("trying again to join the cluster")));
+			set_state(WD_JOINING);
+			break;
+
+		default:
+			break;
+	}
+	return 0;
+}
+
 static bool
 beacon_message_received_from_node(WatchdogNode * wdNode, WDPacketData * pkt)
 {
@@ -6044,7 +6327,7 @@ handle_split_brain(WatchdogNode * otherMasterNode, WDPacketData * pkt)
 				(errmsg("We are in split brain, and \"%s\" node is the best candidate for master/coordinator"
 						,otherMasterNode->nodeName),
 				 errdetail("re-initializing the local watchdog cluster state")));
-		/* brodcast the message about I am not the true master node */
+		/* broadcast the message about I am not the true master node */
 		send_cluster_service_message(NULL, pkt, CLUSTER_IAM_NOT_TRUE_MASTER);
 		set_state(WD_JOINING);
 	}
@@ -6070,8 +6353,8 @@ start_escalated_node(void)
 	while (g_cluster.de_escalation_pid > 0 && wait_secs-- > 0)
 	{
 		/*
-		 * de_escalation proceess was already running and we are esclating
-		 * again. give some time to de-escalation process to exit normaly
+		 * de_escalation process was already running and we are escalating
+		 * again. give some time to de-escalation process to exit normally
 		 */
 		ereport(LOG,
 				(errmsg("waiting for de-escalation process to exit before starting escalation")));
@@ -6112,8 +6395,8 @@ resign_from_escalated_node(void)
 	while (g_cluster.escalation_pid > 0 && wait_secs-- > 0)
 	{
 		/*
-		 * escalation proceess was already running and we are resigning from
-		 * it. wait for the escalation process to exit normaly
+		 * escalation process was already running and we are resigning from
+		 * it. wait for the escalation process to exit normally
 		 */
 		ereport(LOG,
 				(errmsg("waiting for escalation process to exit before starting de-escalation")));
@@ -6209,10 +6492,11 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 			send_cluster_command(WD_MASTER_NODE, WD_JOIN_COORDINATOR_MESSAGE, 5);
 			/* Also reset my priority as per the original configuration */
 			g_cluster.localNode->wd_priority = pool_config->wd_priority;
+			set_timeout(BEACON_MESSAGE_INTERVAL_SECONDS);
 			break;
 
 		case WD_EVENT_TIMEOUT:
-			set_timeout(5);
+			set_timeout(BEACON_MESSAGE_INTERVAL_SECONDS);
 			break;
 
 		case WD_EVENT_WD_STATE_REQUIRE_RELOAD:
@@ -6224,27 +6508,54 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 			break;
 
 		case WD_EVENT_COMMAND_FINISHED:
+		{
+			if (clusterCommand->commandPacket.type == WD_JOIN_COORDINATOR_MESSAGE)
 			{
-				if (clusterCommand->commandPacket.type == WD_JOIN_COORDINATOR_MESSAGE)
+				if (clusterCommand->commandStatus == COMMAND_FINISHED_ALL_REPLIED ||
+					clusterCommand->commandStatus == COMMAND_FINISHED_TIMEOUT)
 				{
-					if (clusterCommand->commandStatus == COMMAND_FINISHED_ALL_REPLIED ||
-						clusterCommand->commandStatus == COMMAND_FINISHED_TIMEOUT)
-					{
-						register_watchdog_state_change_interupt();
+					register_watchdog_state_change_interupt();
+
+					ereport(LOG,
+							(errmsg("successfully joined the watchdog cluster as standby node"),
+							 errdetail("our join coordinator request is accepted by cluster leader node \"%s\"", WD_MASTER_NODE->nodeName)));
+				}
+				else
+				{
+					ereport(NOTICE,
+							(errmsg("our join coordinator is rejected by node \"%s\"", wdNode->nodeName),
+							 errhint("rejoining the cluster.")));
 
+					if (WD_MASTER_NODE->has_lost_us)
+					{
 						ereport(LOG,
-								(errmsg("successfully joined the watchdog cluster as standby node"),
-								 errdetail("our join coordinator request is accepted by cluster leader node \"%s\"", WD_MASTER_NODE->nodeName)));
+								(errmsg("master node \"%s\" thinks we are lost, and \"%s\" is not letting us join",WD_MASTER_NODE->nodeName,wdNode->nodeName),
+								 errhint("please verify the watchdog life-check and network is working properly")));
+						set_state(WD_NETWORK_ISOLATION);
 					}
 					else
 					{
-						ereport(NOTICE,
-								(errmsg("our join coordinator is rejected by node \"%s\"", wdNode->nodeName),
-								 errhint("rejoining the cluster.")));
 						set_state(WD_JOINING);
 					}
 				}
 			}
+		}
+			break;
+
+		case WD_EVENT_I_AM_APPEARING_LOST:
+		{
+			/* The remote node has lost us, and if it
+			 * was our coordinator we might already be
+			 * removed from it's standby list
+			 * So re-Join the cluster
+			 */
+			if (WD_MASTER_NODE == wdNode)
+			{
+				ereport(LOG,
+						(errmsg("we are lost on the master node \"%s\"",wdNode->nodeName)));
+				set_state(WD_JOINING);
+			}
+		}
 			break;
 
 		case WD_EVENT_REMOTE_NODE_LOST:
@@ -6266,6 +6577,22 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 			{
 				switch (pkt->type)
 				{
+					case WD_ADD_NODE_MESSAGE:
+					{
+						/* In case we received the ADD node message from
+						 * our coordinator. Reset the cluster state
+						 */
+						if (wdNode == WD_MASTER_NODE)
+						{
+							ereport(LOG,
+									(errmsg("received ADD NODE message from the master node \"%s\"", wdNode->nodeName),
+									 errdetail("re-joining the cluster")));
+							set_state(WD_JOINING);
+						}
+						standard_packet_processor(wdNode, pkt);
+					}
+						break;
+
 					case WD_FAILOVER_END:
 						{
 							register_backend_state_sync_req_interupt();
@@ -6281,8 +6608,11 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 							}
 							else
 							{
+								ereport(LOG,
+										(errmsg("We are connected to master node \"%s\" and another node \"%s\" is trying to become a master",WD_MASTER_NODE->nodeName, wdNode->nodeName)));
 								reply_with_minimal_message(wdNode, WD_ERROR_MESSAGE, pkt);
-								set_state(WD_JOINING);
+								/* Ask master to re-send its node info */
+								send_message_of_type(WD_MASTER_NODE, WD_REQ_INFO_MESSAGE, NULL);
 							}
 						}
 						break;
@@ -6293,14 +6623,14 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 							{
 								/*
 								 * we already have a master node and we got a
-								 * new node trying to be master re-initialize
-								 * the cluster, something is wrong
+								 * new node trying to be master
 								 */
+								ereport(LOG,
+										(errmsg("We are connected to master node \"%s\" and another node \"%s\" is trying to declare itself as a master",WD_MASTER_NODE->nodeName, wdNode->nodeName)));
 								reply_with_minimal_message(wdNode, WD_ERROR_MESSAGE, pkt);
-							}
-							else
-							{
-								set_state(WD_JOINING);
+								/* Ask master to re-send its node info */
+								send_message_of_type(WD_MASTER_NODE, WD_REQ_INFO_MESSAGE, NULL);
+
 							}
 						}
 						break;
@@ -6320,7 +6650,7 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 
 								send_cluster_service_message(NULL, pkt, CLUSTER_IN_SPLIT_BRAIN);
 							}
-							else
+							else if (check_debug_request_do_not_reply_beacon() == false)
 							{
 								send_message_of_type(wdNode, WD_INFO_MESSAGE, pkt);
 								beacon_message_received_from_node(wdNode, pkt);
@@ -6350,7 +6680,7 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 		gettimeofday(&currTime, NULL);
 		int			last_rcv_sec = WD_TIME_DIFF_SEC(currTime, WD_MASTER_NODE->last_rcv_time);
 
-		if (last_rcv_sec >= (2 * BEACON_MESSAGE_INTERVAL_SECONDS))
+		if (last_rcv_sec >= (3 * BEACON_MESSAGE_INTERVAL_SECONDS))
 		{
 			/* we have missed atleast two beacons from master node */
 			ereport(WARNING,
@@ -6358,9 +6688,8 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
 							WD_MASTER_NODE->nodeName),
 					 errdetail("re-initializing the cluster")));
 			set_state(WD_JOINING);
-
 		}
-		else if (last_rcv_sec >= BEACON_MESSAGE_INTERVAL_SECONDS)
+		else if (last_rcv_sec >= (2 * BEACON_MESSAGE_INTERVAL_SECONDS))
 		{
 			/*
 			 * We have not received a last becacon from master ask for the
@@ -6381,10 +6710,10 @@ watchdog_state_machine_standby(WD_EVENTS event, WatchdogNode * wdNode, WDPacketD
  * The function identifies the current quorum state
  * quorum values:
  * -1:
- *     quorum is lost or does not exisits
+ *     quorum is lost or does not exists
  * 0:
  *     The quorum is on the edge. (when participating cluster is configured
- *     with even number of nodes, and we have exectly 50% nodes
+ *     with even number of nodes, and we have exactly 50% nodes
  * 1:
  *     quorum exists
  */
@@ -6393,11 +6722,11 @@ update_quorum_status(void)
 {
 	int			quorum_status = g_cluster.quorum_status;
 
-	if (g_cluster.clusterMasterInfo.standby_nodes_count > get_mimimum_remote_nodes_required_for_quorum())
+	if (g_cluster.clusterMasterInfo.standby_nodes_count > get_minimum_remote_nodes_required_for_quorum())
 	{
 		g_cluster.quorum_status = 1;
 	}
-	else if (g_cluster.clusterMasterInfo.standby_nodes_count == get_mimimum_remote_nodes_required_for_quorum())
+	else if (g_cluster.clusterMasterInfo.standby_nodes_count == get_minimum_remote_nodes_required_for_quorum())
 	{
 		if (g_cluster.remoteNodeCount % 2 != 0)
 		{
@@ -6424,10 +6753,10 @@ update_quorum_status(void)
  * returns the minimum number of remote nodes required for quorum
  */
 static int
-get_mimimum_remote_nodes_required_for_quorum(void)
+get_minimum_remote_nodes_required_for_quorum(void)
 {
 	/*
-	 * Even numner of remote nodes, That means total number of nodes are odd,
+	 * Even number of remote nodes, That means total number of nodes are odd,
 	 * so minimum quorum is just remote/2.
 	 */
 	if (g_cluster.remoteNodeCount % 2 == 0)
@@ -6447,11 +6776,11 @@ static int
 get_minimum_votes_to_resolve_consensus(void)
 {
 	/*
-	 * Since get_mimimum_remote_nodes_required_for_quorum() returns
+	 * Since get_minimum_remote_nodes_required_for_quorum() returns
 	 * the number of remote nodes required to complete the quorum
 	 * that is always one less than the total number of nodes required
 	 * for the cluster to build quorum or consensus, reason being
-	 * in get_mimimum_remote_nodes_required_for_quorum()
+	 * in get_minimum_remote_nodes_required_for_quorum()
 	 * we always consider the local node as a valid pre-casted vote.
 	 * But when it comes to count the number of votes required to build
 	 * consensus for any type of decision, for example for building the
@@ -6463,8 +6792,8 @@ get_minimum_votes_to_resolve_consensus(void)
 	 * For example
 	 * If Total nodes in cluster = 4
 	 * 		remote node will be = 3
-	 * 		get_mimimum_remote_nodes_required_for_quorum() return = 1
-	 *		Minimum number of votes required for consensu will be
+	 * 		get_minimum_remote_nodes_required_for_quorum() return = 1
+	 *		Minimum number of votes required for consensus will be
 	 *
 	 *		if(pool_config->enable_consensus_with_half_votes = true)
 	 *			(exact 50% n/2) ==> 4/2 = 2
@@ -6474,13 +6803,13 @@ get_minimum_votes_to_resolve_consensus(void)
 	 *
 	 */
 
-	int required_node_count = get_mimimum_remote_nodes_required_for_quorum()  + 1;
+	int required_node_count = get_minimum_remote_nodes_required_for_quorum()  + 1;
 	/*
 	 * When the total number of nodes in the watchdog cluster including the
 	 * local node are even, The number of votes required for the consensus
 	 * depends on the enable_consensus_with_half_votes.
 	 * So for even number of nodes when enable_consensus_with_half_votes is
-	 * not allowed than we would nedd one more vote than exact 50%
+	 * not allowed than we would add one more vote than exact 50%
 	 */
 	if (g_cluster.remoteNodeCount % 2 != 0)
 	{
@@ -6493,7 +6822,7 @@ get_minimum_votes_to_resolve_consensus(void)
 
 /*
  * sets the state of local watchdog node, and fires a state change event
- * if the new and old state differes
+ * if the new and old state differs
  */
 static int
 set_state(WD_STATES newState)
@@ -6952,7 +7281,7 @@ check_IPC_client_authentication(json_value * rootObj, bool internal_client_only)
 	if (json_get_int_value_for_key(rootObj, WD_IPC_SHARED_KEY, (int *) &packet_key))
 	{
 		ereport(DEBUG2,
-				(errmsg("IPC json data packet does not contain shared key")));
+				(errmsg("IPC JSON data packet does not contain shared key")));
 		has_shared_key = false;
 	}
 	else
@@ -6973,15 +7302,15 @@ check_IPC_client_authentication(json_value * rootObj, bool internal_client_only)
 		if (has_shared_key == false)
 		{
 			ereport(LOG,
-					(errmsg("invalid json data packet"),
-					 errdetail("authentication shared key not found in json data")));
+					(errmsg("invalid JSON data packet"),
+					 errdetail("authentication shared key not found in JSON data")));
 			return false;
 		}
 		/* compare if shared keys match */
 		if (*shared_key != packet_key)
 			return false;
 
-		/* providing a valid shared key for inetenal clients is enough */
+		/* providing a valid shared key for internal clients is enough */
 		return true;
 	}
 
@@ -6993,14 +7322,14 @@ check_IPC_client_authentication(json_value * rootObj, bool internal_client_only)
 	if (has_shared_key == true && *shared_key == packet_key)
 		return true;
 
-	/* shared key is out of question validate the authKey valurs */
+	/* shared key is out of question validate the authKey values */
 	packet_auth_key = json_get_string_value_for_key(rootObj, WD_IPC_AUTH_KEY);
 
 	if (packet_auth_key == NULL)
 	{
 		ereport(DEBUG1,
-				(errmsg("invalid json data packet"),
-				 errdetail("authentication key not found in json data")));
+				(errmsg("invalid JSON data packet"),
+				 errdetail("authentication key not found in JSON data")));
 		return false;
 	}
 
@@ -7244,7 +7573,7 @@ set_cluster_master_node(WatchdogNode * wdNode)
 	{
 		if (wdNode == NULL)
 			ereport(LOG,
-					(errmsg("unassigning the %s node \"%s\" from watchdog cluster master",
+					(errmsg("removing the %s node \"%s\" from watchdog cluster master",
 							(g_cluster.localNode == WD_MASTER_NODE) ? "local" : "remote",
 							WD_MASTER_NODE->nodeName)));
 		else
@@ -7344,3 +7673,202 @@ clear_standby_nodes_list(void)
 	g_cluster.clusterMasterInfo.standby_nodes_count = 0;
 	g_cluster.localNode->standby_nodes_count = 0;
 }
+
+static void update_missed_beacon_count(WDCommandData* ipcCommand, bool clear)
+{
+	int i;
+	for (i=0; i< g_cluster.remoteNodeCount; i++)
+	{
+		if (clear)
+		{
+			WatchdogNode* wdNode = &(g_cluster.remoteNodes[i]);
+			wdNode->missed_beacon_count = 0;
+		}
+		else
+		{
+			WDCommandNodeResult* nodeResult = &ipcCommand->nodeResults[i];
+			if (ipcCommand->commandStatus == COMMAND_IN_PROGRESS )
+				return;
+
+			if (nodeResult->cmdState == COMMAND_STATE_SENT)
+			{
+				if (nodeResult->wdNode->state == WD_STANDBY)
+				{
+					nodeResult->wdNode->missed_beacon_count++;
+					if (nodeResult->wdNode->missed_beacon_count > 1)
+						ereport(LOG,
+							(errmsg("remote node \"%s\" is not replying to our beacons",nodeResult->wdNode->nodeName),
+							 errdetail("missed beacon reply count:%d",nodeResult->wdNode->missed_beacon_count)));
+				}
+				else
+					nodeResult->wdNode->missed_beacon_count = 0;
+			}
+			if (nodeResult->cmdState == COMMAND_STATE_REPLIED)
+			{
+				if (nodeResult->wdNode->missed_beacon_count > 0)
+					ereport(LOG,
+							(errmsg("remote node \"%s\" is replying again after missing %d beacons",nodeResult->wdNode->nodeName,
+									nodeResult->wdNode->missed_beacon_count)));
+				nodeResult->wdNode->missed_beacon_count = 0;
+			}
+		}
+	}
+}
+
+#ifdef WATCHDOG_DEBUG
+/*
+ * Node down request file. In the file, each line consists of watchdog
+ * debug command.  The possible commands are same as the defines below
+ * for example to stop Pgpool-II from sending the reply to beacon messages
+ * from the master node write DO_NOT_REPLY_TO_BEACON in watchdog_debug_requests
+ *
+ *
+ * echo "DO_NOT_REPLY_TO_BEACON" > pgpool_logdir/watchdog_debug_requests
+ */
+
+typedef struct watchdog_debug_commands
+{
+	char		command[100];
+	unsigned int code;
+}			watchdog_debug_commands;
+
+unsigned int watchdog_debug_command = 0;
+
+
+#define WATCHDOG_DEBUG_FILE	"watchdog_debug_requests"
+
+#define DO_NOT_REPLY_TO_BEACON 	1
+#define DO_NOT_SEND_BEACON 		2
+#define KILL_ALL_COMMUNICATION	4
+#define KILL_ALL_RECEIVERS		8
+#define KILL_ALL_SENDERS		16
+
+
+watchdog_debug_commands wd_debug_commands[] = {
+	{"DO_NOT_REPLY_TO_BEACON", DO_NOT_REPLY_TO_BEACON},
+	{"DO_NOT_SEND_BEACON",     DO_NOT_SEND_BEACON},
+	{"KILL_ALL_COMMUNICATION", KILL_ALL_COMMUNICATION},
+	{"KILL_ALL_RECEIVERS",     KILL_ALL_RECEIVERS},
+	{"KILL_ALL_SENDERS",       KILL_ALL_SENDERS},
+	{"", 0}
+};
+
+static bool
+check_debug_request_kill_all_communication(void)
+{
+	return (watchdog_debug_command & KILL_ALL_COMMUNICATION);
+}
+static bool
+check_debug_request_kill_all_receivers(void)
+{
+	return (watchdog_debug_command & KILL_ALL_RECEIVERS);
+}
+static bool
+check_debug_request_kill_all_senders(void)
+{
+	return (watchdog_debug_command & KILL_ALL_SENDERS);
+}
+
+static bool
+check_debug_request_do_not_send_beacon(void)
+{
+	return (watchdog_debug_command & DO_NOT_SEND_BEACON);
+}
+
+static bool
+check_debug_request_do_not_reply_beacon(void)
+{
+	return (watchdog_debug_command & DO_NOT_REPLY_TO_BEACON);
+}
+/*
+ * Check watchdog debug request options file for debug commands
+ * each line should contain only one command
+ *
+ * Possible commands
+ * 		DO_NOT_REPLY_TO_BEACON
+ *		DO_NOT_SEND_BEACON
+ *		KILL_ALL_COMMUNICATION
+ *		KILL_ALL_RECEIVERS
+ *		KILL_ALL_SENDERS
+ */
+
+static void
+load_watchdog_debug_test_option(void)
+{
+	static char wd_debug_request_file[POOLMAXPATHLEN];
+	FILE	   *fd;
+	int			i;
+#define MAXLINE 128
+	char		readbuf[MAXLINE];
+
+	watchdog_debug_command = 0;
+
+	if (wd_debug_request_file[0] == '\0')
+	{
+		snprintf(wd_debug_request_file, sizeof(wd_debug_request_file),
+				 "%s/%s", pool_config->logdir, WATCHDOG_DEBUG_FILE);
+	}
+
+	fd = fopen(wd_debug_request_file, "r");
+	if (!fd)
+	{
+		ereport(DEBUG3,
+				(errmsg("load_watchdog_debug_test_option: failed to open file %s",
+						wd_debug_request_file),
+				 errdetail("\"%s\"", strerror(errno))));
+		return;
+	}
+
+	for (i = 0;; i++)
+	{
+		int cmd = 0;
+		bool valid_command = false;
+		readbuf[MAXLINE - 1] = '\0';
+		if (fgets(readbuf, MAXLINE - 1, fd) == 0)
+			break;
+		for (cmd =0 ;; cmd++)
+		{
+			if (strlen(wd_debug_commands[cmd].command) == 0 || wd_debug_commands[cmd].code == 0)
+				break;
+
+			if (strncasecmp(wd_debug_commands[cmd].command,readbuf,strlen(wd_debug_commands[cmd].command)) == 0)
+			{
+				ereport(DEBUG3,
+						(errmsg("Watchdog DEBUG COMMAND %d: \"%s\" request found",
+								cmd,wd_debug_commands[cmd].command)));
+
+				watchdog_debug_command |= wd_debug_commands[cmd].code;
+				valid_command = true;
+				break;
+			}
+		}
+		if (!valid_command)
+			ereport(WARNING,
+				(errmsg("%s file contains invalid command",
+							wd_debug_request_file),
+					 errdetail("\"%s\" not recognized", readbuf)));
+	}
+
+	fclose(fd);
+}
+#else
+/*
+ * All these command checks return false when WATCHDOG_DEBUG is
+ * not enabled
+ */
+static bool
+check_debug_request_do_not_send_beacon(void)
+{return false;}
+static bool
+check_debug_request_do_not_reply_beacon(void)
+{return false;}
+static bool
+check_debug_request_kill_all_communication(void)
+{return false;}
+static bool
+check_debug_request_kill_all_receivers(void)
+{return false;}
+static bool
+check_debug_request_kill_all_senders(void)
+{return false;}
+#endif
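(For reference, the vote arithmetic described in the get_minimum_votes_to_resolve_consensus() comments above can be summarized in a small stand-alone sketch. This is not pgpool-II code; the function and parameter names below are hypothetical and only mirror the behaviour the patch comments describe.)

#include <stdio.h>
#include <stdbool.h>

/*
 * Stand-alone illustration of the consensus arithmetic from the patch
 * comments: the local node counts as a pre-cast vote, and when the total
 * node count is even, one vote beyond exact 50% is required unless
 * half votes are allowed.
 */
static int
minimum_votes_for_consensus(int total_nodes, bool half_votes_allowed)
{
	int remote_nodes = total_nodes - 1;
	int required = remote_nodes / 2 + 1;	/* remote quorum + local vote */

	/* an odd remote count means an even total node count */
	if (remote_nodes % 2 != 0 && !half_votes_allowed)
		required += 1;

	return required;
}

int
main(void)
{
	/* total nodes = 4: 2 votes with half votes allowed, otherwise 3 */
	printf("%d\n", minimum_votes_for_consensus(4, true));	/* prints 2 */
	printf("%d\n", minimum_votes_for_consensus(4, false));	/* prints 3 */
	return 0;
}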

t-ishii

2019-10-02 23:28

developer   ~0002900

Usama,
After applying the patch, one of the watchdog regression tests failed.
testing 011.watchdog_quorum_failover...failed.

regression log attached.

011.watchdog_quorum_failover (8,474 bytes)

t-ishii

2019-10-02 23:32

developer   ~0002902

Also pgpool.log attached.

pgpool-logs.tar.gz (5,493 bytes)

Muhammad Usama

2019-10-03 00:19

developer   ~0002903

Hi Ishii-San,

I am looking into the attached logs.

Somehow the regression suite always passes on my machine :-)

testing 011.watchdog_quorum_failover...ok.
testing 012.watchdog_failover_when_quorum_exists...ok.
testing 013.watchdog_failover_require_consensus...ok.
testing 014.watchdog_test_quorum_bypass...ok.
testing 015.watchdog_master_and_backend_fail...ok.
testing 016.node_0_is_not_primary...ok.
testing 017.node_0_is_down...ok.
testing 018.detach_primary...ok.
testing 019.log_client_messages...ok.
testing 020.allow_clear_text_frontend_auth...ok.
testing 021.pool_passwd_auth...ok.
testing 022.pool_passwd_alternative_auth...ok.
testing 023.ssl_connection...ok.
testing 024.cert_auth...ok.

Muhammad Usama

2019-10-03 00:33

developer   ~0002904

From the look of it, it seems the PostgreSQL server setup is failing the test case.

See line 23 in the attached 011.watchdog_quorum_failover:

 23 recovery node 1...ERROR: executing recovery, execution of command failed at "1st stage"
 24 DETAIL: command:"basebackup.sh"

But I can't figure out what could be the reason for that.

t-ishii

2019-10-03 07:13

developer   ~0002905

Oops. You are right. I had broken my pgpool setup while migrating to PostgreSQL 12. Sorry for the noise.

harukat

2019-10-03 13:33

developer   ~0002906

Thanks, Usama.
I'll test the patch in the V4_1_STABLE environment.

The changes look big.
Should I not expect a fix for the 3.7.x version?

Muhammad Usama

2019-10-03 18:19

developer   ~0002907

Once you verify the fix, I will backport it to all supported branches, including 3.7.

Thanks

harukat

2019-10-08 13:31

developer   ~0002912

Last edited: 2019-10-08 13:32

View 2 revisions

I did a test that runs Pgpool-II 4.1.x nodes in an artificially unstable network by executing the attached script.
This "make_splitbrain.sh" script randomly blocks watchdog communication on a node while it is the master node.
In spite of the severe network environment, the patched Pgpool-II nodes ran well for a long time (over 1 day) in most cases.
They often recovered from split-brain status successfully.

I also got one failed case. Its log is attached as the "test_20191007.tgz" file.
I'm sorry, it may be difficult to read because it includes Japanese locale output.
In that case, I had to execute arping manually to access Pgpool-II via the VIP.
Its log says at the end:
  Oct 7 16:19:00 cp2 [16456]: [2889-1] 2019-10-07 16:19:00: pid 16456: LOG: successfully acquired the delegate IP:"10.10.10.152"
  Oct 7 16:19:00 cp2 [16456]: [2889-2] 2019-10-07 16:19:00: pid 16456: DETAIL: 'if_up_cmd' returned with success
But I could not access Pgpool-II on the cp2 host via the VIP without arping, and I couldn't find the cause.



make_splitbrain.sh (999 bytes)
test_20191007.tgz (113,231 bytes)

Muhammad Usama

2019-10-11 00:19

developer   ~0002918

Hi Harukat,

First of all, many thanks for doing such thorough testing.
I have gone through all the attached logs, and it seems that the watchdog is behaving correctly. Despite some
flooding of log messages, I can't see any issue so far, at least in the logs.

As for the problem you described, where you had to do arping on cp2 after the escalation that happened at 'Oct 7 16:19:00':

I believe it is not caused by anything Pgpool-II or the watchdog did wrong (at least nothing in the logs suggests
a problem with the watchdog or pgpool that could have caused it).

If you look at the logs of all three nodes around that time, you can see that only one node, CP2, performed
the escalation and brought up the VIP, and no other node acquired the VIP after that time.
Since the pgpool escalation process performs arping and ping after acquiring the VIP, I am guessing some
external factor might be involved, because nothing in the logs points to any situation or issue that could require a manual arping.

I am not sure, but I wonder whether this could be caused by the nature of the test case, since it triggers very frequent escalations and de-escalations.
Do you think it is possible that a network switch or the VM host played a role in that?
For example, the last de-escalation on CP2 happened at "7 16:18:45" (VIP released by CP2) and the new escalation started
at "7 16:18:56" (VIP acquired again), so there is only about a 10-second gap between releasing and re-acquiring the VIP on CP2.
So I am just thinking out loud: what if the ARP table on the client machine got stale because of these frequent updates,
and the client machine somehow never received the new VIP record?
Though I still think that might not be the case, and some other external element caused it.

harukat

2019-10-15 11:54

developer   ~0002927

Thank you for confirming my report.
If you see nothing in the Pgpool-II code that could explain the failed case,
I also think some external element caused it, because the log suggests the code must have performed arping at the end.
Also, my test script could start dropping packets on the VIP-holding host immediately
after its escalation, without any stable period. That can be considered a kind of double fault, so I think
it's OK that the patched Pgpool-II code doesn't cover such a case.

Issue History

Date Modified Username Field Change
2019-09-12 16:37 harukat New Issue
2019-09-12 16:52 t-ishii Assigned To => Muhammad Usama
2019-09-12 16:52 t-ishii Status new => assigned
2019-09-13 15:57 t-ishii Note Added: 0002845
2019-09-13 17:07 t-ishii Note Edited: 0002845 View Revisions
2019-09-13 21:57 Muhammad Usama Note Added: 0002847
2019-09-13 21:58 Muhammad Usama Note Edited: 0002847 View Revisions
2019-09-17 11:10 harukat File Added: lost_arping_case.log
2019-09-17 11:10 harukat Note Added: 0002855
2019-09-25 13:03 harukat Note Added: 0002879
2019-09-27 12:19 harukat File Added: pgpool2_V3_7_STSBLE_arping_again.patch
2019-09-27 12:19 harukat Note Added: 0002885
2019-09-27 12:32 t-ishii Note Added: 0002886
2019-09-27 13:14 t-ishii Note Added: 0002887
2019-10-01 18:10 harukat Note Added: 0002896
2019-10-02 09:36 t-ishii Note Added: 0002898
2019-10-02 23:07 Muhammad Usama File Added: watchdog_node_lost_fix.diff
2019-10-02 23:07 Muhammad Usama Note Added: 0002899
2019-10-02 23:28 t-ishii File Added: 011.watchdog_quorum_failover
2019-10-02 23:28 t-ishii Note Added: 0002900
2019-10-02 23:32 t-ishii File Added: pgpool-logs.tar.gz
2019-10-02 23:32 t-ishii Note Added: 0002902
2019-10-03 00:19 Muhammad Usama Note Added: 0002903
2019-10-03 00:33 Muhammad Usama Note Added: 0002904
2019-10-03 07:13 t-ishii Note Added: 0002905
2019-10-03 13:33 harukat Note Added: 0002906
2019-10-03 18:19 Muhammad Usama Note Added: 0002907
2019-10-08 13:31 harukat File Added: make_splitbrain.sh
2019-10-08 13:31 harukat File Added: test_20191007.tgz
2019-10-08 13:31 harukat Note Added: 0002912
2019-10-08 13:32 harukat Note Edited: 0002912 View Revisions
2019-10-11 00:19 Muhammad Usama Note Added: 0002918
2019-10-15 11:54 harukat Note Added: 0002927
2019-10-31 18:41 administrator Fixed in Version => 3.7.12
2019-10-31 18:41 administrator Target Version => 3.7.12