View Issue Details

ID: 0000125
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2015-01-08 17:53
Reporter: qian
Assigned To: nagata
Priority: normal
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Platform: x86
OS: CentOS
OS Version: 6.5
Product Version:
Target Version:
Fixed in Version:
Summary: 0000125: stand for master problem
Description: 3 nodes; each node runs pgpool and PostgreSQL in streaming master-slave replication.
Steps To Reproduce: Assume s_1 is the master node and s_2 is the candidate node.

When s_1 fails, s_2 and s_3 each detect the state of s_1 through the heartbeat mechanism.

When s_2 finds that s_1 has gone down, the master election is triggered. s_2 finds that it is the oldest node, so it sends a "STAND_FOR_MASTER" message to s_3, and s_3 checks whether a master exists.

Because s_3 has not yet detected that s_1 is down and still thinks the master is there, s_3 votes against s_2.

When s_3 later detects that s_1 is down, s_3 does nothing, because s_3 thinks s_2 is older.
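
The sequence above can be modelled in a few lines of C. This is only a sketch of the race as this report describes it, not pgpool's watchdog code: the `node_view` struct and the functions `vote_for_candidate`, `should_stand_for_master`, and `election_deadlocks` are hypothetical names, and the model assumes that s_2 does not repeat its candidacy after being rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one node's local view; not pgpool's real struct. */
typedef struct {
    const char *name;
    int startup_order;       /* smaller value = older node */
    bool sees_master_alive;  /* this node's own belief about s_1 */
} node_view;

/* Answer to a STAND_FOR_MASTER request: the voter rejects the
 * candidate for as long as it still believes a master exists. */
bool vote_for_candidate(const node_view *voter)
{
    assert(voter != NULL);
    return !voter->sees_master_alive;
}

/* A node stands for master only if it believes the master is gone
 * and it is older than the other surviving node. */
bool should_stand_for_master(const node_view *self, const node_view *peer)
{
    assert(self != NULL && peer != NULL);
    return !self->sees_master_alive &&
           self->startup_order < peer->startup_order;
}

/* Replays the reported sequence; returns true if nobody becomes master. */
bool election_deadlocks(void)
{
    node_view s2 = { "s_2", 2, false };  /* has already seen s_1 go down */
    node_view s3 = { "s_3", 3, true };   /* has not noticed yet */

    /* s_2 stands for master, but s_3 votes against it. */
    bool s2_elected = should_stand_for_master(&s2, &s3) &&
                      vote_for_candidate(&s3);

    /* Later s_3 notices the failure, yet defers to the older s_2. */
    s3.sees_master_alive = false;
    bool s3_stands = should_stand_for_master(&s3, &s2);

    return !s2_elected && !s3_stands;
}
```

In this model `election_deadlocks()` returns true: s_2 was rejected while s_3's view was stale, and s_3 never stands because it is the younger node.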
Additional Information: From the s_3 log, I noticed that wd_is_contactable_master() is triggered all the time when failover is faster than the master election process, so we should let it sleep for a while.

I have uploaded a patch; I hope it helps someone.
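
The "sleep for a while and check again" idea can be illustrated independently of pgpool. The sketch below is not the attached patch and does not use pgpool's API: `master_still_reachable`, the `master_check_fn` and `wait_fn` callback types, and the `demo_*` helpers are all hypothetical names. It only shows why re-checking after a pause can catch a failure that a single immediate check misses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks: check() returns true while the master still
 * looks reachable from this node; wait() pauses between two checks. */
typedef bool (*master_check_fn)(void *ctx);
typedef void (*wait_fn)(void *ctx);

/* Check the master up to max_checks times, pausing between checks.
 * Returns false as soon as one check fails, true if every check passed. */
bool master_still_reachable(master_check_fn check, wait_fn wait,
                            void *ctx, int max_checks)
{
    assert(check != NULL && max_checks > 0);
    for (int i = 0; i < max_checks; i++) {
        if (!check(ctx))
            return false;
        if (wait != NULL && i + 1 < max_checks)
            wait(ctx);
    }
    return true;
}

/* Demo state: this node's view of the master goes stale-free only
 * after waits_until_down pauses have elapsed. */
typedef struct {
    int waits_until_down;
    int waits_done;
} demo_ctx;

static bool demo_check(void *p)
{
    demo_ctx *c = p;
    return c->waits_done < c->waits_until_down;
}

static void demo_wait(void *p)
{
    demo_ctx *c = p;
    c->waits_done++;
}

/* One immediate check: the stale view still reports the master as up. */
bool demo_single_check(void)
{
    demo_ctx c = { 2, 0 };
    return master_still_reachable(demo_check, demo_wait, &c, 1);
}

/* Several checks with pauses: the failure becomes visible. */
bool demo_repeated_check(void)
{
    demo_ctx c = { 2, 0 };
    return master_still_reachable(demo_check, demo_wait, &c, 5);
}
```

Passing the pause as a callback keeps the sketch free of platform-specific sleep calls; in real code the pause would be an actual sleep of a suitable length.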
Tags: No tags attached.

Activities

qian

2014-12-24 12:24

reporter  

master_check.patch (1,320 bytes)
 src/watchdog/wd_interlock.c |  2 ++
 src/watchdog/wd_list.c      | 12 +++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/watchdog/wd_interlock.c b/src/watchdog/wd_interlock.c
index a8c8b77..57be32b 100644
--- a/src/watchdog/wd_interlock.c
+++ b/src/watchdog/wd_interlock.c
@@ -227,6 +227,8 @@ wd_assume_lock_holder(void)
 	{
 		if (WD_MYSELF->status == WD_DOWN)
 			return WD_NG;
+            
+        sleep_in_waiting();
 	}
 
 	/* I'm master and not lock holder, or I succeeded to become lock holder */
diff --git a/src/watchdog/wd_list.c b/src/watchdog/wd_list.c
index 4d71ca4..41f142e 100644
--- a/src/watchdog/wd_list.c
+++ b/src/watchdog/wd_list.c
@@ -358,9 +358,15 @@ wd_is_alive_master(void)
 	master = wd_is_exist_master();
 	if (master != NULL)
 	{
-		if ((!strcmp(pool_config->wd_lifecheck_method, MODE_HEARTBEAT)) ||
-		    (!strcmp(pool_config->wd_lifecheck_method, MODE_QUERY)
-			     && wd_ping_pgpool(master) == WD_OK))
+        if (!strcmp(pool_config->wd_lifecheck_method, MODE_HEARTBEAT))
+        {
+            wd_update_info();
+            if (master->is_contactable) {
+                return master;
+            }
+        }
+		if (!strcmp(pool_config->wd_lifecheck_method, MODE_QUERY)
+			     && wd_ping_pgpool(master) == WD_OK)
 		{
 			return master;
 		}

nagata

2015-01-08 17:53

developer   ~0000510

Thanks. I'm looking into this.

Issue History

Date Modified Username Field Change
2014-12-24 12:24 qian New Issue
2014-12-24 12:24 qian File Added: master_check.patch
2015-01-08 17:36 nagata Assigned To => nagata
2015-01-08 17:36 nagata Status new => assigned
2015-01-08 17:53 nagata Note Added: 0000510