0000228: pgpool doesn't de-escalate IP when network is restored - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000228	Pgpool-II	Bug	public	2016-08-02 01:53	2016-08-04 23:51

Reporter	supp_k	Assigned To	Muhammad Usama
Priority	high	Severity	major	Reproducibility	always
Status	resolved	Resolution	fixed
Platform	pgpool	OS	CentOS	OS Version	6 & 7
Product Version	3.5.3

Summary	0000228: pgpool doesn't de-escalate IP when network is restored
Description	Pgpool doesn't de-escalate (bring down) the virtual IP address when a split brain is resolved and the node turns from Master into Standby.
Steps To Reproduce	Environment:
1) Pgpool A (Master) hosts the VIP (virtual IP)
2) Pgpool B (Standby)
Watchdog and heartbeat processes are OK.
Steps to reproduce:
- Emulate a network failure between Pgpool A and Pgpool B that lasts longer than the heartbeat receive time period. When the heartbeat time is exceeded, Pgpool initiates voting and brings up the VIP, which is OK. Now there are 2 Pgpool masters within the network: this is the "split brain" case.
- Restore network connectivity between Pgpool A and Pgpool B. The Pgpools restart voting and one of the masters turns into Standby (let it be Pgpool B). This is OK as well, but at that moment Pgpool B doesn't bring down the VIP (ip addr del ...). Should it?
Tags	watchdog
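
The symptom in the steps above (the VIP staying on the node that has become Standby) can be checked by looking at the output of `ip addr show` on that node. Below is a minimal Python sketch of such a check. The function name `vip_is_assigned`, the interface names and the addresses in the sample are hypothetical and are not taken from the reporter's environment; the sample text only imitates the usual `ip addr show` layout.

```python
import ipaddress
import re


def vip_is_assigned(ip_addr_output: str, vip: str) -> bool:
    """Return True if `vip` is listed as an inet/inet6 address in `ip addr show` output."""
    target = ipaddress.ip_address(vip)
    # Only addresses that follow "inet"/"inet6" count; "brd" addresses are ignored.
    for match in re.finditer(r"\binet6?\s+([0-9A-Fa-f:.]+)", ip_addr_output):
        try:
            if ipaddress.ip_address(match.group(1)) == target:
                return True
        except ValueError:
            continue  # not a parsable address, skip it
    return False


# Hypothetical sample output for a node that still holds the VIP.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.7.11/24 brd 192.168.7.255 scope global eth0
    inet 192.168.7.7/24 scope global secondary eth0:0
"""

if __name__ == "__main__":
    print(vip_is_assigned(SAMPLE, "192.168.7.7"))
```

On a real node the text could be obtained with `subprocess.run(["ip", "addr", "show"], capture_output=True, text=True).stdout`. If the function still returns True on the node that turned into Standby after the network is restored, the VIP was not de-escalated.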

Muhammad Usama 2016-08-03 23:14 developer	de-esc_bug_228.diff (643 bytes)

Muhammad Usama 2016-08-03 23:16 developer ~0000963	Hi, I was able to reproduce the issue. Can you please try the attached patch "de-esc_bug_228.diff" and check whether it solves your problem?

supp_k 2016-08-04 01:41 reporter ~0000964	Hi, yes the problem disappeared. Here are the log records:
2016-08-03 19:37:34: pid 2604: WARNING: "Linux_warm1.local_9999" is the coordinator as per our record but "Linux_warm0.local_9999" is also announcing as a coordinator
2016-08-03 19:37:34: pid 2604: DETAIL: re-initializing the cluster
2016-08-03 19:37:34: pid 2604: LOG: watchdog node state changed from [MASTER] to [JOINING]
2016-08-03 19:37:34: pid 2952: LOG: watchdog: de-escalation started
2016-08-03 19:37:34: pid 2604: WARNING: the coordinator as per our record is not coordinator anymore
2016-08-03 19:37:34: pid 2604: DETAIL: re-initializing the cluster
2016-08-03 19:37:34: pid 2604: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2016-08-03 19:37:35: pid 2604: LOG: watchdog node state changed from [INITIALIZING] to [STANDING FOR MASTER]
2016-08-03 19:37:35: pid 2604: LOG: watchdog node state changed from [STANDING FOR MASTER] to [PARTICIPATING IN ELECTION]
2016-08-03 19:37:35: pid 2604: LOG: watchdog node state changed from [PARTICIPATING IN ELECTION] to [INITIALIZING]
2016-08-03 19:37:35: pid 2605: LOG: informing the node status change to watchdog
2016-08-03 19:37:35: pid 2605: DETAIL: node id :1 status = "NODE ALIVE" message:"Heartbeat signal found"
2016-08-03 19:37:35: pid 2604: LOG: new IPC connection received
2016-08-03 19:37:35: pid 2604: LOG: received node status change ipc message
2016-08-03 19:37:35: pid 2604: DETAIL: Heartbeat signal found
2016-08-03 19:37:36: pid 2604: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2016-08-03 19:37:40: pid 2604: LOG: successfully joined the watchdog cluster as standby node
2016-08-03 19:37:40: pid 2604: DETAIL: our join coordinator request is accepted by cluster leader node "Linux_warm0.local_9999"
2016-08-03 19:37:46: pid 2952: WARNING: watchdog failed to ping host"192.168.7.7"
2016-08-03 19:37:46: pid 2952: DETAIL: ping process exits with code: 1
2016-08-03 19:37:46: pid 2952: LOG: watchdog bringing down delegate IP
2016-08-03 19:37:46: pid 2952: DETAIL: if_down_cmd succeeded
2016-08-03 19:37:46: pid 2604: LOG: watchdog de-escalation process with pid: 2952 exit with SUCCESS.
Thank you!
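
For anyone verifying the patch from their own logs, the two things to look for in the records above are the watchdog state transitions (ending in STANDBY) and a de-escalation that starts and then exits with SUCCESS. The following is a minimal Python sketch of that check. It assumes the message wording shown in this report, which may differ in other Pgpool-II versions; the helper names `state_transitions` and `deescalation_succeeded` are hypothetical, and `LOG` is a short excerpt of the lines quoted above.

```python
import re

# Matches the state-change message wording seen in the log of this report.
STATE_CHANGE = re.compile(
    r"watchdog node state changed from \[(?P<old>[^\]]+)\] to \[(?P<new>[^\]]+)\]"
)


def state_transitions(log_lines):
    """Return (old_state, new_state) pairs in the order they appear."""
    transitions = []
    for line in log_lines:
        m = STATE_CHANGE.search(line)
        if m:
            transitions.append((m.group("old"), m.group("new")))
    return transitions


def deescalation_succeeded(log_lines):
    """True if a 'de-escalation started' line is later followed by a SUCCESS exit."""
    started = False
    for line in log_lines:
        if "de-escalation started" in line:
            started = True
        elif started and "de-escalation process" in line and "exit with SUCCESS" in line:
            return True
    return False


# Excerpt of the log records quoted in this note.
LOG = [
    "2016-08-03 19:37:34: pid 2604: LOG: watchdog node state changed from [MASTER] to [JOINING]",
    "2016-08-03 19:37:34: pid 2952: LOG: watchdog: de-escalation started",
    "2016-08-03 19:37:36: pid 2604: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]",
    "2016-08-03 19:37:46: pid 2952: DETAIL: if_down_cmd succeeded",
    "2016-08-03 19:37:46: pid 2604: LOG: watchdog de-escalation process with pid: 2952 exit with SUCCESS.",
]

if __name__ == "__main__":
    print(state_transitions(LOG))
    print(deescalation_succeeded(LOG))
```

On an unpatched node the expectation, based on the description of this bug, is that the transition to STANDBY appears but the de-escalation lines do not, so `deescalation_succeeded` would return False.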

Muhammad Usama 2016-08-04 23:51 developer ~0000965	Thanks for confirming the fix. I have committed it to the master and 3.5 branches: http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=cf57d9970f46a92c52315b42eae9dbee73c90525

Date Modified	Username	Field	Change
2016-08-02 01:53	supp_k	New Issue
2016-08-02 10:22	t-ishii	Assigned To	=> Muhammad Usama
2016-08-02 10:22	t-ishii	Status	new => assigned
2016-08-02 13:44	t-ishii	Tag Attached: watchdog
2016-08-03 23:14	Muhammad Usama	File Added: de-esc_bug_228.diff
2016-08-03 23:16	Muhammad Usama	Status	assigned => feedback
2016-08-03 23:16	Muhammad Usama	Note Added: 0000963
2016-08-04 01:41	supp_k	Note Added: 0000964
2016-08-04 01:41	supp_k	Status	feedback => assigned
2016-08-04 23:51	Muhammad Usama	Status	assigned => resolved
2016-08-04 23:51	Muhammad Usama	Resolution	open => fixed
2016-08-04 23:51	Muhammad Usama	Note Added: 0000965