[pgpool-general: 4163] Heartbeat random fails between pgpool instances
Thomas SIMON
tsimon at neteven.com
Tue Nov 3 19:09:14 JST 2015
Hi all,
I set up a master/slave pgpool scheme in production last week, and I
sometimes see strange behavior in the heartbeat check.
After some time (it can be days), the master pgpool instance's heartbeat
fails for no apparent reason and triggers failover of delegate_IP to the
slave instance, which then becomes the master instance.
The log says the master instance is going down, but the master instance
is fine...
Below are the logs from the slave:
2015-10-30T05:45:13.362789+01:00 db11 pgpool[38615]: check_pgpool_status_by_hb: lifecheck failed. pgpool 1 (db10.xxx.com:9999) seems not to be working
2015-10-30T05:45:13.362817+01:00 db11 pgpool[38615]: pgpool_down: db10.xxx.com:9999 is going down
2015-10-30T05:45:13.377575+01:00 db11 pgpool[38615]: pgpool_down: I'm oldest so standing for master
2015-10-30T05:45:13.383137+01:00 db11 pgpool[38615]: wd_escalation: escalating to master pgpool
2015-10-30T05:46:36.808437+01:00 db11 pgpool[38615]: wd_escalation: escalation command succeeded
2015-10-30T05:46:36.838590+01:00 db11 pgpool[38615]: wd_IP_up: ifconfig up failed
2015-10-30T05:46:36.838608+01:00 db11 pgpool[38615]: wd_escalation: escalated to master pgpool with some errors
2015-10-30T05:46:39.838775+01:00 db11 pgpool[38615]: check_pgpool_status_by_hb: pgpool 1 (db10.xxx.com:9999) is in down status
2015-10-30T05:46:42.838886+01:00 db11 pgpool[38615]: check_pgpool_status_by_hb: pgpool 1 (db10.xxx.com:9999) is in down status
...
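To put a number on the log above, here is a minimal Python sketch that parses the syslog timestamps and measures the time between two entries. The two timestamps are copied from the wd_escalation lines in the log; the helper name seconds_between is only illustrative.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


# Timestamps copied from the slave (db11) log above.
escalation_started = "2015-10-30T05:45:13.383137+01:00"   # "escalating to master pgpool"
escalation_finished = "2015-10-30T05:46:36.808437+01:00"  # "escalation command succeeded"

elapsed = seconds_between(escalation_started, escalation_finished)
print(f"time between the two wd_escalation entries: {elapsed:.1f} s")
```

On this log, about 83 seconds pass between "escalating to master pgpool" and "escalation command succeeded".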
The logs still say the (old) master instance is down.
However, the port is open; below is a check with telnet:
[09:52:04]root@db11:/var/log$ telnet db10.xxx.com 9999
Trying xx.xx.xx.xx...
Connected to db10.xxx.com.
Escape character is '^]'.
It can happen on both pgpool instances (I see the same log message,
"dbxx:9999 is in down status", on both the master and the slave instance).
Furthermore, the delegate IP is configured on both instances (but traffic
is still routed to the old master instance...).
db10
eth3:1 Link encap:Ethernet HWaddr 90:e2:ba:6d:b1:49
inet addr:172.25.1.5 Bcast:172.25.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
db11
eth5:1 Link encap:Ethernet HWaddr 90:e2:ba:8a:ca:5d
inet addr:172.25.1.5 Bcast:172.25.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Parameters:
db10:
delegate_IP = '172.25.1.5'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 20
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'db11.xxx.com'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth3'
db11:
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 20
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'db10.neteven.com'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth5'
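As a cross-check of the two parameter lists above, here is a small Python sketch that parses them and reports keys whose values differ or that appear on only one node. It is a sketch under assumptions: the parser handles only the simple key = value lines quoted in this message, not the full pgpool.conf syntax, and the function names parse_params and diff_params are illustrative.

```python
def parse_params(text: str) -> dict:
    """Parse simple `key = value` lines into a dict, stripping single quotes."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip().strip("'")
    return params


def diff_params(a: dict, b: dict) -> dict:
    """Map each differing key to its (a, b) values; None means the key is absent."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Parameter lists copied from the message above.
DB10 = """
delegate_IP = '172.25.1.5'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 20
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'db11.xxx.com'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth3'
"""

DB11 = """
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 20
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'db10.neteven.com'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth5'
"""

db10, db11 = parse_params(DB10), parse_params(DB11)
for key, (v10, v11) in diff_params(db10, db11).items():
    print(f"{key}: db10={v10!r} db11={v11!r}")
```

With the lists as posted, the differing keys are delegate_IP, heartbeat_destination0 and heartbeat_device0. delegate_IP appears only in the db10 list, which may simply be an omission in this message rather than in the actual configuration.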
Does anyone have an idea?
Thanks
--
Thomas