View Issue Details

ID: 0000215
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2016-08-01 23:43
Reporter: supp_k
Assigned To: Muhammad Usama
Priority: urgent
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: pgpool-2
OS: CentOS
OS Version: 6x and 7x
Summary: 0000215: pgpool doesn't escalate the IP when the other node is unavailable
Description:
Hi,

We are trying to set up two pgpool servers in front of two or more PostgreSQL instances.
The pgpool (pgpool-II version 3.5.3 (ekieboshi)) balances all requests, and everything works with one very significant exception.

We configured it to delegate a virtual IP (please see the attached configuration). The delegation works only if the primary pgpool node shuts down cleanly (e.g. service pgpool stop). But if we emulate a failure of the primary server or a network problem by bringing down its network interface, the standby pgpool instance does not bring up the delegate IP.

Please see below the logs generated by the standby pgpool instance:
2016-07-08 19:01:55: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
2016-07-08 19:01:55: pid 20148: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.7.20:9694
2016-07-08 19:01:57: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
2016-07-08 19:01:57: pid 20148: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.7.20:9694
2016-07-08 19:01:59: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
2016-07-08 19:01:59: pid 20148: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.7.20:9694
2016-07-08 19:02:01: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
2016-07-08 19:02:01: pid 20148: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.7.20:9694
2016-07-08 19:02:03: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
2016-07-08 19:02:03: pid 20148: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.7.20:9694
2016-07-08 19:02:03: pid 20141: DEBUG: STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = MASTER
2016-07-08 19:02:03: pid 20141: DEBUG: not sending watchdog internal command packet to DEAD Linux_warm0.local_9999
2016-07-08 19:02:05: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
2016-07-08 19:02:05: pid 20148: DEBUG: watchdog heartbeat: send heartbeat signal to 192.168.7.20:9694
2016-07-08 19:02:06: pid 20146: DEBUG: watchdog checking life check is ready
2016-07-08 19:02:06: pid 20146: DETAIL: pgpool:1 at "192.168.7.20:9999" has not send the heartbeat signal yet
2016-07-08 19:02:07: pid 20148: DEBUG: watchdog heartbeat: send 224 byte packet
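In the excerpt above the standby keeps sending heartbeats, while its life check reports that the peer "has not send the heartbeat signal yet". A minimal sketch for pulling those peers out of a longer log follows; the function name `nodes_missing_heartbeat` is hypothetical, and the pattern is based only on the DETAIL line shown in this report, so other pgpool versions may word the message differently.

```python
import re

# Matches the DETAIL line quoted in this report, e.g.
#   pgpool:1 at "192.168.7.20:9999" has not send the heartbeat signal yet
NOT_READY = re.compile(
    r'pgpool:(?P<index>\d+) at "(?P<host>[^":]+):(?P<port>\d+)" '
    r"has not send the heartbeat signal yet"
)


def nodes_missing_heartbeat(log_lines):
    """Return a sorted list of (host, port) peers the life check is still waiting for."""
    seen = set()
    for line in log_lines:
        match = NOT_READY.search(line)
        if match:
            seen.add((match.group("host"), int(match.group("port"))))
    return sorted(seen)
```

Feeding it the lines of the standby's log file would list every peer whose heartbeat was never recognized, which is the symptom described here.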




Thanks in advance!

Additional Information: P.S. The configuration file of the standby server is attached. The primary's configuration file is identical except for the IP addresses.
Tags: No tags attached.

Activities

supp_k

2016-07-09 01:07

reporter  

pgpool.conf (33,252 bytes)

supp_k

2016-07-09 01:40

reporter  

data.tar.gz (174,080 bytes)

supp_k

2016-07-11 16:29

reporter   ~0000884

I managed to solve the problem by specifying hostnames instead of IP addresses in the configuration files. Now it works, and I see "lifecheck started" in the log files.
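A quick way to spot the mixed notation that this workaround avoids is to scan a pgpool.conf for watchdog address parameters and classify each value as an IP literal or a hostname. The sketch below is only a heuristic, not part of pgpool: the helper names are hypothetical, and it assumes the parameters of interest are `wd_hostname`, `heartbeat_destinationN` and `other_pgpool_hostnameN`.

```python
import ipaddress
import re

# Watchdog parameters that carry a node address (assumed set, see lead-in).
WATCHED = re.compile(
    r"^(wd_hostname|heartbeat_destination\d+|other_pgpool_hostname\d+)$"
)


def is_ip(value):
    """True if value parses as an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def address_styles(conf_text):
    """Map each watched parameter to 'ip' or 'hostname'."""
    styles = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        value = value.strip("'\"")
        if WATCHED.match(key) and value:
            styles[key] = "ip" if is_ip(value) else "hostname"
    return styles


def has_mixed_styles(conf_text):
    """True if the file uses both IP literals and hostnames for watchdog nodes."""
    return len(set(address_styles(conf_text).values())) > 1
```

If `has_mixed_styles` returns True on an unpatched 3.5.3 setup, switching every watched parameter to the same notation (all hostnames, as done here) is the change to try first.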

Muhammad Usama

2016-07-13 00:33

developer   ~0000899

Hi,
Thanks for the report. I have found the problem in the heartbeat receiver code that fails to identify the heartbeat sender watchdog node when the heartbeat destination is specified in terms of an IP address while wd_hostname is configured as a hostname string or vice versa.

Can you please try the attached wd_hb_fix.diff patch and check whether it solves the issue?
(The patch can be applied to both master and 3.5 branches)
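To illustrate the kind of mismatch described above, here is a small sketch that is not taken from wd_hb_fix.diff and does not mirror pgpool's C code. It contrasts matching the heartbeat sender by raw string with matching after both sides are normalized to an IP address. All function names are hypothetical, and the resolver is passed in as a parameter so the example does not depend on DNS.

```python
import ipaddress


def normalize(host, resolve):
    """Return a canonical IP string for an IP literal or a hostname."""
    try:
        return str(ipaddress.ip_address(host))
    except ValueError:
        # Not an IP literal: ask the supplied resolver for its address.
        return str(ipaddress.ip_address(resolve(host)))


def find_sender_naive(sender, known_nodes):
    """String comparison: fails when one side is an IP and the other a hostname."""
    for node in known_nodes:
        if node == sender:
            return node
    return None


def find_sender_normalized(sender, known_nodes, resolve):
    """Compare resolved addresses so either notation identifies the node."""
    target = normalize(sender, resolve)
    for node in known_nodes:
        if normalize(node, resolve) == target:
            return node
    return None
```

With a hypothetical lookup table `{"pgpool1.example": "192.168.7.20"}` as the resolver, `find_sender_naive("pgpool1.example", ["192.168.7.20"])` finds nothing, while `find_sender_normalized` returns the configured node.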


Muhammad Usama

2016-07-13 00:34

developer  

wd_hb_fix.diff (2,864 bytes)   

supp_k

2016-08-01 15:52

reporter   ~0000951

Hi,

Could you apply this patch to the "stable" release? I can apply it locally, but I need an official build that I can recommend to my customers, because they will not accept patching on the fly.

Serk.

Muhammad Usama

2016-08-01 23:43

developer   ~0000953

Thanks for the verification. I have pushed the fix to the master and 3.5 branches:

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=ff7a6e8218346da56b5442b33913b2673f73bf7b

Issue History

Date Modified     Username        Field                           Change
2016-07-09 01:07  supp_k          New Issue
2016-07-09 01:07  supp_k          File Added: pgpool.conf
2016-07-09 01:40  supp_k          File Added: data.tar.gz
2016-07-11 16:29  supp_k          Note Added: 0000884
2016-07-12 13:31  t-ishii         Assigned To                     => Muhammad Usama
2016-07-12 13:31  t-ishii         Status                          new => assigned
2016-07-12 13:31  t-ishii         Additional Information Updated
2016-07-13 00:33  Muhammad Usama  Note Added: 0000899
2016-07-13 00:33  Muhammad Usama  Status                          assigned => confirmed
2016-07-13 00:34  Muhammad Usama  File Added: wd_hb_fix.diff
2016-08-01 15:52  supp_k          Note Added: 0000951
2016-08-01 23:43  Muhammad Usama  Status                          confirmed => resolved
2016-08-01 23:43  Muhammad Usama  Resolution                      open => fixed
2016-08-01 23:43  Muhammad Usama  Note Added: 0000953