View Issue Details

ID: 0000134
Project: pgpool-HA
Category: Bug
View Status: public
Last Update: 2015-06-11 13:47
Reporter: Qipan
Assigned To: Muhammad Usama
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: open
Platform: Linux
OS: Ubuntu 12.04.5 LTS
OS Version:
Product Version:
Target Version:
Fixed in Version:
Summary: 0000134: down pgpool wd node failback failed
Description
Hi,

I am using the pgpool-HA watchdog; switching to a new master works well. The problem appears when I try to put the downed pgpool node back into production. It is reproducible on my test environment.
pgpool --version
pgpool-II version 3.3.4 (tokakiboshi)




Steps To Reproduce
vip: 192.168.0.201
pgpool1 192.168.0.105 (active):
service pgpool2 stop
pgpool processes can still be seen:
ps aux|grep pgpool
postgres 39918 0.0 0.0 186584 15156 pts/0 S 21:17 0:00 /usr/sbin/pgpool -n
postgres 39919 0.0 0.0 7152 716 pts/0 S 21:17 0:00 logger -t pgpool -p local0.info
postgres 39985 0.0 0.0 252208 2028 pts/0 Sl 21:17 0:00 pgpool: lifecheck
root 40257 0.0 0.0 9380 924 pts/0 S+ 21:23 0:00 grep --color pgpool

Apr 26 00:23:23 pgpool[40171]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[40001]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39996]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39995]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39990]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39985]: exec_ifconfig: 'pg_ifconfig eth1:0 $_IP_$ 255.255.255.0 down' succeeded
Apr 26 00:23:25 ntpd[13090]: Deleting interface 0000016 eth1:0, 192.168.0.201#123, interface stats: received=0, sent=0, dropped=0, active_time=337 secs
Apr 26 00:23:25 ntpd[13090]: peers refreshed
Apr 26 00:23:26 pgpool[39985]: exec_ping: failed to ping 192.168.0.201: exit code 1
Apr 26 00:23:26 pgpool[39985]: wd_IP_down: ifconfig down succeeded
Apr 26 00:24:18 pgpool[39918]: received fast shutdown request
Apr 26 00:24:18 pgpool[39918]: pgpool main: close listen socket

pgpool2 192.168.0.106 (standby->active) works fine.

When I wanted to bring pgpool1 back as a standby to the new active node, I force-killed the processes that hadn't shut down. Then I ran the start command, but pgpool did not start normally:
ps aux|grep pgpool
postgres 40922 0.0 0.0 186584 15148 pts/0 Sl 22:01 0:00 /usr/sbin/pgpool -n
postgres 40923 0.0 0.0 7152 716 pts/0 S 22:01 0:00 logger -t pgpool -p local0.info
root 40944 0.0 0.0 9380 924 pts/0 S+ 22:01 0:00 grep --color pgpool
tailf /var/log/syslog:
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: key: check_temp_table
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: value: on kind: 1
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: key: check_unlogged_table
Apr 26 01:01:50 pgpool[40922]: num_backends: 2 total_weight: 2.000000
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: value: on kind: 1
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: key: memory_cache_enabled
Apr 26 01:01:50 pgpool[40922]: backend 0 weight: 1073741823.500000
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: value: off kind: 1
Apr 26 01:01:50 pgpool[40922]: backend 0 flag: 0000
Apr 26 01:01:50 pgpool[40922]: backend 1 weight: 1073741823.500000
Apr 26 01:01:50 pgpool[40922]: backend 1 flag: 0000
Apr 26 01:01:50 pgpool[40922]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
Apr 26 01:01:50 pgpool[40922]: wd_chk_setuid: ifup[/var/lib/postgresql/bin/pg_ifconfig] doesn't have setuid bit
Apr 26 01:01:52 pgpool[40922]: exec_ping: succeed to ping 192.168.0.105
Apr 26 01:01:52 pgpool[40922]: get_result: ping data: PING 192.168.0.105 (192.168.0.105) 56(84) bytes of data.0000012#012--- 192.168.0.105 ping statistics ---0000123 packets transmitted, 3 received, 0% packet loss, time 1998ms#012rtt min/avg/max/mdev = 0.008/0.010/0.016/0.005 ms
Apr 26 01:01:52 pgpool[40922]: exec_ping: succeed to ping 192.168.0.106
Apr 26 01:01:52 pgpool[40922]: get_result: ping data: PING 192.168.0.106 (192.168.0.106) 56(84) bytes of data.0000012#012--- 192.168.0.106 ping statistics ---0000123 packets transmitted, 3 received, 0% packet loss, time 1998ms#012rtt min/avg/max/mdev = 0.101/0.132/0.156/0.026 ms
(hangs here...)
Tags: No tags attached.

Activities

Qipan

2015-04-26 14:08

reporter   ~0000531

Could you tell me what the right process is to get an old, downed node back to standby?
Thanks a lot!

Qipan

Qipan

2015-04-26 14:38

reporter   ~0000532

It works only if I restart both pgpool nodes.

Is this a bug in the pgpool watchdog? Is there a proper way to bring the downed wd node back without restarting the active node?

Qipan

2015-05-21 14:09

reporter   ~0000537

I tested 3.3.6 and it solves this problem.

    - Fix to use void * type for receiving return value of thread function
      (Yugo Nagata)
      
      Previously int type was used and this could occur stack buffer
      overflow. This caused an infinity loop of ping error at bringing
      up or down VIP.

Issue History

Date Modified Username Field Change
2015-04-26 14:05 Qipan New Issue
2015-04-26 14:08 Qipan Note Added: 0000531
2015-04-26 14:38 Qipan Note Added: 0000532
2015-05-21 08:22 t-ishii Assigned To => Muhammad Usama
2015-05-21 08:22 t-ishii Status new => assigned
2015-05-21 14:09 Qipan Note Added: 0000537
2015-06-11 13:47 t-ishii Status assigned => resolved