0000134: down pgpool wd node failback failed - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000134	pgpool-HA	Bug	public	2015-04-26 14:05	2015-06-11 13:47

Reporter	Qipan	Assigned To	Muhammad Usama
Priority	normal	Severity	minor	Reproducibility	have not tried
Status	resolved	Resolution	open
Platform	Linux	OS	Ubuntu 12.04.5 LTS

Summary	0000134: down pgpool wd node failback failed
Description	Hi, I use pgpool-HA. The watchdog switch to a new master works well. The problem I found occurs when I try to put the downed pgpool node back into production. It can be reproduced in my test environment.

pgpool --version
pgpool-II version 3.3.4 (tokakiboshi)
Steps To Reproduce	vip: 192.168.0.201

pgpool1 192.168.0.105 (active):

service pgpool2 stop

The pgpool processes are still visible afterwards:

ps aux|grep pgpool
postgres 39918 0.0 0.0 186584 15156 pts/0 S 21:17 0:00 /usr/sbin/pgpool -n
postgres 39919 0.0 0.0 7152 716 pts/0 S 21:17 0:00 logger -t pgpool -p local0.info
postgres 39985 0.0 0.0 252208 2028 pts/0 Sl 21:17 0:00 pgpool: lifecheck
root 40257 0.0 0.0 9380 924 pts/0 S+ 21:23 0:00 grep --color pgpool

Apr 26 00:23:23 pgpool[40171]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[40001]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39996]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39995]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39990]: child received shutdown request signal 2
Apr 26 00:23:23 pgpool[39985]: exec_ifconfig: 'pg_ifconfig eth1:0 $_IP_$ 255.255.255.0 down' succeeded
Apr 26 00:23:25 ntpd[13090]: Deleting interface #16 eth1:0, 192.168.0.201#123, interface stats: received=0, sent=0, dropped=0, active_time=337 secs
Apr 26 00:23:25 ntpd[13090]: peers refreshed
Apr 26 00:23:26 pgpool[39985]: exec_ping: failed to ping 192.168.0.201: exit code 1
Apr 26 00:23:26 pgpool[39985]: wd_IP_down: ifconfig down succeeded
Apr 26 00:24:18 pgpool[39918]: received fast shutdown request
Apr 26 00:24:18 pgpool[39918]: pgpool main: close listen socket

pgpool2 192.168.0.106 (standby->active) works fine.

To bring pgpool1 back as a standby for the new active node, I force-killed the processes that had not shut down. Then I ran the start command, but it did not start normally:

ps aux|grep pgpool
postgres 40922 0.0 0.0 186584 15148 pts/0 Sl 22:01 0:00 /usr/sbin/pgpool -n
postgres 40923 0.0 0.0 7152 716 pts/0 S 22:01 0:00 logger -t pgpool -p local0.info
root 40944 0.0 0.0 9380 924 pts/0 S+ 22:01 0:00 grep --color pgpool

tailf /var/log/syslog :
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: key: check_temp_table
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: value: on kind: 1
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: key: check_unlogged_table
Apr 26 01:01:50 pgpool[40922]: num_backends: 2 total_weight: 2.000000
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: value: on kind: 1
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: key: memory_cache_enabled
Apr 26 01:01:50 pgpool[40922]: backend 0 weight: 1073741823.500000
Apr 26 01:01:50 pgpool: 2015-04-25 22:01:50 DEBUG: pid 40922: value: off kind: 1
Apr 26 01:01:50 pgpool[40922]: backend 0 flag: 0000
Apr 26 01:01:50 pgpool[40922]: backend 1 weight: 1073741823.500000
Apr 26 01:01:50 pgpool[40922]: backend 1 flag: 0000
Apr 26 01:01:50 pgpool[40922]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
Apr 26 01:01:50 pgpool[40922]: wd_chk_setuid: ifup[/var/lib/postgresql/bin/pg_ifconfig] doesn't have setuid bit
Apr 26 01:01:52 pgpool[40922]: exec_ping: succeed to ping 192.168.0.105
Apr 26 01:01:52 pgpool[40922]: get_result: ping data: PING 192.168.0.105 (192.168.0.105) 56(84) bytes of data.#012#012--- 192.168.0.105 ping statistics ---#0123 packets transmitted, 3 received, 0% packet loss, time 1998ms#012rtt min/avg/max/mdev = 0.008/0.010/0.016/0.005 ms
Apr 26 01:01:52 pgpool[40922]: exec_ping: succeed to ping 192.168.0.106
Apr 26 01:01:52 pgpool[40922]: get_result: ping data: PING 192.168.0.106 (192.168.0.106) 56(84) bytes of data.#012#012--- 192.168.0.106 ping statistics ---#0123 packets transmitted, 3 received, 0% packet loss, time 1998ms#012rtt min/avg/max/mdev = 0.101/0.132/0.156/0.026 ms
(hangs here)
Tags	No tags attached.

Qipan 2015-04-26 14:08 reporter ~0000531	What is the correct procedure for bringing a downed node back as a standby? Thanks a lot! Qipan

Qipan 2015-04-26 14:38 reporter ~0000532	It works only if I restart both pgpool nodes. Is this a bug in the pgpool watchdog? Is there a proper way to bring the downed watchdog node back without restarting the active node?

Qipan 2015-05-21 14:09 reporter ~0000537	I tested 3.3.6 and it solves this problem. The relevant fix:

- Fix to use void * type for receiving return value of thread function (Yugo Nagata)
  Previously int type was used and this could occur stack buffer overflow. This caused an infinity loop of ping error at bringing up or down VIP.
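
The class of bug named in that fix can be illustrated with a minimal sketch. This is not pgpool-II's actual code: `worker` and `run_worker` are hypothetical names, and the status value 1 is arbitrary. The point is only that `pthread_join` stores a pointer-sized value through its second argument, so the receiving variable should be a `void *` rather than an `int`.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker standing in for a helper thread (for example one
 * that runs ping); it is not taken from the pgpool-II sources. */
static void *worker(void *arg)
{
    (void)arg;
    /* Encode a small status code in the pointer-sized return value. */
    return (void *)(intptr_t)1;
}

/* Start the worker, wait for it, and return its status code.
 * Returns -1 if the thread could not be created or joined. */
static int run_worker(void)
{
    pthread_t tid;
    void *result = NULL;   /* pointer-sized slot, as pthread_join expects */

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* Buggy pattern (do not use):
     *     int rc;
     *     pthread_join(tid, (void **)&rc);
     * Where pointers are 8 bytes and int is 4 bytes, pthread_join would
     * write past the end of rc and corrupt neighbouring stack data. */
    if (pthread_join(tid, &result) != 0)
        return -1;

    /* Convert the pointer-sized value back to an int only after the join. */
    return (int)(intptr_t)result;
}
```

Called from a program built with `-pthread`, `run_worker()` returns the status the worker encoded. With the buggy pattern the overwritten stack bytes may belong to unrelated variables, which is consistent with the kind of unpredictable looping behaviour the fix describes.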

Date Modified	Username	Field	Change
2015-04-26 14:05	Qipan	New Issue
2015-04-26 14:08	Qipan	Note Added: 0000531
2015-04-26 14:38	Qipan	Note Added: 0000532
2015-05-21 08:22	t-ishii	Assigned To	=> Muhammad Usama
2015-05-21 08:22	t-ishii	Status	new => assigned
2015-05-21 14:09	Qipan	Note Added: 0000537
2015-06-11 13:47	t-ishii	Status	assigned => resolved