[pgpool-general: 2535] Re: wd_escalation_command exit code

Yugo Nagata nagata at sraoss.co.jp
Wed Feb 5 12:09:33 JST 2014


Hi,

On Tue, 4 Feb 2014 11:18:16 +0400
Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:

> Hi!
> 
> Finally had a chance to apply the last patch.

Thanks!

> 
> This is what I got in the log:
> 
> pgpool[12464]: wd_chk_setuid all commands have setuid bit
> pgpool[12464]: watchdog might call network commands which using setuid bit.
> pgpool[12464]: Backend status file /var/log/postgresql/pgpool_status discarded
> pgpool[12464]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> pgpool[12464]: exec_ping: failed to ping 192.168.33.200: exit code 1
> pgpool[12464]: wd_escalation: escalating to master pgpool
> pgpool[12464]: exec_ifconfig: 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0' succeeded
> pgpool[12464]: exec_ifconfig: 'arping -U $_IP_$ -w 1' failed. exit status: 1

The arping exit status is non-zero. That's odd; on my machines (CentOS 5/6 and Ubuntu)
it always returns zero.

Could you please execute arping while pgpool is running and the VIP is up, and report the exit code?

$ sudo /usr/sbin/arping -U 192.168.33.200 -w 1
$ echo $?

In addition, please check the version of arping:

$ /usr/sbin/arping -h
$ /usr/sbin/arping -V


> pgpool[12464]: wd_IP_up: ifconfig up failed
> pgpool[12464]: wd_declare: send the packet to declare the new master
> pgpool[12464]: wd_escalation: escalated to master pgpool with some errors
> pgpool[12464]: wd_init: start watchdog
> 
> 
> 
> 
> On Jan 31, 2014, at 7:16 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> 
> > On Fri, 31 Jan 2014 00:50:37 +0400
> > Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> > 
> >> 
> >> On Jan 30, 2014, at 8:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >> 
> >>> Hi,
> >>> 
> >>> On Wed, 29 Jan 2014 10:26:00 +0400
> >>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>> 
> >>>> Hi!
> >>>> 
> >>>> I'm testing this patch on a vagrant/virtualbox based VM. 
> >>>> 
> >>>> # uname -a
> >>>> Linux lb-node1 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> >>>> 
> >>>> # cat /etc/issue
> >>>> Ubuntu 12.04.3 LTS \n \l
> >>>> 
> >>>> This is the output of ifconfig before starting pgpool:
> >>>> 
> >>>> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
> >>>>         inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
> >>>>         inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
> >>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>>>         RX packets:7198 errors:0 dropped:0 overruns:0 frame:0
> >>>>         TX packets:4853 errors:0 dropped:0 overruns:0 carrier:0
> >>>>         collisions:0 txqueuelen:1000
> >>>>         RX bytes:553607 (553.6 KB)  TX bytes:722721 (722.7 KB)
> >>>> 
> >>>> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
> >>>>         inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
> >>>>         inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
> >>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>>>         RX packets:23682 errors:0 dropped:0 overruns:0 frame:0
> >>>>         TX packets:4876 errors:0 dropped:0 overruns:0 carrier:0
> >>>>         collisions:0 txqueuelen:1000
> >>>>         RX bytes:2551344 (2.5 MB)  TX bytes:646217 (646.2 KB)
> >>>> 
> >>>> lo        Link encap:Local Loopback
> >>>>         inet addr:127.0.0.1  Mask:255.0.0.0
> >>>>         inet6 addr: ::1/128 Scope:Host
> >>>>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >>>>         RX packets:1636 errors:0 dropped:0 overruns:0 frame:0
> >>>>         TX packets:1636 errors:0 dropped:0 overruns:0 carrier:0
> >>>>         collisions:0 txqueuelen:0
> >>>>         RX bytes:109868 (109.8 KB)  TX bytes:109868 (109.8 KB)
> >>>> 
> >>>> 
> >>>> /etc/pgpool2/pgpool.conf:
> >>>> ...
> >>>> debug_level                   = 9
> >>>> …
> >>>> delegate_IP                   = '192.168.33.200'
> >>>> ...
> >>>> ifconfig_path                 = '/sbin'
> >>>> if_up_cmd                     = 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0'
> >>>> if_down_cmd                   = 'ifconfig eth1:0 down'
> >>>> ...
> >>>> 
> >>>> 
> >>>> Once I start pgpool I get the following ifconfig output
> >>>> 
> >>>> 
> >>>> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
> >>>>         inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
> >>>>         inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
> >>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>>>         RX packets:7939 errors:0 dropped:0 overruns:0 frame:0
> >>>>         TX packets:5404 errors:0 dropped:0 overruns:0 carrier:0
> >>>>         collisions:0 txqueuelen:1000
> >>>>         RX bytes:606232 (606.2 KB)  TX bytes:816924 (816.9 KB)
> >>>> 
> >>>> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
> >>>>         inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
> >>>>         inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
> >>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>>>         RX packets:25179 errors:0 dropped:0 overruns:0 frame:0
> >>>>         TX packets:5204 errors:0 dropped:0 overruns:0 carrier:0
> >>>>         collisions:0 txqueuelen:1000
> >>>>         RX bytes:2704567 (2.7 MB)  TX bytes:690834 (690.8 KB)
> >>>> 
> >>>> eth1:0    Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
> >>>>         inet addr:192.168.33.200  Bcast:192.168.33.255  Mask:255.255.255.0
> >>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>>> 
> >>>> lo        Link encap:Local Loopback
> >>>>         inet addr:127.0.0.1  Mask:255.0.0.0
> >>>>         inet6 addr: ::1/128 Scope:Host
> >>>>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >>>>         RX packets:1745 errors:0 dropped:0 overruns:0 frame:0
> >>>>         TX packets:1745 errors:0 dropped:0 overruns:0 carrier:0
> >>>>         collisions:0 txqueuelen:0
> >>>>         RX bytes:117264 (117.2 KB)  TX bytes:117264 (117.2 KB)
> >>>> 
> >>>> 
> >>>> 
> >>>> # ping 192.168.33.200
> >>>> PING 192.168.33.200 (192.168.33.200) 56(84) bytes of data.
> >>>> 64 bytes from 192.168.33.200: icmp_req=1 ttl=64 time=0.060 ms
> >>>> ^C
> >>>> --- 192.168.33.200 ping statistics ---
> >>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>>> rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
> >>>> 
> >>>> 
> >>>> And these are some messages from pgpool.log:
> >>>> 
> >>>> pgpool[4152]: wd_chk_setuid all commands have setuid bit
> >>>> pgpool[4152]: watchdog might call network commands which using setuid bit.
> >>>> pgpool[4152]: exec_ping: failed to ping 192.168.33.200
> >>>> pgpool[4152]: wd_escalation: escalating to master pgpool
> >>>> pgpool[4152]: wd_IP_up: ifconfig up failed
> >>>> pgpool[4152]: wd_declare: send the packet to declare the new master
> >>>> pgpool[4152]: wd_escalation: escalated to master pgpool with some errors
> >>> 
> >>> That's strange. The log says "failed to ping", but the VIP is in fact brought up.
> >>> There may be a delay between ifconfig and ping. However, pgpool should retry the
> >>> ping up to three times before giving up, yet in this case it was tried only once.
> >>> 
> >>> For analysis, I would appreciate it if you would apply the attached patch and
> >>> send the log output messages.
> >> 
> >> start:
> >> 
> >> pgpool[2493]: num_backends: 2 total_weight: 2.000000
> >> pgpool[2493]: backend 0 weight: 1073741823.500000
> >> pgpool[2493]: backend 0 flag: 0000
> >> pgpool[2493]: backend 1 weight: 1073741823.500000
> >> pgpool[2493]: backend 1 flag: 0000
> >> pgpool[2493]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
> >> pgpool[2493]: wd_chk_setuid all commands have setuid bit
> >> pgpool[2493]: watchdog might call network commands which using setuid bit.
> >> pgpool[2493]: Backend status file /var/log/postgresql/pgpool_status discarded
> >> pgpool[2493]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> >> pgpool[2493]: send_packet_4_nodes: packet for lb-node2.site:9000 is canceled
> >> pgpool[2493]: exec_ping: failed to ping 192.168.33.200: exit code 1
> >> pgpool[2493]: wd_escalation: escalating to master pgpool
> >> pgpool[2493]: wd_IP_up: ifconfig up failed
> >> pgpool[2493]: wd_declare: send the packet to declare the new master
> >> pgpool[2493]: wd_escalation: escalated to master pgpool with some errors
> >> pgpool[2493]: wd_init: start watchdog
> >> 
> >> 
> >> 
> >> 
> >> stop:
> >> 
> >> pgpool[2504]: wd_IP_down: not delegate IP holder
> >> pgpool[2502]: hb_receiver child receives shutdown request signal 2
> >> pgpool[2503]: hb_sender child receives shutdown request signal 2
> >> pgpool[2589]: child received shutdown request signal 2
> >> pgpool[2493]: shmem_exit(0)
> >> 
> >> 
> >> 
> >> BTW, when I start/stop unpatched 3.3.2 version I see the same messages about ping failure. But everything works well in this case. 
> > 
> > I misunderstood: I thought the ping failure occurred while bringing up the VIP,
> > but it actually comes from checking whether the VIP is already in use by another
> > host. So this failure message is not a problem.
> > 
> > The problem is that there is no ping message after "escalating to master pgpool".
> > "ifconfig up failed" may be caused by a failure of the arping command, since the
> > ping is executed only after arping succeeds.
> > 
> > Could you please try the next patch for analysis? It outputs a log
> > message when the arping command fails.
> > 
> >> 
> >> unpatched start:
> >> 
> >> pgpool[7189]: exec_ping: failed to ping 192.168.33.200
> >> pgpool[7189]: wd_escalation: escalating to master pgpool
> >> pgpool[7189]: wd_declare: send the packet to declare the new master
> >> pgpool[7189]: wd_escalation: escalated to master pgpool successfully
> >> 
> >> unpatched stop:
> >> 
> >> pgpool[7198]: hb_receiver child receives shutdown request signal 2
> >> pgpool[7199]: hb_sender child receives shutdown request signal 2
> >> pgpool[7200]: exec_ping: failed to ping 192.168.33.200
> >> pgpool[7200]: wd_IP_down: ifconfig down succeeded
> >> pgpool[7189]: shmem_exit(0)
> >> 
> >> 
> >> 
> >>> 
> >>>> 
> >>>> 
> >>>> When I stop pgpool I get the following messages in pgpool.log:
> >>>> 
> >>>> pgpool[4163]: wd_IP_down: not delegate IP holder
> >>>> pgpool[4161]: hb_receiver child receives shutdown request signal 2
> >>>> pgpool[4162]: hb_sender child receives shutdown request signal 2
> >>>> pgpool[4152]: shmem_exit(0)
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> On Jan 29, 2014, at 6:42 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>> 
> >>>>> On Tue, 28 Jan 2014 23:03:20 +0400
> >>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>> 
> >>>>>> Hi!
> >>>>>> This patch applied successfully. But now a new problem. When I start pgpool service I get a new interface eth0:0 with failover IP address assigned as expected. But when I stop pgpool service eth0:0 won't go down. It remains even after complete shutdown of pgpool.
> >>>>> 
> >>>>> Odd, I can't reproduce this. Are there any error messages?
> >>>>> Which ifconfig command are you using?
> >>>>> 
> >>>>>> 
> >>>>>> I tried 3.3.2 without this patch and everything worked well. 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> On Jan 27, 2014, at 5:18 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>>>> 
> >>>>>>> On Sat, 25 Jan 2014 15:31:44 +0400
> >>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> On Jan 24, 2014, at 1:25 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>>>>>> 
> >>>>>>>>> On Tue, 21 Jan 2014 15:24:02 +0400
> >>>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>>>>>> 
> >>>>>>>>>> Great! Now it is working!
> >>>>>>>>>> 
> >>>>>>>>>> pgpool[31903]: wd_escalation: escalation command failed. exit status: 1
> >>>>>>>>>> 
> >>>>>>>>>> Thank you!
> >>>>>>>>>> 
> >>>>>>>>>> Will this patch be included in 3.3.3 ?
> >>>>>>>>>> 
> >>>>>>>>>> Also, what about failed if_up_cmd and further pgpool behaviour (my second message in the thread.) ?
> >>>>>>>>> 
> >>>>>>>>> I attached the patch. Could you try this? With this fix, pgpool outputs an error
> >>>>>>>>> message on if_up_cmd failure. This patch should be applied on top of the previous
> >>>>>>>>> patch. This fix will be included in 3.3.3.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Hi!
> >>>>>>>> 
> >>>>>>>> I tried to apply the patch against both 3.3.1 and 3.3.2
> >>>>>>>> 
> >>>>>>>> this is what I got:
> >>>>>>> 
> >>>>>>> Hmm.. Could you try the attached patch against 3.3.2? It includes all the fixes
> >>>>>>> for the escalation command and ifconfig errors.
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/esc.patch
> >>>>>>>> 
> >>>>>>>> patching file src/watchdog/wd_packet.c
> >>>>>>>> Hunk #1 succeeded at 954 (offset 23 lines).
> >>>>>>>> 
> >>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/ifup.patch
> >>>>>>>> 
> >>>>>>>> patching file src/watchdog/wd_if.c
> >>>>>>>> Hunk #1 succeeded at 42 with fuzz 1 (offset 3 lines).
> >>>>>>>> Hunk #2 succeeded at 62 (offset 3 lines).
> >>>>>>>> Hunk #3 succeeded at 117 (offset 3 lines).
> >>>>>>>> patching file src/watchdog/wd_packet.c
> >>>>>>>> Hunk #1 succeeded at 654 (offset 23 lines).
> >>>>>>>> Hunk #2 succeeded at 939 (offset 23 lines).
> >>>>>>>> Hunk #3 FAILED at 932.
> >>>>>>>> Hunk #4 succeeded at 976 (offset 18 lines).
> >>>>>>>> 1 out of 4 hunks FAILED -- saving rejects to file src/watchdog/wd_packet.c.rej
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> src/watchdog/wd_packet.c.rej:
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> --- src/watchdog/wd_packet.c
> >>>>>>>> +++ src/watchdog/wd_packet.c
> >>>>>>>> @@ -932,22 +933,31 @@
> >>>>>>>> 	/* execute escalation command */
> >>>>>>>> 	if (strlen(pool_config->wd_escalation_command))
> >>>>>>>> 	{
> >>>>>>>> -		int r;
> >>>>>>>> 		r = system(pool_config->wd_escalation_command);
> >>>>>>>> 		if (WIFEXITED(r))
> >>>>>>>> 		{
> >>>>>>>> 			if (WEXITSTATUS(r) == EXIT_SUCCESS)
> >>>>>>>> 				pool_log("wd_escalation: escalation command succeeded");
> >>>>>>>> 			else
> >>>>>>>> +			{
> >>>>>>>> 				pool_error("wd_escalation: escalation command failed. exit status: %d", WEXITSTATUS(r));
> >>>>>>>> +				has_error = true;
> >>>>>>>> +			}
> >>>>>>>> 		}
> >>>>>>>> 		else
> >>>>>>>> +		{
> >>>>>>>> 			pool_error("wd_escalation: escalation command exit abnormally");
> >>>>>>>> +			has_error = true;
> >>>>>>>> +		}
> >>>>>>>> 	}
> >>>>>>>> 
> >>>>>>>> 	/* interface up as delegate IP */
> >>>>>>>> 	if (strlen(pool_config->delegate_IP) != 0)
> >>>>>>>> -		wd_IP_up();
> >>>>>>>> +	{
> >>>>>>>> +		r = wd_IP_up();
> >>>>>>>> +		if (r == WD_NG)
> >>>>>>>> +			has_error = true;
> >>>>>>>> +	}
> >>>>>>>> 
> >>>>>>>> 	/* set master status to the wd list */
> >>>>>>>> 	wd_set_wd_list(pool_config->wd_hostname, pool_config->port,
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> In addition, I think pgpool should go to down status when if_up_cmd fails,
> >>>>>>>>> since it is then worthless as a member of the watchdog cluster. I'll make this
> >>>>>>>>> fix for either 3.3.3 or 3.4.0.
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Sounds reasonable. 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> -- 
> >>>>>>> Yugo Nagata <nagata at sraoss.co.jp>
> >>>>>>> <escalation_error_all.patch>
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> -- 
> >>>>> Yugo Nagata <nagata at sraoss.co.jp>
> >>>> 
> >>> 
> >>> 
> >>> -- 
> >>> Yugo Nagata <nagata at sraoss.co.jp>
> >>> <escalation_error_all_for_analysis.patch>
> >> 
> > 
> > 
> > -- 
> > Yugo Nagata <nagata at sraoss.co.jp>
> > <escalation_error_all_for_analysis2.patch>
> 


-- 
Yugo Nagata <nagata at sraoss.co.jp>

