[pgpool-general: 2537] Re: wd_escalation_command exit code

Sergey Arlashin sergeyarl.maillist at gmail.com
Wed Feb 5 21:13:30 JST 2014


On Feb 5, 2014, at 7:09 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:

> Hi,
> 
> On Tue, 4 Feb 2014 11:18:16 +0400
> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> 
>> Hi!
>> 
>> Finally had a chance to apply the last patch.
> 
> Thanks!
> 
>> 
>> This is what I got in the log:
>> 
>> pgpool[12464]: wd_chk_setuid all commands have setuid bit
>> pgpool[12464]: watchdog might call network commands which using setuid bit.
>> pgpool[12464]: Backend status file /var/log/postgresql/pgpool_status discarded
>> pgpool[12464]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
>> pgpool[12464]: exec_ping: failed to ping 192.168.33.200: exit code 1
>> pgpool[12464]: wd_escalation: escalating to master pgpool
>> pgpool[12464]: exec_ifconfig: 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0' succeeded
>> pgpool[12464]: exec_ifconfig: 'arping -U $_IP_$ -w 1' failed. exit status: 1
> 
> The arping's exit status is not zero. It's odd, in my machine (CentOS 5/6 and Ubuntu)
> this always returns zero.
> 
> Could you please execute arping while pgpool is working and VIP is up, and this exit code?
> 
> $ sudo /usr/sbin/arping -U 192.168.33.200 -w 1
> $ echo $?
> 
> In addtion, please check version of arping.
> 
> $ /usr/sbin/arping -h
> $ /usr/sbin/arping -V
> 
> 

Oh, it seems we have different arping versions. 

This is the one I have in ubuntu 12.04: 

# arping -V
arping: invalid option -- 'V'
ARPing 2.09, by Thomas Habets <thomas at habets.pp.se>
usage: arping [ -0aAbdDeFpqrRuv ] [ -w <us> ] [ -S <host/ip> ]
              [ -T <host/ip ] [ -s <MAC> ] [ -t <MAC> ] [ -c <count> ]
              [ -i <interface> ] <host/ip/MAC | -B>
For complete usage info, use --help or check the manpage.


After I added the following line to pgpool.conf 

arping_cmd  = 'arping -S $_IP_$ -i eth1 192.168.33.1 -uc 2'

things got much better. Everything is working now as expected. 

But it was the patchless 3.3.2 which confused me. It worked fine even with failed arping. The interface was brought both up and down without any issues.In my opinion a behaviour like this looks at least strange and should be documented :) .

As to the if_up_cmd behaviour - it seems ok now - I can see all the messages in pgpool.log:

pgpool[30061]: wd_escalation: escalating to master pgpool
pgpool[30061]: exec_ifconfig: 'ifconfig up eth1:PGPOOL $_IP_$ 255.255.255.0' failed. exit status: 1
pgpool[30061]: wd_IP_up: ifconfig up failed
pgpool[30061]: wd_declare: send the packet to declare the new master
pgpool[30061]: wd_escalation: escalated to master pgpool with some errors
pgpool[30061]: wd_init: start watchdog

Thank you very much for your support!




>> pgpool[12464]: wd_IP_up: ifconfig up failed
>> pgpool[12464]: wd_declare: send the packet to declare the new master
>> pgpool[12464]: wd_escalation: escalated to master pgpool with some errors
>> pgpool[12464]: wd_init: start watchdog
>> 
>> 
>> 
>> 
>> On Jan 31, 2014, at 7:16 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>> 
>>> On Fri, 31 Jan 2014 00:50:37 +0400
>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>> 
>>>> 
>>>> On Jan 30, 2014, at 8:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On Wed, 29 Jan 2014 10:26:00 +0400
>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>> 
>>>>>> Hi!
>>>>>> 
>>>>>> I'm testing this patch on a vagrant/virtualbox based VM. 
>>>>>> 
>>>>>> # uname -a
>>>>>> Linux lb-node1 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> 
>>>>>> # cat /etc/issue
>>>>>> Ubuntu 12.04.3 LTS \n \l
>>>>>> 
>>>>>> This is the output of ifconfig before starting pgpool:
>>>>>> 
>>>>>> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
>>>>>>        inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
>>>>>>        inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
>>>>>>        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>        RX packets:7198 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:4853 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:1000
>>>>>>        RX bytes:553607 (553.6 KB)  TX bytes:722721 (722.7 KB)
>>>>>> 
>>>>>> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
>>>>>>        inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
>>>>>>        inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
>>>>>>        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>        RX packets:23682 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:4876 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:1000
>>>>>>        RX bytes:2551344 (2.5 MB)  TX bytes:646217 (646.2 KB)
>>>>>> 
>>>>>> lo        Link encap:Local Loopback
>>>>>>        inet addr:127.0.0.1  Mask:255.0.0.0
>>>>>>        inet6 addr: ::1/128 Scope:Host
>>>>>>        UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>>>        RX packets:1636 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:1636 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:0
>>>>>>        RX bytes:109868 (109.8 KB)  TX bytes:109868 (109.8 KB)
>>>>>> 
>>>>>> 
>>>>>> /etc/pgpool2/pgpool.conf:
>>>>>> ...
>>>>>> debug_level                   = 9
>>>>>>>>>>>> delegate_IP                   = '192.168.33.200'
>>>>>> ...
>>>>>> ifconfig_path                 = '/sbin'
>>>>>> if_up_cmd                     = 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0'
>>>>>> if_down_cmd                   = 'ifconfig eth1:0 down'
>>>>>> ...
>>>>>> 
>>>>>> 
>>>>>> Once I start pgpool I get the following ifconfig output
>>>>>> 
>>>>>> 
>>>>>> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
>>>>>>        inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
>>>>>>        inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
>>>>>>        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>        RX packets:7939 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:5404 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:1000
>>>>>>        RX bytes:606232 (606.2 KB)  TX bytes:816924 (816.9 KB)
>>>>>> 
>>>>>> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
>>>>>>        inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
>>>>>>        inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
>>>>>>        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>        RX packets:25179 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:5204 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:1000
>>>>>>        RX bytes:2704567 (2.7 MB)  TX bytes:690834 (690.8 KB)
>>>>>> 
>>>>>> eth1:0    Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
>>>>>>        inet addr:192.168.33.200  Bcast:192.168.33.255  Mask:255.255.255.0
>>>>>>        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>> 
>>>>>> lo        Link encap:Local Loopback
>>>>>>        inet addr:127.0.0.1  Mask:255.0.0.0
>>>>>>        inet6 addr: ::1/128 Scope:Host
>>>>>>        UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>>>        RX packets:1745 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:1745 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:0
>>>>>>        RX bytes:117264 (117.2 KB)  TX bytes:117264 (117.2 KB)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> # ping 192.168.33.200
>>>>>> PING 192.168.33.200 (192.168.33.200) 56(84) bytes of data.
>>>>>> 64 bytes from 192.168.33.200: icmp_req=1 ttl=64 time=0.060 ms
>>>>>> ^C
>>>>>> --- 192.168.33.200 ping statistics ---
>>>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>>>> rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
>>>>>> 
>>>>>> 
>>>>>> And these are some messages from pgpool.log:
>>>>>> 
>>>>>> pgpool[4152]: wd_chk_setuid all commands have setuid bit
>>>>>> pgpool[4152]: watchdog might call network commands which using setuid bit.
>>>>>> pgpool[4152]: exec_ping: failed to ping 192.168.33.200
>>>>>> pgpool[4152]: wd_escalation: escalating to master pgpool
>>>>>> pgpool[4152]: wd_IP_up: ifconfig up failed
>>>>>> pgpool[4152]: wd_declare: send the packet to declare the new master
>>>>>> pgpool[4152]: wd_escalation: escalated to master pgpool with some errors
>>>>> 
>>>>> That's funny. This says that "failed to ping" but VIP is brought up in fact. 
>>>>> It may take times between ifconfig and ping. However, pgpool should try to 
>>>>> ping up to three times before this succeeds, but this is tried only one time
>>>>> in the case.
>>>>> 
>>>>> For analysis, I would appreciate it if you would apply the attached patch and
>>>>> send the log output messages.
>>>> 
>>>> start:
>>>> 
>>>> pgpool[2493]: num_backends: 2 total_weight: 2.000000
>>>> pgpool[2493]: backend 0 weight: 1073741823.500000
>>>> pgpool[2493]: backend 0 flag: 0000
>>>> pgpool[2493]: backend 1 weight: 1073741823.500000
>>>> pgpool[2493]: backend 1 flag: 0000
>>>> pgpool[2493]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
>>>> pgpool[2493]: wd_chk_setuid all commands have setuid bit
>>>> pgpool[2493]: watchdog might call network commands which using setuid bit.
>>>> pgpool[2493]: Backend status file /var/log/postgresql/pgpool_status discarded
>>>> pgpool[2493]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
>>>> pgpool[2493]: send_packet_4_nodes: packet for lb-node2.site:9000 is canceled
>>>> pgpool[2493]: exec_ping: failed to ping 192.168.33.200: exit code 1
>>>> pgpool[2493]: wd_escalation: escalating to master pgpool
>>>> pgpool[2493]: wd_IP_up: ifconfig up failed
>>>> pgpool[2493]: wd_declare: send the packet to declare the new master
>>>> pgpool[2493]: wd_escalation: escalated to master pgpool with some errors
>>>> pgpool[2493]: wd_init: start watchdog
>>>> 
>>>> 
>>>> 
>>>> 
>>>> stop:
>>>> 
>>>> pgpool[2504]: wd_IP_down: not delegate IP holder
>>>> pgpool[2502]: hb_receiver child receives shutdown request signal 2
>>>> pgpool[2503]: hb_sender child receives shutdown request signal 2
>>>> pgpool[2589]: child received shutdown request signal 2
>>>> pgpool[2493]: shmem_exit(0)
>>>> 
>>>> 
>>>> 
>>>> BTW, when I start/stop unpatched 3.3.2 version I see the same messages about ping failure. But everything works well in this case. 
>>> 
>>> I misunderstood that the ping failure occurs at bringing up VIP, but this is for
>>> checking whether the VIP is already used by other host. So, this failure message
>>> is no problem.
>>> 
>>> The problem is, there is no ping message after "escalating to master pgpool".
>>> "ifconfig up failed" may be caused by arping command's failure, since ping
>>> should be executed after arping succeeded.
>>> 
>>> Could you please try the next patch for analysis? This would oupute log
>>> message when arping command fails.
>>> 
>>>> 
>>>> unpatched start:
>>>> 
>>>> pgpool[7189]: exec_ping: failed to ping 192.168.33.200
>>>> pgpool[7189]: wd_escalation: escalating to master pgpool
>>>> pgpool[7189]: wd_declare: send the packet to declare the new master
>>>> pgpool[7189]: wd_escalation: escalated to master pgpool successfully
>>>> 
>>>> unpatched stop:
>>>> 
>>>> pgpool[7198]: hb_receiver child receives shutdown request signal 2
>>>> pgpool[7199]: hb_sender child receives shutdown request signal 2
>>>> pgpool[7200]: exec_ping: failed to ping 192.168.33.200
>>>> pgpool[7200]: wd_IP_down: ifconfig down succeeded
>>>> pgpool[7189]: shmem_exit(0)
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> When I stop pgpool I get the following messages in pgpool.log:
>>>>>> 
>>>>>> pgpool[4163]: wd_IP_down: not delegate IP holder
>>>>>> pgpool[4161]: hb_receiver child receives shutdown request signal 2
>>>>>> pgpool[4162]: hb_sender child receives shutdown request signal 2
>>>>>> pgpool[4152]: shmem_exit(0)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Jan 29, 2014, at 6:42 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>>>> 
>>>>>>> On Tue, 28 Jan 2014 23:03:20 +0400
>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi!
>>>>>>>> This patch applied successfully. But now a new problem. When I start pgpool service I get a new interface eth0:0 with failover IP address assigned as expected. But when I stop pgpool service eth0:0 won't go down. It remains even after complete shutdown of pgpool.
>>>>>>> 
>>>>>>> Odd, I can't reproduce this. Are there any error message?
>>>>>>> What ifconfig command do you use?
>>>>>>> 
>>>>>>>> 
>>>>>>>> I tried 3.3.2 without this patch and everything worked well. 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jan 27, 2014, at 5:18 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>>>>>> 
>>>>>>>>> On Sat, 25 Jan 2014 15:31:44 +0400
>>>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Jan 24, 2014, at 1:25 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Tue, 21 Jan 2014 15:24:02 +0400
>>>>>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Great! Now it is working!
>>>>>>>>>>>> 
>>>>>>>>>>>> pgpool[31903]: wd_escalation: escalation command failed. exit status: 1
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>> 
>>>>>>>>>>>> Will this patch be included in 3.3.3 ?
>>>>>>>>>>>> 
>>>>>>>>>>>> Also, what about failed if_up_cmd and further pgpool behaviour (my second message in the thread.) ?
>>>>>>>>>>> 
>>>>>>>>>>> I attached the patch. Could you try this? In this fix, pgpool outputs a error 
>>>>>>>>>>> message for if_up_cmd failure. This patch should be applied after the previous
>>>>>>>>>>> patch. This fix will be included in 3.3.3.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi!
>>>>>>>>>> 
>>>>>>>>>> I tried to apply the patch against both 3.3.1 and 3.3.2
>>>>>>>>>> 
>>>>>>>>>> this is what I got:
>>>>>>>>> 
>>>>>>>>> Hmm.. Could you try the attached patch to 3.3.2? This includes allthe fix
>>>>>>>>> for escalation command and ifconfig errors.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/esc.patch
>>>>>>>>>> 
>>>>>>>>>> patching file src/watchdog/wd_packet.c
>>>>>>>>>> Hunk #1 succeeded at 954 (offset 23 lines).
>>>>>>>>>> 
>>>>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/ifup.patch
>>>>>>>>>> 
>>>>>>>>>> patching file src/watchdog/wd_if.c
>>>>>>>>>> Hunk #1 succeeded at 42 with fuzz 1 (offset 3 lines).
>>>>>>>>>> Hunk #2 succeeded at 62 (offset 3 lines).
>>>>>>>>>> Hunk #3 succeeded at 117 (offset 3 lines).
>>>>>>>>>> patching file src/watchdog/wd_packet.c
>>>>>>>>>> Hunk #1 succeeded at 654 (offset 23 lines).
>>>>>>>>>> Hunk #2 succeeded at 939 (offset 23 lines).
>>>>>>>>>> Hunk #3 FAILED at 932.
>>>>>>>>>> Hunk #4 succeeded at 976 (offset 18 lines).
>>>>>>>>>> 1 out of 4 hunks FAILED -- saving rejects to file src/watchdog/wd_packet.c.rej
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> src/watchdog/wd_packet.c.rej:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --- src/watchdog/wd_packet.c
>>>>>>>>>> +++ src/watchdog/wd_packet.c
>>>>>>>>>> @@ -932,22 +933,31 @@
>>>>>>>>>> 	/* execute escalation command */
>>>>>>>>>> 	if (strlen(pool_config->wd_escalation_command))
>>>>>>>>>> 	{
>>>>>>>>>> -		int r;
>>>>>>>>>> 		r = system(pool_config->wd_escalation_command);
>>>>>>>>>> 		if (WIFEXITED(r))
>>>>>>>>>> 		{
>>>>>>>>>> 			if (WEXITSTATUS(r) == EXIT_SUCCESS)
>>>>>>>>>> 				pool_log("wd_escalation: escalation command succeeded");
>>>>>>>>>> 			else
>>>>>>>>>> +			{
>>>>>>>>>> 				pool_error("wd_escalation: escalation command failed. exit status: %d", WEXITSTATUS(r));
>>>>>>>>>> +				has_error = true;
>>>>>>>>>> +			}
>>>>>>>>>> 		}
>>>>>>>>>> 		else
>>>>>>>>>> +		{
>>>>>>>>>> 			pool_error("wd_escalation: escalation command exit abnormally");
>>>>>>>>>> +			has_error = true;
>>>>>>>>>> +		}
>>>>>>>>>> 	}
>>>>>>>>>> 
>>>>>>>>>> 	/* interface up as delegate IP */
>>>>>>>>>> 	if (strlen(pool_config->delegate_IP) != 0)
>>>>>>>>>> -		wd_IP_up();
>>>>>>>>>> +	{
>>>>>>>>>> +		r = wd_IP_up();
>>>>>>>>>> +		if (r == WD_NG)
>>>>>>>>>> +			has_error = true;
>>>>>>>>>> +	}
>>>>>>>>>> 
>>>>>>>>>> 	/* set master status to the wd list */
>>>>>>>>>> 	wd_set_wd_list(pool_config->wd_hostname, pool_config->port,
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> In addition, I consider that pgpool shoud go to down status when if_up_cmd fails, 
>>>>>>>>>>> since this is worthless as a member of watchdog cluster. I'll make this fix for
>>>>>>>>>>> either 3.3.3 or 3.4.0.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Sounds reasonable. 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> Yugo Nagata <nagata at sraoss.co.jp>
>>>>>>>>> <escalation_error_all.patch>
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Yugo Nagata <nagata at sraoss.co.jp>
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Yugo Nagata <nagata at sraoss.co.jp>
>>>>> <escalation_error_all_for_analysis.patch>
>>>> 
>>> 
>>> 
>>> -- 
>>> Yugo Nagata <nagata at sraoss.co.jp>
>>> <escalation_error_all_for_analysis2.patch>
>> 
> 
> 
> -- 
> Yugo Nagata <nagata at sraoss.co.jp>



More information about the pgpool-general mailing list