[pgpool-general: 2529] Re: wd_escalation_command exit code

Sergey Arlashin sergeyarl.maillist at gmail.com
Tue Feb 4 16:18:16 JST 2014


Hi!

Finally had a chance to apply the last patch.

This is what I got in the log:

pgpool[12464]: wd_chk_setuid all commands have setuid bit
pgpool[12464]: watchdog might call network commands which using setuid bit.
pgpool[12464]: Backend status file /var/log/postgresql/pgpool_status discarded
pgpool[12464]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
pgpool[12464]: exec_ping: failed to ping 192.168.33.200: exit code 1
pgpool[12464]: wd_escalation: escalating to master pgpool
pgpool[12464]: exec_ifconfig: 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0' succeeded
pgpool[12464]: exec_ifconfig: 'arping -U $_IP_$ -w 1' failed. exit status: 1
pgpool[12464]: wd_IP_up: ifconfig up failed
pgpool[12464]: wd_declare: send the packet to declare the new master
pgpool[12464]: wd_escalation: escalated to master pgpool with some errors
pgpool[12464]: wd_init: start watchdog
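
So the sequence seems to be: the if_up_cmd itself succeeds, the arping step exits with status 1, and wd_IP_up() therefore reports failure before the ping check is reached. My rough reading of that flow, just a sketch pieced together from these log messages and my config (run as root, with the configured command paths), not the actual wd_if.c code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* run a command via system() and decode its exit status */
static int run(const char *cmd)
{
	int r = system(cmd);
	if (WIFEXITED(r) && WEXITSTATUS(r) == EXIT_SUCCESS)
		return 0;
	fprintf(stderr, "'%s' failed. exit status: %d\n",
	        cmd, WIFEXITED(r) ? WEXITSTATUS(r) : -1);
	return -1;
}

int main(void)
{
	/* 1. bring up the alias interface (if_up_cmd) -- succeeds in my log */
	if (run("ifconfig eth1:0 192.168.33.200 netmask 255.255.255.0") != 0)
		return 1;

	/* 2. announce the new address with arping -- the step that fails for me */
	if (run("arping -U 192.168.33.200 -w 1") != 0)
		return 1;

	/* 3. only after arping succeeds would the VIP be pinged to confirm it is up */
	return run("ping -q -c 1 192.168.33.200") != 0;
}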




On Jan 31, 2014, at 7:16 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:

> On Fri, 31 Jan 2014 00:50:37 +0400
> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> 
>> 
>> On Jan 30, 2014, at 8:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>> 
>>> Hi,
>>> 
>>> On Wed, 29 Jan 2014 10:26:00 +0400
>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>> 
>>>> Hi!
>>>> 
>>>> I'm testing this patch on a vagrant/virtualbox based VM. 
>>>> 
>>>> # uname -a
>>>> Linux lb-node1 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>> 
>>>> # cat /etc/issue
>>>> Ubuntu 12.04.3 LTS \n \l
>>>> 
>>>> This is the output of ifconfig before starting pgpool:
>>>> 
>>>> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
>>>>         inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
>>>>         inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
>>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>         RX packets:7198 errors:0 dropped:0 overruns:0 frame:0
>>>>         TX packets:4853 errors:0 dropped:0 overruns:0 carrier:0
>>>>         collisions:0 txqueuelen:1000
>>>>         RX bytes:553607 (553.6 KB)  TX bytes:722721 (722.7 KB)
>>>> 
>>>> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
>>>>         inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
>>>>         inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
>>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>         RX packets:23682 errors:0 dropped:0 overruns:0 frame:0
>>>>         TX packets:4876 errors:0 dropped:0 overruns:0 carrier:0
>>>>         collisions:0 txqueuelen:1000
>>>>         RX bytes:2551344 (2.5 MB)  TX bytes:646217 (646.2 KB)
>>>> 
>>>> lo        Link encap:Local Loopback
>>>>         inet addr:127.0.0.1  Mask:255.0.0.0
>>>>         inet6 addr: ::1/128 Scope:Host
>>>>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>         RX packets:1636 errors:0 dropped:0 overruns:0 frame:0
>>>>         TX packets:1636 errors:0 dropped:0 overruns:0 carrier:0
>>>>         collisions:0 txqueuelen:0
>>>>         RX bytes:109868 (109.8 KB)  TX bytes:109868 (109.8 KB)
>>>> 
>>>> 
>>>> /etc/pgpool2/pgpool.conf:
>>>> ...
>>>> debug_level                   = 9
>>>> delegate_IP                   = '192.168.33.200'
>>>> ...
>>>> ifconfig_path                 = '/sbin'
>>>> if_up_cmd                     = 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0'
>>>> if_down_cmd                   = 'ifconfig eth1:0 down'
>>>> ...
>>>> 
>>>> 
>>>> Once I start pgpool I get the following ifconfig output
>>>> 
>>>> 
>>>> eth0      Link encap:Ethernet  HWaddr 08:00:27:03:2b:89
>>>>         inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
>>>>         inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
>>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>         RX packets:7939 errors:0 dropped:0 overruns:0 frame:0
>>>>         TX packets:5404 errors:0 dropped:0 overruns:0 carrier:0
>>>>         collisions:0 txqueuelen:1000
>>>>         RX bytes:606232 (606.2 KB)  TX bytes:816924 (816.9 KB)
>>>> 
>>>> eth1      Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
>>>>         inet addr:192.168.33.11  Bcast:192.168.33.255  Mask:255.255.255.0
>>>>         inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
>>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>         RX packets:25179 errors:0 dropped:0 overruns:0 frame:0
>>>>         TX packets:5204 errors:0 dropped:0 overruns:0 carrier:0
>>>>         collisions:0 txqueuelen:1000
>>>>         RX bytes:2704567 (2.7 MB)  TX bytes:690834 (690.8 KB)
>>>> 
>>>> eth1:0    Link encap:Ethernet  HWaddr 08:00:27:70:46:a0
>>>>         inet addr:192.168.33.200  Bcast:192.168.33.255  Mask:255.255.255.0
>>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>> 
>>>> lo        Link encap:Local Loopback
>>>>         inet addr:127.0.0.1  Mask:255.0.0.0
>>>>         inet6 addr: ::1/128 Scope:Host
>>>>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>         RX packets:1745 errors:0 dropped:0 overruns:0 frame:0
>>>>         TX packets:1745 errors:0 dropped:0 overruns:0 carrier:0
>>>>         collisions:0 txqueuelen:0
>>>>         RX bytes:117264 (117.2 KB)  TX bytes:117264 (117.2 KB)
>>>> 
>>>> 
>>>> 
>>>> # ping 192.168.33.200
>>>> PING 192.168.33.200 (192.168.33.200) 56(84) bytes of data.
>>>> 64 bytes from 192.168.33.200: icmp_req=1 ttl=64 time=0.060 ms
>>>> ^C
>>>> --- 192.168.33.200 ping statistics ---
>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>> rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
>>>> 
>>>> 
>>>> And these are some messages from pgpool.log:
>>>> 
>>>> pgpool[4152]: wd_chk_setuid all commands have setuid bit
>>>> pgpool[4152]: watchdog might call network commands which using setuid bit.
>>>> pgpool[4152]: exec_ping: failed to ping 192.168.33.200
>>>> pgpool[4152]: wd_escalation: escalating to master pgpool
>>>> pgpool[4152]: wd_IP_up: ifconfig up failed
>>>> pgpool[4152]: wd_declare: send the packet to declare the new master
>>>> pgpool[4152]: wd_escalation: escalated to master pgpool with some errors
>>> 
>>> That's odd. The log says "failed to ping", yet the VIP is in fact brought up.
>>> There may be some delay between ifconfig and ping; however, pgpool should retry
>>> the ping up to three times before giving up, whereas in this case it was tried
>>> only once.
>>> 
>>> For analysis, I would appreciate it if you would apply the attached patch and
>>> send the log output messages.
>> 
>> start:
>> 
>> pgpool[2493]: num_backends: 2 total_weight: 2.000000
>> pgpool[2493]: backend 0 weight: 1073741823.500000
>> pgpool[2493]: backend 0 flag: 0000
>> pgpool[2493]: backend 1 weight: 1073741823.500000
>> pgpool[2493]: backend 1 flag: 0000
>> pgpool[2493]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
>> pgpool[2493]: wd_chk_setuid all commands have setuid bit
>> pgpool[2493]: watchdog might call network commands which using setuid bit.
>> pgpool[2493]: Backend status file /var/log/postgresql/pgpool_status discarded
>> pgpool[2493]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
>> pgpool[2493]: send_packet_4_nodes: packet for lb-node2.site:9000 is canceled
>> pgpool[2493]: exec_ping: failed to ping 192.168.33.200: exit code 1
>> pgpool[2493]: wd_escalation: escalating to master pgpool
>> pgpool[2493]: wd_IP_up: ifconfig up failed
>> pgpool[2493]: wd_declare: send the packet to declare the new master
>> pgpool[2493]: wd_escalation: escalated to master pgpool with some errors
>> pgpool[2493]: wd_init: start watchdog
>> 
>> 
>> 
>> 
>> stop:
>> 
>> pgpool[2504]: wd_IP_down: not delegate IP holder
>> pgpool[2502]: hb_receiver child receives shutdown request signal 2
>> pgpool[2503]: hb_sender child receives shutdown request signal 2
>> pgpool[2589]: child received shutdown request signal 2
>> pgpool[2493]: shmem_exit(0)
>> 
>> 
>> 
>> BTW, when I start/stop the unpatched 3.3.2 version I see the same messages about ping failure, but in that case everything works fine.
> 
> I had misunderstood: that ping failure does not happen while bringing up the VIP;
> it is the check for whether the VIP is already in use by another host. So that
> failure message is not a problem.
> 
> The real problem is that there is no ping message after "escalating to master
> pgpool". "ifconfig up failed" may be caused by a failure of the arping command,
> since ping is only executed after arping has succeeded.
> 
> Could you please try the next patch for analysis? It outputs a log message
> when the arping command fails.
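> 
> For reference, the "exit status" in such messages is decoded from system()'s
> return value with WIFEXITED()/WEXITSTATUS(), the same mechanism used in the
> escalation command fix. A minimal standalone example (not the pgpool source):
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/wait.h>
> 
> int main(void)
> {
> 	/* "false" stands in for any failing command such as arping */
> 	int r = system("false");
> 
> 	if (WIFEXITED(r))
> 		printf("exited normally, exit status: %d\n", WEXITSTATUS(r));
> 	else
> 		printf("exited abnormally (killed by a signal, or system() failed)\n");
> 
> 	return 0;
> }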
> 
>> 
>> unpatched start:
>> 
>> pgpool[7189]: exec_ping: failed to ping 192.168.33.200
>> pgpool[7189]: wd_escalation: escalating to master pgpool
>> pgpool[7189]: wd_declare: send the packet to declare the new master
>> pgpool[7189]: wd_escalation: escalated to master pgpool successfully
>> 
>> unpatched stop:
>> 
>> pgpool[7198]: hb_receiver child receives shutdown request signal 2
>> pgpool[7199]: hb_sender child receives shutdown request signal 2
>> pgpool[7200]: exec_ping: failed to ping 192.168.33.200
>> pgpool[7200]: wd_IP_down: ifconfig down succeeded
>> pgpool[7189]: shmem_exit(0)
>> 
>> 
>> 
>>> 
>>>> 
>>>> 
>>>> When I stop pgpool I get the following messages in pgpool.log:
>>>> 
>>>> pgpool[4163]: wd_IP_down: not delegate IP holder
>>>> pgpool[4161]: hb_receiver child receives shutdown request signal 2
>>>> pgpool[4162]: hb_sender child receives shutdown request signal 2
>>>> pgpool[4152]: shmem_exit(0)
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jan 29, 2014, at 6:42 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>> 
>>>>> On Tue, 28 Jan 2014 23:03:20 +0400
>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>> 
>>>>>> Hi!
>>>>>> This patch applied successfully, but now there is a new problem. When I start the pgpool service I get a new interface eth0:0 with the failover IP address assigned, as expected. But when I stop the pgpool service, eth0:0 won't go down. It remains even after a complete shutdown of pgpool.
>>>>> 
>>>>> Odd, I can't reproduce this. Are there any error messages?
>>>>> Which ifconfig command do you use?
>>>>> 
>>>>>> 
>>>>>> I tried 3.3.2 without this patch and everything worked well. 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Jan 27, 2014, at 5:18 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>>>> 
>>>>>>> On Sat, 25 Jan 2014 15:31:44 +0400
>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jan 24, 2014, at 1:25 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>>>>>>> 
>>>>>>>>> On Tue, 21 Jan 2014 15:24:02 +0400
>>>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Great! Now it is working!
>>>>>>>>>> 
>>>>>>>>>> pgpool[31903]: wd_escalation: escalation command failed. exit status: 1
>>>>>>>>>> 
>>>>>>>>>> Thank you!
>>>>>>>>>> 
>>>>>>>>>> Will this patch be included in 3.3.3?
>>>>>>>>>> 
>>>>>>>>>> Also, what about a failed if_up_cmd and pgpool's subsequent behaviour (my second message in the thread)?
>>>>>>>>> 
>>>>>>>>> I attached the patch. Could you try it? With this fix, pgpool outputs an error
>>>>>>>>> message when if_up_cmd fails. The patch should be applied on top of the previous
>>>>>>>>> patch. This fix will be included in 3.3.3.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi!
>>>>>>>> 
>>>>>>>> I tried to apply the patch against both 3.3.1 and 3.3.2
>>>>>>>> 
>>>>>>>> this is what I got:
>>>>>>> 
>>>>>>> Hmm... Could you try applying the attached patch to 3.3.2? It includes all the fixes
>>>>>>> for the escalation command and ifconfig errors.
>>>>>>> 
>>>>>>>> 
>>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/esc.patch
>>>>>>>> 
>>>>>>>> patching file src/watchdog/wd_packet.c
>>>>>>>> Hunk #1 succeeded at 954 (offset 23 lines).
>>>>>>>> 
>>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/ifup.patch
>>>>>>>> 
>>>>>>>> patching file src/watchdog/wd_if.c
>>>>>>>> Hunk #1 succeeded at 42 with fuzz 1 (offset 3 lines).
>>>>>>>> Hunk #2 succeeded at 62 (offset 3 lines).
>>>>>>>> Hunk #3 succeeded at 117 (offset 3 lines).
>>>>>>>> patching file src/watchdog/wd_packet.c
>>>>>>>> Hunk #1 succeeded at 654 (offset 23 lines).
>>>>>>>> Hunk #2 succeeded at 939 (offset 23 lines).
>>>>>>>> Hunk #3 FAILED at 932.
>>>>>>>> Hunk #4 succeeded at 976 (offset 18 lines).
>>>>>>>> 1 out of 4 hunks FAILED -- saving rejects to file src/watchdog/wd_packet.c.rej
>>>>>>>> 
>>>>>>>> 
>>>>>>>> src/watchdog/wd_packet.c.rej:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --- src/watchdog/wd_packet.c
>>>>>>>> +++ src/watchdog/wd_packet.c
>>>>>>>> @@ -932,22 +933,31 @@
>>>>>>>> 	/* execute escalation command */
>>>>>>>> 	if (strlen(pool_config->wd_escalation_command))
>>>>>>>> 	{
>>>>>>>> -		int r;
>>>>>>>> 		r = system(pool_config->wd_escalation_command);
>>>>>>>> 		if (WIFEXITED(r))
>>>>>>>> 		{
>>>>>>>> 			if (WEXITSTATUS(r) == EXIT_SUCCESS)
>>>>>>>> 				pool_log("wd_escalation: escalation command succeeded");
>>>>>>>> 			else
>>>>>>>> +			{
>>>>>>>> 				pool_error("wd_escalation: escalation command failed. exit status: %d", WEXITSTATUS(r));
>>>>>>>> +				has_error = true;
>>>>>>>> +			}
>>>>>>>> 		}
>>>>>>>> 		else
>>>>>>>> +		{
>>>>>>>> 			pool_error("wd_escalation: escalation command exit abnormally");
>>>>>>>> +			has_error = true;
>>>>>>>> +		}
>>>>>>>> 	}
>>>>>>>> 
>>>>>>>> 	/* interface up as delegate IP */
>>>>>>>> 	if (strlen(pool_config->delegate_IP) != 0)
>>>>>>>> -		wd_IP_up();
>>>>>>>> +	{
>>>>>>>> +		r = wd_IP_up();
>>>>>>>> +		if (r == WD_NG)
>>>>>>>> +			has_error = true;
>>>>>>>> +	}
>>>>>>>> 
>>>>>>>> 	/* set master status to the wd list */
>>>>>>>> 	wd_set_wd_list(pool_config->wd_hostname, pool_config->port,
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In addition, I think pgpool should go into down status when if_up_cmd fails,
>>>>>>>>> since it is then worthless as a member of the watchdog cluster. I'll make this
>>>>>>>>> fix for either 3.3.3 or 3.4.0.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Sounds reasonable. 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Yugo Nagata <nagata at sraoss.co.jp>
>>>>>>> <escalation_error_all.patch>
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Yugo Nagata <nagata at sraoss.co.jp>
>>>> 
>>> 
>>> 
>>> -- 
>>> Yugo Nagata <nagata at sraoss.co.jp>
>>> <escalation_error_all_for_analysis.patch>
>> 
> 
> 
> -- 
> Yugo Nagata <nagata at sraoss.co.jp>
> <escalation_error_all_for_analysis2.patch>


