[pgpool-general: 2535] Re: wd_escalation_command exit code
Yugo Nagata
nagata at sraoss.co.jp
Wed Feb 5 12:09:33 JST 2014
Hi,
On Tue, 4 Feb 2014 11:18:16 +0400
Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> Hi!
>
> Finally had a chance to apply the last patch.
Thanks!
>
> This is what I got in the log:
>
> pgpool[12464]: wd_chk_setuid all commands have setuid bit
> pgpool[12464]: watchdog might call network commands which using setuid bit.
> pgpool[12464]: Backend status file /var/log/postgresql/pgpool_status discarded
> pgpool[12464]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> pgpool[12464]: exec_ping: failed to ping 192.168.33.200: exit code 1
> pgpool[12464]: wd_escalation: escalating to master pgpool
> pgpool[12464]: exec_ifconfig: 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0' succeeded
> pgpool[12464]: exec_ifconfig: 'arping -U $_IP_$ -w 1' failed. exit status: 1
The arping's exit status is not zero. It's odd, in my machine (CentOS 5/6 and Ubuntu)
this always returns zero.
Could you please execute arping while pgpool is working and VIP is up, and this exit code?
$ sudo /usr/sbin/arping -U 192.168.33.200 -w 1
$ echo $?
In addtion, please check version of arping.
$ /usr/sbin/arping -h
$ /usr/sbin/arping -V
> pgpool[12464]: wd_IP_up: ifconfig up failed
> pgpool[12464]: wd_declare: send the packet to declare the new master
> pgpool[12464]: wd_escalation: escalated to master pgpool with some errors
> pgpool[12464]: wd_init: start watchdog
>
>
>
>
> On Jan 31, 2014, at 7:16 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>
> > On Fri, 31 Jan 2014 00:50:37 +0400
> > Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >
> >>
> >> On Jan 30, 2014, at 8:40 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>
> >>> Hi,
> >>>
> >>> On Wed, 29 Jan 2014 10:26:00 +0400
> >>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>
> >>>> Hi!
> >>>>
> >>>> I'm testing this patch on a vagrant/virtualbox based VM.
> >>>>
> >>>> # uname -a
> >>>> Linux lb-node1 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> >>>>
> >>>> # cat /etc/issue
> >>>> Ubuntu 12.04.3 LTS \n \l
> >>>>
> >>>> This is the output of ifconfig before starting pgpool:
> >>>>
> >>>> eth0 Link encap:Ethernet HWaddr 08:00:27:03:2b:89
> >>>> inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
> >>>> inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
> >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> >>>> RX packets:7198 errors:0 dropped:0 overruns:0 frame:0
> >>>> TX packets:4853 errors:0 dropped:0 overruns:0 carrier:0
> >>>> collisions:0 txqueuelen:1000
> >>>> RX bytes:553607 (553.6 KB) TX bytes:722721 (722.7 KB)
> >>>>
> >>>> eth1 Link encap:Ethernet HWaddr 08:00:27:70:46:a0
> >>>> inet addr:192.168.33.11 Bcast:192.168.33.255 Mask:255.255.255.0
> >>>> inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
> >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> >>>> RX packets:23682 errors:0 dropped:0 overruns:0 frame:0
> >>>> TX packets:4876 errors:0 dropped:0 overruns:0 carrier:0
> >>>> collisions:0 txqueuelen:1000
> >>>> RX bytes:2551344 (2.5 MB) TX bytes:646217 (646.2 KB)
> >>>>
> >>>> lo Link encap:Local Loopback
> >>>> inet addr:127.0.0.1 Mask:255.0.0.0
> >>>> inet6 addr: ::1/128 Scope:Host
> >>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
> >>>> RX packets:1636 errors:0 dropped:0 overruns:0 frame:0
> >>>> TX packets:1636 errors:0 dropped:0 overruns:0 carrier:0
> >>>> collisions:0 txqueuelen:0
> >>>> RX bytes:109868 (109.8 KB) TX bytes:109868 (109.8 KB)
> >>>>
> >>>>
> >>>> /etc/pgpool2/pgpool.conf:
> >>>> ...
> >>>> debug_level = 9
> >>>> …
> >>>> delegate_IP = '192.168.33.200'
> >>>> ...
> >>>> ifconfig_path = '/sbin'
> >>>> if_up_cmd = 'ifconfig eth1:0 $_IP_$ netmask 255.255.255.0'
> >>>> if_down_cmd = 'ifconfig eth1:0 down'
> >>>> ...
> >>>>
> >>>>
> >>>> Once I start pgpool I get the following ifconfig output
> >>>>
> >>>>
> >>>> eth0 Link encap:Ethernet HWaddr 08:00:27:03:2b:89
> >>>> inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
> >>>> inet6 addr: fe80::a00:27ff:fe03:2b89/64 Scope:Link
> >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> >>>> RX packets:7939 errors:0 dropped:0 overruns:0 frame:0
> >>>> TX packets:5404 errors:0 dropped:0 overruns:0 carrier:0
> >>>> collisions:0 txqueuelen:1000
> >>>> RX bytes:606232 (606.2 KB) TX bytes:816924 (816.9 KB)
> >>>>
> >>>> eth1 Link encap:Ethernet HWaddr 08:00:27:70:46:a0
> >>>> inet addr:192.168.33.11 Bcast:192.168.33.255 Mask:255.255.255.0
> >>>> inet6 addr: fe80::a00:27ff:fe70:46a0/64 Scope:Link
> >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> >>>> RX packets:25179 errors:0 dropped:0 overruns:0 frame:0
> >>>> TX packets:5204 errors:0 dropped:0 overruns:0 carrier:0
> >>>> collisions:0 txqueuelen:1000
> >>>> RX bytes:2704567 (2.7 MB) TX bytes:690834 (690.8 KB)
> >>>>
> >>>> eth1:0 Link encap:Ethernet HWaddr 08:00:27:70:46:a0
> >>>> inet addr:192.168.33.200 Bcast:192.168.33.255 Mask:255.255.255.0
> >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> >>>>
> >>>> lo Link encap:Local Loopback
> >>>> inet addr:127.0.0.1 Mask:255.0.0.0
> >>>> inet6 addr: ::1/128 Scope:Host
> >>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
> >>>> RX packets:1745 errors:0 dropped:0 overruns:0 frame:0
> >>>> TX packets:1745 errors:0 dropped:0 overruns:0 carrier:0
> >>>> collisions:0 txqueuelen:0
> >>>> RX bytes:117264 (117.2 KB) TX bytes:117264 (117.2 KB)
> >>>>
> >>>>
> >>>>
> >>>> # ping 192.168.33.200
> >>>> PING 192.168.33.200 (192.168.33.200) 56(84) bytes of data.
> >>>> 64 bytes from 192.168.33.200: icmp_req=1 ttl=64 time=0.060 ms
> >>>> ^C
> >>>> --- 192.168.33.200 ping statistics ---
> >>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>>> rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
> >>>>
> >>>>
> >>>> And these are some messages from pgpool.log:
> >>>>
> >>>> pgpool[4152]: wd_chk_setuid all commands have setuid bit
> >>>> pgpool[4152]: watchdog might call network commands which using setuid bit.
> >>>> pgpool[4152]: exec_ping: failed to ping 192.168.33.200
> >>>> pgpool[4152]: wd_escalation: escalating to master pgpool
> >>>> pgpool[4152]: wd_IP_up: ifconfig up failed
> >>>> pgpool[4152]: wd_declare: send the packet to declare the new master
> >>>> pgpool[4152]: wd_escalation: escalated to master pgpool with some errors
> >>>
> >>> That's funny. This says that "failed to ping" but VIP is brought up in fact.
> >>> It may take times between ifconfig and ping. However, pgpool should try to
> >>> ping up to three times before this succeeds, but this is tried only one time
> >>> in the case.
> >>>
> >>> For analysis, I would appreciate it if you would apply the attached patch and
> >>> send the log output messages.
> >>
> >> start:
> >>
> >> pgpool[2493]: num_backends: 2 total_weight: 2.000000
> >> pgpool[2493]: backend 0 weight: 1073741823.500000
> >> pgpool[2493]: backend 0 flag: 0000
> >> pgpool[2493]: backend 1 weight: 1073741823.500000
> >> pgpool[2493]: backend 1 flag: 0000
> >> pgpool[2493]: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
> >> pgpool[2493]: wd_chk_setuid all commands have setuid bit
> >> pgpool[2493]: watchdog might call network commands which using setuid bit.
> >> pgpool[2493]: Backend status file /var/log/postgresql/pgpool_status discarded
> >> pgpool[2493]: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> >> pgpool[2493]: send_packet_4_nodes: packet for lb-node2.site:9000 is canceled
> >> pgpool[2493]: exec_ping: failed to ping 192.168.33.200: exit code 1
> >> pgpool[2493]: wd_escalation: escalating to master pgpool
> >> pgpool[2493]: wd_IP_up: ifconfig up failed
> >> pgpool[2493]: wd_declare: send the packet to declare the new master
> >> pgpool[2493]: wd_escalation: escalated to master pgpool with some errors
> >> pgpool[2493]: wd_init: start watchdog
> >>
> >>
> >>
> >>
> >> stop:
> >>
> >> pgpool[2504]: wd_IP_down: not delegate IP holder
> >> pgpool[2502]: hb_receiver child receives shutdown request signal 2
> >> pgpool[2503]: hb_sender child receives shutdown request signal 2
> >> pgpool[2589]: child received shutdown request signal 2
> >> pgpool[2493]: shmem_exit(0)
> >>
> >>
> >>
> >> BTW, when I start/stop unpatched 3.3.2 version I see the same messages about ping failure. But everything works well in this case.
> >
> > I misunderstood that the ping failure occurs at bringing up VIP, but this is for
> > checking whether the VIP is already used by other host. So, this failure message
> > is no problem.
> >
> > The problem is, there is no ping message after "escalating to master pgpool".
> > "ifconfig up failed" may be caused by arping command's failure, since ping
> > should be executed after arping succeeded.
> >
> > Could you please try the next patch for analysis? This would oupute log
> > message when arping command fails.
> >
> >>
> >> unpatched start:
> >>
> >> pgpool[7189]: exec_ping: failed to ping 192.168.33.200
> >> pgpool[7189]: wd_escalation: escalating to master pgpool
> >> pgpool[7189]: wd_declare: send the packet to declare the new master
> >> pgpool[7189]: wd_escalation: escalated to master pgpool successfully
> >>
> >> unpatched stop:
> >>
> >> pgpool[7198]: hb_receiver child receives shutdown request signal 2
> >> pgpool[7199]: hb_sender child receives shutdown request signal 2
> >> pgpool[7200]: exec_ping: failed to ping 192.168.33.200
> >> pgpool[7200]: wd_IP_down: ifconfig down succeeded
> >> pgpool[7189]: shmem_exit(0)
> >>
> >>
> >>
> >>>
> >>>>
> >>>>
> >>>> When I stop pgpool I get the following messages in pgpool.log:
> >>>>
> >>>> pgpool[4163]: wd_IP_down: not delegate IP holder
> >>>> pgpool[4161]: hb_receiver child receives shutdown request signal 2
> >>>> pgpool[4162]: hb_sender child receives shutdown request signal 2
> >>>> pgpool[4152]: shmem_exit(0)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Jan 29, 2014, at 6:42 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>>
> >>>>> On Tue, 28 Jan 2014 23:03:20 +0400
> >>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>>
> >>>>>> Hi!
> >>>>>> This patch applied successfully. But now a new problem. When I start pgpool service I get a new interface eth0:0 with failover IP address assigned as expected. But when I stop pgpool service eth0:0 won't go down. It remains even after complete shutdown of pgpool.
> >>>>>
> >>>>> Odd, I can't reproduce this. Are there any error message?
> >>>>> What ifconfig command do you use?
> >>>>>
> >>>>>>
> >>>>>> I tried 3.3.2 without this patch and everything worked well.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Jan 27, 2014, at 5:18 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>>>>
> >>>>>>> On Sat, 25 Jan 2014 15:31:44 +0400
> >>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> On Jan 24, 2014, at 1:25 PM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> >>>>>>>>
> >>>>>>>>> On Tue, 21 Jan 2014 15:24:02 +0400
> >>>>>>>>> Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Great! Now it is working!
> >>>>>>>>>>
> >>>>>>>>>> pgpool[31903]: wd_escalation: escalation command failed. exit status: 1
> >>>>>>>>>>
> >>>>>>>>>> Thank you!
> >>>>>>>>>>
> >>>>>>>>>> Will this patch be included in 3.3.3 ?
> >>>>>>>>>>
> >>>>>>>>>> Also, what about failed if_up_cmd and further pgpool behaviour (my second message in the thread.) ?
> >>>>>>>>>
> >>>>>>>>> I attached the patch. Could you try this? In this fix, pgpool outputs a error
> >>>>>>>>> message for if_up_cmd failure. This patch should be applied after the previous
> >>>>>>>>> patch. This fix will be included in 3.3.3.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi!
> >>>>>>>>
> >>>>>>>> I tried to apply the patch against both 3.3.1 and 3.3.2
> >>>>>>>>
> >>>>>>>> this is what I got:
> >>>>>>>
> >>>>>>> Hmm.. Could you try the attached patch to 3.3.2? This includes allthe fix
> >>>>>>> for escalation command and ifconfig errors.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/esc.patch
> >>>>>>>>
> >>>>>>>> patching file src/watchdog/wd_packet.c
> >>>>>>>> Hunk #1 succeeded at 954 (offset 23 lines).
> >>>>>>>>
> >>>>>>>> node1:~/pgpool-orig# patch -p1 < /root/op/ifup.patch
> >>>>>>>>
> >>>>>>>> patching file src/watchdog/wd_if.c
> >>>>>>>> Hunk #1 succeeded at 42 with fuzz 1 (offset 3 lines).
> >>>>>>>> Hunk #2 succeeded at 62 (offset 3 lines).
> >>>>>>>> Hunk #3 succeeded at 117 (offset 3 lines).
> >>>>>>>> patching file src/watchdog/wd_packet.c
> >>>>>>>> Hunk #1 succeeded at 654 (offset 23 lines).
> >>>>>>>> Hunk #2 succeeded at 939 (offset 23 lines).
> >>>>>>>> Hunk #3 FAILED at 932.
> >>>>>>>> Hunk #4 succeeded at 976 (offset 18 lines).
> >>>>>>>> 1 out of 4 hunks FAILED -- saving rejects to file src/watchdog/wd_packet.c.rej
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> src/watchdog/wd_packet.c.rej:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --- src/watchdog/wd_packet.c
> >>>>>>>> +++ src/watchdog/wd_packet.c
> >>>>>>>> @@ -932,22 +933,31 @@
> >>>>>>>> /* execute escalation command */
> >>>>>>>> if (strlen(pool_config->wd_escalation_command))
> >>>>>>>> {
> >>>>>>>> - int r;
> >>>>>>>> r = system(pool_config->wd_escalation_command);
> >>>>>>>> if (WIFEXITED(r))
> >>>>>>>> {
> >>>>>>>> if (WEXITSTATUS(r) == EXIT_SUCCESS)
> >>>>>>>> pool_log("wd_escalation: escalation command succeeded");
> >>>>>>>> else
> >>>>>>>> + {
> >>>>>>>> pool_error("wd_escalation: escalation command failed. exit status: %d", WEXITSTATUS(r));
> >>>>>>>> + has_error = true;
> >>>>>>>> + }
> >>>>>>>> }
> >>>>>>>> else
> >>>>>>>> + {
> >>>>>>>> pool_error("wd_escalation: escalation command exit abnormally");
> >>>>>>>> + has_error = true;
> >>>>>>>> + }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> /* interface up as delegate IP */
> >>>>>>>> if (strlen(pool_config->delegate_IP) != 0)
> >>>>>>>> - wd_IP_up();
> >>>>>>>> + {
> >>>>>>>> + r = wd_IP_up();
> >>>>>>>> + if (r == WD_NG)
> >>>>>>>> + has_error = true;
> >>>>>>>> + }
> >>>>>>>>
> >>>>>>>> /* set master status to the wd list */
> >>>>>>>> wd_set_wd_list(pool_config->wd_hostname, pool_config->port,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In addition, I consider that pgpool shoud go to down status when if_up_cmd fails,
> >>>>>>>>> since this is worthless as a member of watchdog cluster. I'll make this fix for
> >>>>>>>>> either 3.3.3 or 3.4.0.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Sounds reasonable.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Yugo Nagata <nagata at sraoss.co.jp>
> >>>>>>> <escalation_error_all.patch>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Yugo Nagata <nagata at sraoss.co.jp>
> >>>>
> >>>
> >>>
> >>> --
> >>> Yugo Nagata <nagata at sraoss.co.jp>
> >>> <escalation_error_all_for_analysis.patch>
> >>
> >
> >
> > --
> > Yugo Nagata <nagata at sraoss.co.jp>
> > <escalation_error_all_for_analysis2.patch>
>
--
Yugo Nagata <nagata at sraoss.co.jp>
More information about the pgpool-general
mailing list