[pgpool-general: 1950] Re: pgpool 3.2.5 watchdog ifconfig down always hangs

Jeff Frost jeff at pgexperts.com
Sat Jul 27 04:50:04 JST 2013


A quick rebuild and here we go:

jeff@squeeze:/usr/local/pgpool2$ ps -ef|grep pgpool
postgres 13073     1  0 12:35 pts/1    00:00:00 logger -t pgpool -p local0.info
postgres 13098     1  0 12:35 pts/1    00:00:00 pgpool: watchdog               
postgres 13099     1  0 12:35 pts/1    00:00:00 pgpool: lifecheck              

sudo gdb -p 13098

(gdb) bt
#0  0x00007f492036b3e3 in select () from /lib/libc.so.6
#1  0x0000000000478d98 in wd_accept (sock=<value optimized out>) at wd_packet.c:302
#2  0x0000000000477137 in wd_child (fork_wait_time=1) at wd_child.c:91
#3  0x0000000000476db5 in wd_main (fork_wait_time=1) at watchdog.c:127
#4  0x00000000004086ec in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:632

sudo gdb -p 13098

(gdb) bt
#0  0x00007f4920ef9b33 in wait () from /lib/libpthread.so.0
#1  0x00000000004776bc in exec_ifconfig (path=0x7fffde882fe0 "/usr/bin/sudo", command=<value optimized out>) at wd_if.c:191
#2  0x0000000000477843 in wd_IP_down () at wd_if.c:79
#3  0x0000000000479699 in wd_notice_server_down () at wd_packet.c:119
#4  0x0000000000476f30 in wd_exit (exit_signo=2) at watchdog.c:75
#5  <signal handler called>
#6  0x00007f4920342c5d in nanosleep () from /lib/libc.so.6
#7  0x00007f4920342ad0 in sleep () from /lib/libc.so.6
#8  0x0000000000476e87 in wd_main (fork_wait_time=1) at watchdog.c:160
#9  0x00000000004086ec in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:632

jeff@squeeze:/usr/local/pgpool2$ ps -ef|grep ifconfig
root      2220  2121  0 12:45 pts/1    00:00:00 sudo ifconfig eth0:1 10.10.10.28 netmask 255.255.255.0 down
root      2221  2220  0 12:45 pts/1    00:00:00 [ifconfig] <defunct>

jeff@squeeze:/usr/local/pgpool2$ sudo gdb -p 2220

(gdb) bt
#0  0x00007f1b412563c3 in select () from /lib/libc.so.6
#1  0x0000000000409a23 in sudo_execve ()
#2  0x000000000040e463 in run_command ()
#3  0x000000000040fce0 in main ()

jeff@squeeze:/usr/local/pgpool2$ sudo gdb -p 2221

Attaching to process 2221
ptrace: Operation not permitted.
(gdb) bt
No stack.

One important note: the ifconfig down actually succeeds even though it never
returns - that is, the eth0:1 interface goes away.
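
The "ptrace: Operation not permitted" on pid 2221 is consistent with a defunct
(zombie) process: it has already exited and is only waiting for its parent, the
sudo at pid 2220, to reap it. As a rough sketch of how that state could be
confirmed without gdb, the Python below parses one line in the
/proc/<pid>/stat format. The function names and the sample line are
hypothetical, made up for illustration and not captured from this machine.

```python
def parse_proc_stat(line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    fields = line[rparen + 1:].split()
    state, ppid = fields[0], int(fields[1])
    return pid, comm, state, ppid


def is_zombie(line):
    """True if the stat line describes a zombie (state 'Z')."""
    return parse_proc_stat(line)[2] == "Z"


# Hypothetical stat line modelled on the defunct ifconfig above.
sample = "2221 (ifconfig) Z 2220 2220 2121 34817 2220 4227084 0 0 0 0 0 0 0 0 20 0 1 0"
print(parse_proc_stat(sample))
print(is_zombie(sample))
```

On a live system the line would come from something like
open("/proc/2221/stat").read(). A state of Z with ppid 2220 would match the
picture above: ifconfig finished its work, and sudo never collected it.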

So the log looks like this:

Jul 26 12:45:05 squeeze pgpool: 2013-07-26 12:45:05 ERROR: pid 2094: find_primary_node: make_persistent_connection failed
Jul 26 12:45:05 squeeze pgpool: 2013-07-26 12:45:05 LOG:   pid 2094: received fast shutdown request
Jul 26 12:45:05 squeeze pgpool: 2013-07-26 12:45:05 LOG:   pid 2094: watchdog_pid: 2121
Jul 26 12:45:25 squeeze pgpool: 2013-07-26 12:45:25 ERROR: pid 2094: wait() failed. reason:Interrupted system call

It stays that way until I kill -9 the sudo process:

jeff@squeeze:/usr/local/pgpool2$ sudo kill -9 2220

Then these two log lines are emitted:

Jul 26 12:48:59 squeeze pgpool: 2013-07-26 12:48:59 ERROR: pid 2121: wd_IP_down: ifconfig down failed
Jul 26 12:49:02 squeeze pgpool: 2013-07-26 12:49:02 LOG:   pid 2121: wd_create_send_socket: connect() reports failure (No route to host). You can safely ignore this while starting up.
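
The backtrace above shows the watchdog blocked in wait() inside exec_ifconfig
with no upper bound, which is why it only moves on after the manual kill -9.
One general way to avoid that kind of hang is to bound the wait and kill the
child on timeout. The Python below is only a sketch of that idea, not pgpool's
code and not a proposed patch; run_with_timeout and the stand-in "hung"
command are hypothetical names used for illustration.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv and return (returncode, timed_out).

    If the child has not exited after timeout_s seconds it is killed,
    so the caller never blocks indefinitely.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, the same signal as the manual kill -9
        return proc.wait(), True


# Stand-in for a command that never returns on its own.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, 0.5))
```

One caveat for the setup in this thread: the command runs through sudo, so the
hung process is owned by root. An unprivileged caller may not be allowed to
signal it, in which case the kill itself would need privileges, just as the
manual kill above was run through sudo.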



On 07/26/13 12:30, Jeff Frost wrote:
> More info:
>
> Here is a syslog snippet:
>
> Jul 26 12:06:33 pgpool01 pgpool: 2013-07-26 12:06:33 LOG:   pid 12847: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:06:49 pgpool01 pgpool: 2013-07-26 12:06:49 LOG:   pid 13243: wd_chk_sticy: all commands have sticky bit
> Jul 26 12:06:49 pgpool01 pgpool: 2013-07-26 12:06:49 LOG:   pid 13243: watchdog might call network commands which using sticky bit.
> Jul 26 12:06:49 pgpool01 pgpool: 2013-07-26 12:06:49 LOG:   pid 13243: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:06:52 pgpool01 pgpool: 2013-07-26 12:06:52 LOG:   pid 13243: wd_escalation: escalated to master pgpool
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG:   pid 13243: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG:   pid 13243: wd_escalation:  escaleted to delegate_IP holder
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG:   pid 13243: wd_init: start watchdog
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG:   pid 13243: pgpool-II successfully started. version 3.2.4 (namameboshi)
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG:   pid 13243: find_primary_node: primary node id is 0
> Jul 26 12:10:13 pgpool01 pgpool: 2013-07-26 12:10:13 LOG:   pid 13243: received fast shutdown request
> Jul 26 12:10:13 pgpool01 pgpool: 2013-07-26 12:10:13 LOG:   pid 13243: watchdog_pid: 13257
> *Jul 26 12:10:53 pgpool01 pgpool: 2013-07-26 12:10:53 ERROR: pid 13257: wd_IP_down: ifconfig down failed*
> Jul 26 12:10:53 pgpool01 pgpool: 2013-07-26 12:10:53 LOG:   pid 13257: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:11:19 pgpool01 pgpool: 2013-07-26 12:11:19 LOG:   pid 13613: wd_chk_sticy: all commands have sticky bit
>
> I also attached 2 straces - one was the pgpool process as I issued a stop,
> and the other was the watchdog process.
>
> Unfortunately, it doesn't look like apt.postgresql.org provides dbg files,
> so I'll have to rebuild pgpool2 so I can get the symbols for the backtraces.
>
>
> On 07/26/13 07:48, Jeff Frost wrote:
>> Yes, you can see the pgpool processes stuck in my ps output below.
>>
>> They happily exit once I kill -9 the sudo process.
>>
>> I'll see if I can get some stack traces but if you can't reproduce on Ubuntu or CentOS, I suspect it's something with Debian Squeeze's sudo or ifconfig commands.
>>
>> On Jul 26, 2013, at 3:24 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>
>>> Hi,
>>>
>>> Does pgpool hang as well as ifconfig when it is stopped?
>>> I cannot reproduce this on CentOS or Ubuntu. Both pgpool and
>>> ifconfig stop normally.
>>>
>>> Could you please provide me the stack trace of the hanging pgpool and
>>> the log messages?
>>>
>>>
>>> On Thu, 25 Jul 2013 09:56:36 -0700
>>> Jeff Frost <jeff at pgexperts.com> wrote:
>>>
>>>> This seems to be the same on 3.2.3, 3.2.4 and 3.2.5.
>>>>
>>>> The watchdog section of pgpool.conf looks like this:
>>>>
>>>> use_watchdog = on
>>>> delegate_IP = '10.100.2.72'
>>>> wd_hostname = '10.100.2.70'
>>>> wd_port = 9000
>>>> ifconfig_path = '/usr/bin'
>>>> arping_path = '/usr/bin'
>>>> if_up_cmd = 'sudo ifconfig eth0:1 $_IP_$ netmask 255.255.255.0 up'
>>>> if_down_cmd = 'sudo ifconfig eth0:1 $_IP_$ netmask 255.255.255.0 down'
>>>> arping_cmd = 'sudo arping -U $_IP_$ -w 1'
>>>> wd_interval = 3
>>>> other_pgpool_hostname0 = '10.100.2.71'
>>>> other_pgpool_port0 = 9999
>>>> other_wd_port0 = 9000
>>>>
>>>> The virtual IP starts up fine and properly moves to the secondary pgpool
>>>> server if you stop pgpool.  However, the ifconfig becomes defunct and never
>>>> exits, requiring a kill -9:
>>>>
>>>> jeff@pgpool01:/tmp/pgpool$ ps -ef|grep pgpool
>>>> postgres 19974     1  0 09:51 pts/0    00:00:00 /tmp/pgpool/bin/pgpool -n
>>>> postgres 19975     1  0 09:51 pts/0    00:00:00 logger -t pgpool -p local0.info
>>>> postgres 19978 19974  0 09:51 pts/0    00:00:00 pgpool: watchdog        
>>>> postgres 19979 19974  0 09:51 pts/0    00:00:00 pgpool: lifecheck       
>>>> jeff     20735  1615  0 09:54 pts/0    00:00:00 grep pgpool
>>>>
>>>> jeff@pgpool01:/tmp/pgpool$ ps -ef|grep ifconfig
>>>> root     20439 19979  0 09:52 pts/0    00:00:00 sudo ifconfig eth0:1 10.100.2.72 netmask 255.255.255.0 down
>>>> root     20440 20439  0 09:52 pts/0    00:00:00 [ifconfig] <defunct>
>>>> jeff     20737  1615  0 09:54 pts/0    00:00:00 grep ifconfig
>>>>
>>>> System is Debian Squeeze.  Any idea how to fix this?  kill -9 of the sudo
>>>> allows pgpool to exit.
>>>>
>>>> -- 
>>>> Jeff Frost <jeff at pgexperts.com>
>>>> CTO, PostgreSQL Experts, Inc.
>>>> Phone: 1-888-PG-EXPRT x506
>>>> FAX: 415-762-5122
>>>> http://www.pgexperts.com/ 
>>>>
>>>> _______________________________________________
>>>> pgpool-general mailing list
>>>> pgpool-general at pgpool.net
>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> -- 
>>> Yugo Nagata <nagata at sraoss.co.jp>


-- 
Jeff Frost <jeff at pgexperts.com>
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 
