[Pgpool-general] seemingly hung pgpool process consuming 100% CPU

Tatsuo Ishii ishii at sraoss.co.jp
Thu Nov 24 06:42:44 UTC 2011


> Not wanting to be impatient, but I'm very concerned about this
> problem, since its impossible to predict when it will occur.  Is there
> additional information that I can provide to investigate this further?

I really need to know where pgpool is looping.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> thanks
> 
> On Tue, Nov 22, 2011 at 10:11 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>> This hadn't reproduced in a long time, but we ugpraded to pgpool-3.1 a
>> week ago, and this morning I found a pgpool process that was consuming
>> 100% CPU, and had been running for a week (although wasn't consuming
>> 100% CPU the entire time).  Something else weird is that it showed an
>> active, idle connection from a client system which had only been up
>> for the past 21 hours.  Anyway, here's the backtrace from the process
>> (gdb hung at the very bottom):
>>
>> [root ~]# gdb pgpool 31293
>> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
>> Copyright (C) 2010 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/sbin/pgpool...Reading symbols from
>> /usr/lib/debug/usr/sbin/pgpool.debug...done.
>> done.
>> Attaching to program: /usr/sbin/pgpool, process 31293
>> Reading symbols from /usr/lib64/libpq.so.5...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libpq.so.5
>> Reading symbols from /usr/lib64/libpcp.so.0...Reading symbols from
>> /usr/lib/debug/usr/lib64/libpcp.so.0.0.0.debug...done.
>> done.
>> Loaded symbols for /usr/lib64/libpcp.so.0
>> Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libpam.so.0
>> Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libssl.so.10
>> Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging
>> symbols found)...done.
>> Loaded symbols for /usr/lib64/libcrypto.so.10
>> Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libcrypt.so.1
>> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libresolv.so.2
>> Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libnsl.so.1
>> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libm.so.6
>> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libc.so.6
>> Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /lib64/libgssapi_krb5.so.2
>> Reading symbols from /usr/lib64/libldap_r-2.4.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /usr/lib64/libldap_r-2.4.so.2
>> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
>> found)...done.
>> [Thread debugging using libthread_db enabled]
>> Loaded symbols for /lib64/libpthread.so.0
>> Reading symbols from /lib64/libaudit.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libaudit.so.1
>> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libdl.so.2
>> Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libkrb5.so.3
>> Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libcom_err.so.2
>> Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libk5crypto.so.3
>> Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libz.so.1
>> Reading symbols from /usr/lib64/libfreebl3.so...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libfreebl3.so
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Reading symbols from /lib64/libkrb5support.so.0...(no debugging
>> symbols found)...done.
>> Loaded symbols for /lib64/libkrb5support.so.0
>> Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libkeyutils.so.1
>> Reading symbols from /usr/lib64/liblber-2.4.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /usr/lib64/liblber-2.4.so.2
>> Reading symbols from /usr/lib64/libsasl2.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libsasl2.so.2
>> Reading symbols from /lib64/libselinux.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libselinux.so.1
>> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libnss_files.so.2
>> 0x00000000004192c0 in pool_process_query (frontend=0x2dd8fd0,
>> backend=0x187d540,
>>    reset_request=<value optimized out>) at pool_process_query.c:379
>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) ||
>> pool_is_query_in_progress())
>> Missing separate debuginfos, use: debuginfo-install
>> audit-libs-2.0.4-1.el6.x86_64 cyrus-sasl-lib-2.1.23-8.el6.x86_64
>> glibc-2.12-1.7.el6.x86_64 keyutils-libs-1.4-1.el6.x86_64
>> krb5-libs-1.8.2-3.el6.x86_64 libcom_err-1.41.12-3.el6.x86_64
>> libselinux-2.0.94-2.el6.x86_64
>> nss-softokn-freebl-3.12.7-1.1.el6.x86_64 openldap-2.4.19-15.el6.x86_64
>> openssl-1.0.0-4.el6.x86_64 pam-1.1.1-4.el6.x86_64
>> postgresql-libs-8.4.4-2.el6.x86_64 zlib-1.2.3-25.el6.x86_64
>> (gdb) bt
>> #0  0x00000000004192c0 in pool_process_query (frontend=0x2dd8fd0,
>> backend=0x187d540,
>>    reset_request=<value optimized out>) at pool_process_query.c:379
>> #1  0x000000000040ae42 in do_child (unix_fd=3, inet_fd=<value
>> optimized out>) at child.c:354
>> #2  0x00000000004054c5 in fork_a_child (unix_fd=3, inet_fd=4, id=152)
>> at main.c:1072
>> #3  0x0000000000407b1c in main (argc=<value optimized out>,
>> argv=<value optimized out>)
>>    at main.c:549
>> (gdb) cont
>> Continuing.
>>
>> On Tue, Sep 20, 2011 at 9:25 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>>> Nevermind, I figured out what I was doing wrong.  Now I just need for
>>> this to hang again.
>>>
>>> On Tue, Sep 20, 2011 at 9:08 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>>>> I tried to do that, but pgpool refuses to start reporting:
>>>> -bash: /usr/lib/debug/usr/sbin/pgpool.debug: bad ELF interpreter: No
>>>> such file or directory
>>>>
>>>> I'm puzzled why it fails, as it was built on the same server where I
>>>> built the (working) release build of pgpool.
>>>>
>>>> $ file /usr/lib/debug/usr/sbin/pgpool.debug
>>>> /usr/lib/debug/usr/sbin/pgpool.debug: ELF 64-bit LSB executable,
>>>> x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs),
>>>> for GNU/Linux 2.6.18, not stripped
>>>> $ file /usr/sbin/pgpool
>>>> /usr/sbin/pgpool: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
>>>> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Sep 19, 2011 at 7:10 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>> It's really hard to find the cause of the problem from a stack trace
>>>>> without symbol tables... Is it possible to reinstalll pgpool binary
>>>>> with debug symbols?
>>>>> --
>>>>> Tatsuo Ishii
>>>>> SRA OSS, Inc. Japan
>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>> Japanese: http://www.sraoss.co.jp
>>>>>
>>>>>> This happened again.  I ran the gdb command that you requested,
>>>>>> however it occurred to me that the output may not be all that useful
>>>>>> since I'm not running a debug build of pgpool:
>>>>>> ###############
>>>>>> # gdb pgpool 2343
>>>>>> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
>>>>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>>>> This is free software: you are free to change and redistribute it.
>>>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>>>>> and "show warranty" for details.
>>>>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>>>>> For bug reporting instructions, please see:
>>>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>>>> Reading symbols from /usr/sbin/pgpool...(no debugging symbols found)...done.
>>>>>> Attaching to program: /usr/sbin/pgpool, process 2343
>>>>>> Reading symbols from /usr/lib64/libpq.so.5...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /usr/lib64/libpq.so.5
>>>>>> Reading symbols from /usr/lib64/libpcp.so.0...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /usr/lib64/libpcp.so.0
>>>>>> Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libpam.so.0
>>>>>> Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /usr/lib64/libssl.so.10
>>>>>> Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging
>>>>>> symbols found)...done.
>>>>>> Loaded symbols for /usr/lib64/libcrypto.so.10
>>>>>> Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libcrypt.so.1
>>>>>> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libresolv.so.2
>>>>>> Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libnsl.so.1
>>>>>> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libm.so.6
>>>>>> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libc.so.6
>>>>>> Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging
>>>>>> symbols found)...done.
>>>>>> Loaded symbols for /lib64/libgssapi_krb5.so.2
>>>>>> Reading symbols from /usr/lib64/libldap_r-2.4.so.2...(no debugging
>>>>>> symbols found)...done.
>>>>>> Loaded symbols for /usr/lib64/libldap_r-2.4.so.2
>>>>>> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
>>>>>> found)...done.
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> Loaded symbols for /lib64/libpthread.so.0
>>>>>> Reading symbols from /lib64/libaudit.so.1...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libaudit.so.1
>>>>>> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libdl.so.2
>>>>>> Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libkrb5.so.3
>>>>>> Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libcom_err.so.2
>>>>>> Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libk5crypto.so.3
>>>>>> Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
>>>>>> Loaded symbols for /lib64/libz.so.1
>>>>>> Reading symbols from /usr/lib64/libfreebl3.so...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /usr/lib64/libfreebl3.so
>>>>>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
>>>>>> symbols found)...done.
>>>>>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>>>>>> Reading symbols from /lib64/libkrb5support.so.0...(no debugging
>>>>>> symbols found)...done.
>>>>>> Loaded symbols for /lib64/libkrb5support.so.0
>>>>>> Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libkeyutils.so.1
>>>>>> Reading symbols from /usr/lib64/liblber-2.4.so.2...(no debugging
>>>>>> symbols found)...done.
>>>>>> Loaded symbols for /usr/lib64/liblber-2.4.so.2
>>>>>> Reading symbols from /usr/lib64/libsasl2.so.2...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /usr/lib64/libsasl2.so.2
>>>>>> Reading symbols from /lib64/libselinux.so.1...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libselinux.so.1
>>>>>> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libnss_files.so.2
>>>>>> Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols
>>>>>> found)...done.
>>>>>> Loaded symbols for /lib64/libnss_dns.so.2
>>>>>> 0x000000000044790a in ?? ()
>>>>>> Missing separate debuginfos, use: debuginfo-install pgpool-II-3.0.4-1.el6.x86_64
>>>>>> (gdb) bt
>>>>>> #0  0x000000000044790a in ?? ()
>>>>>> #1  0x0000000000414547 in ?? ()
>>>>>> #2  0x000000000041762e in ?? ()
>>>>>> #3  0x000000000040a4cd in ?? ()
>>>>>> #4  0x0000000000405345 in ?? ()
>>>>>> #5  0x00000000004068dc in ?? ()
>>>>>> #6  0x00000000004076dc in ?? ()
>>>>>> #7  0x00000031ae41ec5d in __libc_start_main () from /lib64/libc.so.6
>>>>>> #8  0x0000000000403bf9 in ?? ()
>>>>>> #9  0x00007fff0663cfc8 in ?? ()
>>>>>> #10 0x000000000000001c in ?? ()
>>>>>> #11 0x0000000000000004 in ?? ()
>>>>>> #12 0x00007fff0663d90b in ?? ()
>>>>>> #13 0x00007fff0663dfe6 in ?? ()
>>>>>> #14 0x00007fff0663dfe6 in ?? ()
>>>>>> #15 0x00007fff0663dfe6 in ?? ()
>>>>>> #16 0x0000000000000000 in ?? ()
>>>>>> (gdb) cont
>>>>>> Continuing.
>>>>>>
>>>>>> ###############
>>>>>>
>>>>>> The entire session completely hung at the end there.
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 14, 2011 at 3:57 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>> Please use gdb. For example,
>>>>>>>
>>>>>>> become postgres user (or root user)
>>>>>>> gdb pgpool 29191
>>>>>>> bt
>>>>>>> cont
>>>>>>> bt
>>>>>>> cont
>>>>>>> :
>>>>>>> :
>>>>>>> :
>>>>>>>
>>>>>>> This will give us an idea where it's looping.
>>>>>>> --
>>>>>>> Tatsuo Ishii
>>>>>>> SRA OSS, Inc. Japan
>>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>>
>>>>>>>> This problem has returned yet again:
>>>>>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>>>> 29191 postgres  20   0 80192  14m 1544 R 89.8  0.2  51:15.91 pgpool
>>>>>>>>
>>>>>>>> postgres 29191  3.4  0.1  80192 14728 ?        R    Sep13  51:40
>>>>>>>> pgpool: lfriedman nightly 10.31.96.84(61698) idle
>>>>>>>>
>>>>>>>>
>>>>>>>> I'd really appreciate some input on how to debug this.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>>>>>>>>> No one else has experienced this or has suggestions how to debug it?
>>>>>>>>>
>>>>>>>>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman <netllama at gmail.com> wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>>>>>>>>>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>>>>>>>>>> standby).  I'm seeing strange behavior where a single pgpool process
>>>>>>>>>> seems to hang after some period of time, and then consume 100% of the
>>>>>>>>>> CPU.  I've seen this behavior happen twice since last Friday (when
>>>>>>>>>> pgpool was brought online in my production environment).  At the
>>>>>>>>>> moment the current hung process looks like this in 'ps auxww' output:
>>>>>>>>>>
>>>>>>>>>> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
>>>>>>>>>> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In top, I see:
>>>>>>>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>>>>>> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When to connect to the process with strace, there is no output, so I'm
>>>>>>>>>> guessing the process is stuck spinning somewhere:
>>>>>>>>>> # strace -p 19838
>>>>>>>>>> Process 19838 attached - interrupt to quit
>>>>>>>>>> ...
>>>>>>>>>> ^CProcess 19838 detached
>>>>>>>>>>
>>>>>>>>>> One thing that i'm certain of is that the client IP (10.31.45.20)
>>>>>>>>>> associated with the hung process has rebooted at least once since that
>>>>>>>>>> process was spawned.  So pgpool seems to be in some confused state, as
>>>>>>>>>> the client definitely severed the connection already.  I checked the
>>>>>>>>>> pgpool log and there are no explicit references to PID 19838.  I'm at
>>>>>>>>>> a loss how to debug this further, but clearly something is wrong
>>>>>>>>>> somewhere, and this isn't normal/expected behavior.


More information about the Pgpool-general mailing list