[Pgpool-general] seemingly hung pgpool process consuming 100% CPU
Tatsuo Ishii
ishii at sraoss.co.jp
Wed Sep 14 22:57:44 UTC 2011
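Please attach gdb to the process and take a few backtraces. For example,
become the postgres user (or root), then run:
gdb pgpool 29191
bt
cont
(press Ctrl-C to interrupt the process and return to the gdb prompt)
bt
cont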
:
:
:
This will give us an idea where it's looping.
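
If typing the commands by hand is inconvenient, the same sampling can be scripted. The following is only a rough sketch, assuming gdb is installed and the script runs as the postgres user or root. The function name sample_backtraces, the GDB variable, and the default count and interval are illustrative choices, not anything provided by pgpool.

```shell
#!/bin/sh
# Sketch: take several backtraces of a running process to see where it loops.
# Assumptions (not from pgpool): gdb is on PATH, and the caller is allowed
# to attach to the target PID (process owner or root).

# Debugger command; overridable so the loop can be dry-run without gdb.
GDB=${GDB:-gdb}

# sample_backtraces PID [COUNT] [INTERVAL_SECONDS]
sample_backtraces() {
    pid=$1
    count=${2:-5}
    interval=${3:-2}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count (pid $pid) ==="
        # -batch makes gdb exit after the -ex commands; on exit it detaches,
        # so the process keeps running between samples.
        "$GDB" -p "$pid" -batch -ex bt 2>&1
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Example (PID taken from the top output quoted below):
#   sample_backtraces 29191 5 2 > pgpool-bt.txt
```

If the same few frames show up in every sample, that is most likely where the loop is.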
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> This problem has returned yet again:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 29191 postgres 20 0 80192 14m 1544 R 89.8 0.2 51:15.91 pgpool
>
> postgres 29191 3.4 0.1 80192 14728 ? R Sep13 51:40
> pgpool: lfriedman nightly 10.31.96.84(61698) idle
>
>
> I'd really appreciate some input on how to debug this.
>
>
> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>> No one else has experienced this or has suggestions how to debug it?
>>
>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman <netllama at gmail.com> wrote:
>>> Greetings,
>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>>> standby). I'm seeing strange behavior where a single pgpool process
>>> seems to hang after some period of time, and then consume 100% of the
>>> CPU. I've seen this behavior happen twice since last Friday (when
>>> pgpool was brought online in my production environment). At the
>>> moment the current hung process looks like this in 'ps auxww' output:
>>>
>>> postgres 19838 98.7 0.0 68856 2904 ? R Sep06 1027:36
>>> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>
>>>
>>> In top, I see:
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>> 19838 postgres 20 0 68856 2904 1072 R 100.0 0.0 1027:29 pgpool
>>>
>>>
>>> When I connect to the process with strace, there is no output, so I'm
>>> guessing the process is stuck spinning somewhere:
>>> # strace -p 19838
>>> Process 19838 attached - interrupt to quit
>>> ...
>>> ^CProcess 19838 detached
>>>
>>> One thing that I'm certain of is that the client IP (10.31.45.20)
>>> associated with the hung process has rebooted at least once since that
>>> process was spawned. So pgpool seems to be in some confused state, as
>>> the client definitely severed the connection already. I checked the
>>> pgpool log and there are no explicit references to PID 19838. I'm at
>>> a loss how to debug this further, but clearly something is wrong
>>> somewhere, and this isn't normal/expected behavior.
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general