[Pgpool-general] seemingly hung pgpool process consuming 100% CPU

Tatsuo Ishii ishii at sraoss.co.jp
Wed Sep 14 22:57:44 UTC 2011


Please use gdb. For example,

become the postgres user (or root)
gdb pgpool 29191
bt
cont
(after a few seconds, press Ctrl-C to interrupt the process again)
bt
cont
:
:
:

This will give us an idea where it's looping.
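The repeated bt/cont cycle is essentially a manual "poor man's profiler": whichever function keeps appearing at the top of the stack is probably where the process loops. The Python sketch below automates it. It assumes that gdb is on the PATH, that it runs as a user allowed to attach to the process (postgres or root), and that frame #0 lines follow gdb's usual "#0  0xADDR in func (...)" or "#0  func (...) at file:line" layout. The helper names (sample_backtrace, sample_many, innermost_frame, hottest_frames) are hypothetical and are not part of pgpool or gdb.

```python
import subprocess
import time
from collections import Counter
from typing import List, Optional


def sample_backtrace(pid: int) -> str:
    """Attach gdb in batch mode, print one backtrace, then exit (detaching).

    Assumes gdb is installed and the caller may ptrace `pid`
    (for example, running as the postgres user or root).
    """
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "bt"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def sample_many(pid: int, samples: int = 10, interval: float = 1.0) -> List[str]:
    """Take `samples` backtraces, `interval` seconds apart (the bt/cont loop)."""
    backtraces = []
    for _ in range(samples):
        backtraces.append(sample_backtrace(pid))
        time.sleep(interval)
    return backtraces


def innermost_frame(backtrace: str) -> Optional[str]:
    """Return the function name in frame #0 of a gdb backtrace, if any."""
    for line in backtrace.splitlines():
        if not line.startswith("#0 "):
            continue
        tokens = line.split()
        # "#0  0x00007f... in func (args) ..." -> func
        if len(tokens) >= 4 and tokens[1].startswith("0x") and tokens[2] == "in":
            return tokens[3]
        # "#0  func (args) at file.c:123" -> func
        if len(tokens) >= 2:
            return tokens[1]
    return None


def hottest_frames(backtraces: List[str]) -> Counter:
    """Count how often each function is the innermost frame across samples."""
    return Counter(f for f in map(innermost_frame, backtraces) if f is not None)
```

For example, hottest_frames(sample_many(29191)).most_common(5) would list the functions most often seen at frame #0 across ten samples. Each sample attaches gdb only long enough to print one backtrace, so the target is paused briefly rather than for the whole session.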
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> This problem has returned yet again:
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 29191 postgres  20   0 80192  14m 1544 R 89.8  0.2  51:15.91 pgpool
> 
> postgres 29191  3.4  0.1  80192 14728 ?        R    Sep13  51:40 pgpool: lfriedman nightly 10.31.96.84(61698) idle
> 
> 
> I'd really appreciate some input on how to debug this.
> 
> 
> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>> No one else has experienced this or has suggestions how to debug it?
>>
>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman <netllama at gmail.com> wrote:
>>> Greetings,
>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>>> standby).  I'm seeing strange behavior where a single pgpool process
>>> seems to hang after some period of time, and then consume 100% of the
>>> CPU.  I've seen this behavior happen twice since last Friday (when
>>> pgpool was brought online in my production environment).  At the
>>> moment the current hung process looks like this in 'ps auxww' output:
>>>
>>> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36 pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>
>>>
>>> In top, I see:
>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>>>
>>>
>>> When I connect to the process with strace, there is no output, so I'm
>>> guessing the process is stuck spinning somewhere:
>>> # strace -p 19838
>>> Process 19838 attached - interrupt to quit
>>> ...
>>> ^CProcess 19838 detached
>>>
>>> One thing that I'm certain of is that the client IP (10.31.45.20)
>>> associated with the hung process has rebooted at least once since that
>>> process was spawned.  So pgpool seems to be in some confused state, as
>>> the client definitely severed the connection already.  I checked the
>>> pgpool log and there are no explicit references to PID 19838.  I'm at
>>> a loss how to debug this further, but clearly something is wrong
>>> somewhere, and this isn't normal/expected behavior.
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general
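In the quoted report, strace printed nothing while the process sat at 100% CPU, meaning it was making no system calls. That pattern suggests a loop in user space. One way to confirm this on Linux is to sample the user and system CPU times from /proc/<pid>/stat (fields 14 and 15, in clock ticks, as documented in proc(5)) a few seconds apart. A process spinning on one core gains roughly `interval` seconds of user time and almost no system time. This is a minimal Python sketch; the names CpuSample, parse_stat, read_sample and cpu_split are hypothetical.

```python
import os
import time
from typing import NamedTuple, Tuple


class CpuSample(NamedTuple):
    state: str   # process state letter, e.g. "R" (running) or "S" (sleeping)
    utime: int   # user-mode CPU time, in clock ticks (field 14)
    stime: int   # kernel-mode CPU time, in clock ticks (field 15)


def parse_stat(stat_line: str) -> CpuSample:
    # comm (field 2) is wrapped in parentheses and may itself contain
    # spaces or ')', so split only the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); in general field N is rest[N - 3].
    return CpuSample(state=rest[0], utime=int(rest[11]), stime=int(rest[12]))


def read_sample(pid: int) -> CpuSample:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


def cpu_split(pid: int, interval: float = 5.0) -> Tuple[float, float]:
    """Return (user_seconds, system_seconds) consumed by `pid` during `interval`."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = read_sample(pid)
    time.sleep(interval)
    after = read_sample(pid)
    return ((after.utime - before.utime) / ticks_per_second,
            (after.stime - before.stime) / ticks_per_second)
```

If cpu_split(19838) reports close to 5 seconds of user time and near-zero system time, that is consistent with both the 100% CPU shown by top and the silent strace output, and the gdb backtraces are the next step to find the loop.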

