[pgpool-general: 693] Re: Transaction never finishes

Tatsuo Ishii ishii at postgresql.org
Tue Jul 3 18:01:16 JST 2012


Luiz,

> On Fri, Jun 29, 2012 at 8:54 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>> On Fri, Jun 29, 2012 at 12:09 PM, Luiz Pasqual <luiz at pasquall.com> wrote:
>>>> On Thu, Jun 28, 2012 at 8:06 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm using pgpool-II 3.1.2 with streaming replication and it's working
>>>>>> pretty well. But I'm dealing with a weird situation and I don't know
>>>>>> how to debug:
>>>>>>
>>>>>> Sometimes, some transactions never finish on the master. Here is an
>>>>>> example; the following query:
>>>>>> select * from pg_stat_activity where xact_start < current_timestamp -
>>>>>> '10 minutes'::interval
>>>>>>
>>>>>> Results:
>>>>>> 20994;"****";2445;16385;"****";"";"192.168.**.**";"";44083;"2012-06-27
>>>>>> 05:55:39.525881-03";"2012-06-27 11:17:46.475347-03";"2012-06-27
>>>>>> 11:18:10.044718-03";f;"<IDLE> in transaction"
>>>>>>
>>>>>> This transaction holds an AccessShareLock on these relations:
>>>>>> pg_class_relname_nsp_index
>>>>>> pg_class
>>>>>> pg_class_oid_index
>>>>>> pg_namespace
>>>>>> pg_namespace_oid_index
>>>>>> pg_namespace_nspname_index
>>>>>>
>>>>>> And one ExclusiveLock for which I couldn't identify the relation.
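>>>>>>
>>>>>> A query along these lines against pg_locks (joined to pg_class; 2445
>>>>>> is the backend pid from pg_stat_activity above) maps each lock to
>>>>>> its relation. Locks with no relation, such as the ExclusiveLock
>>>>>> every transaction takes on its own virtualxid, come back with a NULL
>>>>>> relname:
>>>>>>
>>>>>> select l.locktype, l.mode, l.granted, c.relname
>>>>>> from pg_locks l
>>>>>> left join pg_class c on c.oid = l.relation
>>>>>> where l.pid = 2445;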
>>>>>>
>>>>>> Sometimes, depending on which relations are locked, everything fails
>>>>>> (times out) and pgpool must be restarted. Does anyone know what is
>>>>>> going on?
>>>>>
>>>>> You want to identify the process id of the pgpool process that is
>>>>> dealing with the PostgreSQL backend process id (in the case above,
>>>>> 2445).
>>>>>
>>>>> Here are the steps to find the pgpool process id:
>>>>>
>>>>> 1) Execute pcp_proc_count to get the pgpool process list (see the
>>>>>   example below). This command returns all pgpool process ids.
>>>>>
>>>>> 2) For each process id in #1, execute pcp_proc_info. This will tell
>>>>>   you which PostgreSQL process ids it is connected to. Note that the
>>>>>   command returns multiple rows sorted by node id. Usually node id 0
>>>>>   (thus the first line) is the primary.
>>>>>
>>>>> 3) Look for 2445 in #2 to find the pgpool process id.
>>>>>
>>>>> 4) Once you find the pgpool process id (say 12345), grep the pgpool
>>>>>   log for 12345. This will show what's going on with that process.
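>>>>>
>>>>> For example, assuming the PCP port is the default 9898 on localhost
>>>>> and "pcpuser"/"pcppass" stand in for your pcp.conf credentials, steps
>>>>> 1 and 2 would look like this:
>>>>>
>>>>> # 1) list all pgpool child process ids (timeout host port user password)
>>>>> pcp_proc_count 10 localhost 9898 pcpuser pcppass
>>>>>
>>>>> # 2) show the PostgreSQL backend pids served by pgpool child 12345
>>>>> pcp_proc_info 10 localhost 9898 pcpuser pcppass 12345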
>>>>
>>>> Well, they are all "idle in transaction".
>>>
>>> With a little more debugging, I discovered that the client using this
>>> specific process had already disconnected from pgpool:
>>>
>>> # ps aux | grep 25003
>>> pgpool: *** *** 127.0.0.1(51456) idle in transaction
>>>
>>> # lsof -i -P -n | grep 25003
>>> pgpool    25003       root    5u  IPv4 39609135      0t0  TCP *:5432 (LISTEN)
>>> pgpool    25003       root    6u  IPv4 39621031      0t0  TCP
>>> 127.0.0.1:5432->127.0.0.1:51456 (CLOSE_WAIT)
>>> pgpool    25003       root    7u  IPv4 39621043      0t0  TCP
>>> 192.168.10.55:33684->192.168.10.101:5432 (ESTABLISHED)
>>> pgpool    25003       root    8u  IPv4 39621044      0t0  TCP
>>> 192.168.10.55:51761->192.168.10.11:5432 (CLOSE_WAIT)
>>>
>>> There is no longer a connection between the client (127.0.0.1:51456)
>>> and pgpool (127.0.0.1:5432); that socket is in CLOSE_WAIT.
>>>
>>> For some reason this transaction was aborted: the client disconnected
>>> from pgpool, and pgpool aborted the transaction on the slave server
>>> (192.168.10.11) but not on the master (192.168.10.101).
>>>
>>> I'm running out of ideas.
>>
>>> 192.168.10.55:33684->192.168.10.101:5432 (ESTABLISHED)
>>
>> What is on 192.168.10.101:5432?
> 
> It's my database, PostgreSQL 9.1.3, with streaming replication to 192.168.10.11.
> 
> 
>> Also, can you attach gdb to 25003 and take a backtrace?
> 
> Sure:
> 
> (gdb) bt
> #0  0x00007f801e2c5073 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000041661e in pool_check_fd (cp=<value optimized out>) at
> pool_process_query.c:1050
> #2  0x0000000000419e64 in pool_read (cp=0x70f1d0, buf=0x7fff24248d3f,
> len=1) at pool_stream.c:138
> #3  0x00000000004158db in read_kind_from_backend (frontend=<value
> optimized out>, backend=0x70b420, decided_kind=<value optimized out>)
> at pool_process_query.c:3490
> #4  0x0000000000443cf9 in ProcessBackendResponse (frontend=0x70c450,
> backend=0x70b420, state=0x7fff24248fb8, num_fields=0x7fff24248fbc) at
> pool_proto_modules.c:2164
> #5  0x00000000004188b0 in pool_process_query (frontend=0x70c450,
> backend=0x70b420, reset_request=0) at pool_process_query.c:416
> #6  0x000000000040a4d0 in do_child (unix_fd=4, inet_fd=5) at child.c:354
> #7  0x0000000000404245 in fork_a_child (unix_fd=4, inet_fd=5, id=29)
> at main.c:1072
> #8  0x000000000040726f in main (argc=<value optimized out>,
> argv=<value optimized out>) at main.c:549
> (gdb)
> 
> Thanks for your support.

I'm looking into how to reproduce your problem. Is there an easy way to
reproduce it?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

