[pgpool-general: 695] Re: Transaction never finishes

Luiz Pasqual luiz at pasquall.com
Tue Jul 3 20:39:26 JST 2012


On Tue, Jul 3, 2012 at 7:40 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> Luiz,
>
>>> On Fri, Jun 29, 2012 at 8:54 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>> On Fri, Jun 29, 2012 at 12:09 PM, Luiz Pasqual <luiz at pasquall.com> wrote:
>>>>>> On Thu, Jun 28, 2012 at 8:06 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm using pgpool-II 3.1.2 with streaming replication and it's working
>>>>>>>> pretty well. But I'm dealing with a weird situation that I don't know
>>>>>>>> how to debug:
>>>>>>>>
>>>>>>>> Sometimes, some transactions never finish on the master. Here is an
>>>>>>>> example; the following query:
>>>>>>>> select * from pg_stat_activity where xact_start < current_timestamp -
>>>>>>>> '10 minutes'::interval
>>>>>>>>
>>>>>>>> Results:
>>>>>>>> 20994;"****";2445;16385;"****";"";"192.168.**.**";"";44083;"2012-06-27
>>>>>>>> 05:55:39.525881-03";"2012-06-27 11:17:46.475347-03";"2012-06-27
>>>>>>>> 11:18:10.044718-03";f;"<IDLE> in transaction"
>>>>>>>>
>>>>>>>> This transaction holds AccessShareLock on the following relations:
>>>>>>>> pg_class_relname_nsp_index
>>>>>>>> pg_class
>>>>>>>> pg_class_oid_index
>>>>>>>> pg_namespace
>>>>>>>> pg_namespace_oid_index
>>>>>>>> pg_namespace_nspname_index
>>>>>>>>
>>>>>>>> And one ExclusiveLock for which I couldn't identify the relation.
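
(Something like the query below can resolve which relation each lock belongs
to; pid 2445 is the backend shown in pg_stat_activity above, and the database
name is just a placeholder. An ExclusiveLock with a NULL relation is usually
the transaction's own transactionid/virtualxid lock, which every transaction
holds on itself.)

  psql -d mydb -c "
    SELECT l.locktype, l.mode, l.granted, c.relname
    FROM pg_locks l
    LEFT JOIN pg_class c ON c.oid = l.relation
    WHERE l.pid = 2445;"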
>>>>>>>>
>>>>>>>> Sometimes, depending on which relations are locked, everything fails
>>>>>>>> (times out) and pgpool must be restarted. Does anyone know what is
>>>>>>>> going on?
>>>>>>>
>>>>>>> You want to identify the process id of the pgpool child which is
>>>>>>> dealing with the PostgreSQL backend process id (in the case above it's
>>>>>>> 2445).
>>>>>>>
>>>>>>> Here are the steps to find the pgpool process id:
>>>>>>>
>>>>>>> 1) Execute pcp_proc_count to get the pgpool process list. This command
>>>>>>>   returns all pgpool process ids.
>>>>>>>
>>>>>>> 2) For each process id in #1, execute pcp_proc_info. This will tell
>>>>>>>   you which PostgreSQL backend process ids it is handling. Note that
>>>>>>>   the command returns multiple rows sorted by node id. Usually node
>>>>>>>   id 0 (thus the first line) is the primary.
>>>>>>>
>>>>>>> 3) Look for 2445 in #2 to find the pgpool process id.
>>>>>>>
>>>>>>> 4) Once you have found the pgpool process id (say 12345), grep the
>>>>>>>   pgpool log for 12345. This will show what's going on with that
>>>>>>>   process.
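
(Putting steps 1 through 4 together, a rough shell sketch; this assumes the
positional pcp syntax of pgpool-II 3.1, i.e. timeout, host, pcp port, user,
password, and the credentials and log path below are placeholders.)

  # 1) list all pgpool child process ids
  pcp_proc_count 10 localhost 9898 admin secret

  # 2) for each pgpool pid, print the PostgreSQL backends it is attached to
  for p in $(pcp_proc_count 10 localhost 9898 admin secret); do
      echo "=== pgpool pid $p ==="
      pcp_proc_info 10 localhost 9898 admin secret "$p"
  done

  # 3) look for the PostgreSQL backend pid (2445 here) in the output above,
  # 4) then grep the pgpool log for the matching pgpool pid (say 12345)
  grep 12345 /var/log/pgpool/pgpool.log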
>>>>>>
>>>>>> Well, they are all "idle in transaction".
>>>>>
>>>>> With a little more debugging, I discovered that the client using
>>>>> this specific process has already disconnected from pgpool:
>>>>>
>>>>> # ps aux | grep 25003
>>>>> pgpool: *** *** 127.0.0.1(51456) idle in transaction
>>>>>
>>>>> # lsof -i -P -n | grep 25003
>>>>> pgpool    25003       root    5u  IPv4 39609135      0t0  TCP *:5432 (LISTEN)
>>>>> pgpool    25003       root    6u  IPv4 39621031      0t0  TCP
>>>>> 127.0.0.1:5432->127.0.0.1:51456 (CLOSE_WAIT)
>>>>> pgpool    25003       root    7u  IPv4 39621043      0t0  TCP
>>>>> 192.168.10.55:33684->192.168.10.101:5432 (ESTABLISHED)
>>>>> pgpool    25003       root    8u  IPv4 39621044      0t0  TCP
>>>>> 192.168.10.55:51761->192.168.10.11:5432 (CLOSE_WAIT)
>>>>>
>>>>> There is no longer a connection between the client (127.0.0.1:51456)
>>>>> and pgpool (127.0.0.1:5432); that socket is in CLOSE_WAIT.
>>>>>
>>>>> For some reason, this transaction was aborted: the client disconnected
>>>>> from pgpool, and pgpool aborted the transaction on the slave server
>>>>> (192.168.10.11) but not on the master (192.168.10.101).
>>>>>
>>>>> I'm running out of ideas.
>>>>
>>>>> 192.168.10.55:33684->192.168.10.101:5432 (ESTABLISHED)
>>>>
>>>> What is on 192.168.10.101:5432?
>>>
>>> It's my database, PostgreSQL 9.1.3, with streaming replication to 192.168.10.11.
>>>
>>>
>>>> Also, can you attach gdb to 25003 and take a backtrace?
>>>
>>> Sure:
>>>
>>> (gdb) bt
>>> #0  0x00007f801e2c5073 in __select_nocancel () from /lib64/libc.so.6
>>> #1  0x000000000041661e in pool_check_fd (cp=<value optimized out>) at
>>> pool_process_query.c:1050
>>> #2  0x0000000000419e64 in pool_read (cp=0x70f1d0, buf=0x7fff24248d3f,
>>> len=1) at pool_stream.c:138
>>> #3  0x00000000004158db in read_kind_from_backend (frontend=<value
>>> optimized out>, backend=0x70b420, decided_kind=<value optimized out>)
>>> at pool_process_query.c:3490
>>> #4  0x0000000000443cf9 in ProcessBackendResponse (frontend=0x70c450,
>>> backend=0x70b420, state=0x7fff24248fb8, num_fields=0x7fff24248fbc) at
>>> pool_proto_modules.c:2164
>>> #5  0x00000000004188b0 in pool_process_query (frontend=0x70c450,
>>> backend=0x70b420, reset_request=0) at pool_process_query.c:416
>>> #6  0x000000000040a4d0 in do_child (unix_fd=4, inet_fd=5) at child.c:354
>>> #7  0x0000000000404245 in fork_a_child (unix_fd=4, inet_fd=5, id=29)
>>> at main.c:1072
>>> #8  0x000000000040726f in main (argc=<value optimized out>,
>>> argv=<value optimized out>) at main.c:549
>>> (gdb)
>>>
>>> Thanks for your support.
>>
>> I'm looking into how to reproduce your problem. Is there any easy way to
>> reproduce it?
>
> I think I found a cause of the problem.
> Can you try the attached patch?

Tatsuo,

I deployed the patch. Since I can't reproduce the problem, I'll wait
and see. In general, it happens 3 or 4 times a day, so I think
it will be easy to tell the results.
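
In the meantime, a check along these lines should make a recurrence easy to
spot (the database name is a placeholder; on 9.1 the pg_stat_activity columns
are procpid and current_query):

  psql -d mydb -c "
    SELECT procpid, client_addr, xact_start, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
      AND xact_start < current_timestamp - interval '10 minutes';"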

Thank you very much for this great software.

-- 
Luiz Fernando Pasqual S. Souza
mail: luiz at pasquall.com

