[pgpool-general: 4486] Re: Pgpool - connection hangs in DISCARD ALL

Tatsuo Ishii ishii at postgresql.org
Tue Feb 23 23:08:31 JST 2016


> On Tue, Feb 23, 2016 at 4:16 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> Usama,
>>
>> Doesn't pgpool-II 3.1 have the same problem?
> 
> Sorry, I missed that, and 3_0 as well. I have pushed the same change to
> both branches.

No, you don't need to take care of 3.0 any more. Remember that 3.0 is EOL
now (3.0.20, released this February, was the last release). I don't
think you need to revert the patch for 3.0, however.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Regards
> Muhammad Usama
> 
> 
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>> Hi
>>>
>>> I have pushed the above fix in all branches from pgpool-II 3.2 onwards.
>>> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6afacb1b19603b37e3d005963182258b9f4fca49
>>>
>>> Thanks again for your help in verifying and testing the fix.
>>>
>>> Kind regards
>>> Muhammad Usama
>>>
>>>
>>> On Sat, Feb 20, 2016 at 1:03 AM, Muhammad Usama <m.usama at gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Many thanks for the confirmation. The fix needs to go into all branches,
>>>> and I will push it in the morning.
>>>>
>>>> Regards
>>>> Muhammad Usama
>>>>
>>>> > On 19-Feb-2016, at 11:52 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > I can confirm that this patch worked for me (I tested the 3.5 patch
>>>> > version): nearly 2 days without any problem. Can you please add it to
>>>> > the git repo and make a new release (3.4, 3.5)?
>>>> >
>>>> > Thanks.
>>>> >
>>>> > Ciao,
>>>> > Gerhard
>>>> >
>>>> >
>>>> >> On 16.02.2016 11:44, Muhammad Usama wrote:
>>>> >> Hi
>>>> >>
>>>> >> Many thanks for the reply, and it is good news that you are no longer
>>>> >> getting the stuck connection issue after the patch.
>>>> >>
>>>> >> Thanks
>>>> >> Best regards
>>>> >> Muhammad Usama
>>>> >>
>>>> >>
>>>> >>> On Fri, Feb 12, 2016 at 9:45 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>>> >>> Hmm, it looks like it's fine now. Right now I only see these in the log:
>>>> >>>
>>>> >>> 2016-02-12 17:28:12: pid 8838: LOG:  child process with pid: 27299 exits with status 256
>>>> >>> 2016-02-12 17:28:12: pid 8838: LOG:  fork a new child process with pid: 6140
>>>> >>> 2016-02-12 17:30:42: pid 8838: LOG:  child process with pid: 5571 exits with status 512
>>>> >>> 2016-02-12 17:30:42: pid 8838: LOG:  fork a new child process with pid: 6720
>>>> >>> 2016-02-12 17:30:43: pid 8838: LOG:  child process with pid: 4444 exits with status 512
>>>> >>> 2016-02-12 17:30:43: pid 8838: LOG:  fork a new child process with pid: 6751
>>>> >>> 2016-02-12 17:35:42: pid 8838: LOG:  child process with pid: 6140 exits with status 512
>>>> >>> 2016-02-12 17:35:42: pid 8838: LOG:  fork a new child process with pid: 7868
>>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6751 exits with status 512
>>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9018
>>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6720 exits with status 512
>>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9019
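
For reading the "exits with status 256" and "exits with status 512" lines above: assuming the number logged is the raw status word from waitpid() (an assumption, not checked against the pgpool-II source), on Linux it carries the exit code in its second byte, so 256 would be exit code 1 and 512 exit code 2. A minimal sketch in C; raw_wait_status() and exit_code_of() are hypothetical helpers, not pgpool-II functions:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of pgpool-II): fork a child that exits
 * with `code` and return the raw status word filled in by waitpid().
 * Returns -1 if fork() or waitpid() fails.
 */
static int raw_wait_status(int code)
{
    int   status = 0;
    pid_t pid;

    assert(code >= 0 && code <= 255);   /* exit codes are 8 bits wide */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                    /* child: exit at once */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/*
 * Hypothetical helper: portable decoding of a raw status word.
 * Returns the exit code, or -1 if the child did not exit normally.
 */
static int exit_code_of(int status)
{
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_code_of(raw_wait_status(2)) gives back 2. What exit codes 1 and 2 mean for a pgpool-II child process is not stated in this thread, so the sketch only covers the decoding step.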
>>>> >>>
>>>> >>> Thank you!
>>>> >>>
>>>> >>> Best regards,
>>>> >>> Paweł Ufnalewski
>>>> >>> Infrastructure Architect at Foap.com
>>>> >>>
>>>> >>> On 2016-02-09 at 14:02, Muhammad Usama wrote:
>>>> >>>
>>>> >>>> Hi
>>>> >>>>
>>>> >>>> Many thanks for sharing the pgpool.log. The log you shared does
>>>> >>>> contain some error messages, "ERROR: unable to flush data to
>>>> >>>> frontend", that have the potential to cause the stuck connection.
>>>> >>>> Can you please try out the attached patch and see if it fixes the
>>>> >>>> problem? I am attaching patches for both the 3_5 and 3_4 branches;
>>>> >>>> please use the respective patch as per your setup. Hopefully this
>>>> >>>> should fix the stuck issue.
>>>> >>>>
>>>> >>>> Kind regards
>>>> >>>> Muhammad Usama
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>> On Mon, Feb 8, 2016 at 8:49 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>>> >>>>> Hi,
>>>> >>>>>
>>>> >>>>> It looks like it hangs in these places (see attachment). The problem
>>>> >>>>> is that the developer responsible for the app has changed something
>>>> >>>>> in the code, so connections now close properly from the client side
>>>> >>>>> (before that I got a lot of these errors):
>>>> >>>>>
>>>> >>>>> 2016-02-08 09:33:39: pid 8472: ERROR:  unable to read data from frontend
>>>> >>>>> 2016-02-08 09:33:39: pid 8472: DETAIL:  EOF encountered with frontend
>>>> >>>>>
>>>> >>>>> Best regards,
>>>> >>>>> Paweł Ufnalewski
>>>> >>>>> Infrastructure Architect at Foap.com
>>>> >>>>>
>>>> >>>>> On 2016-02-08 at 09:00, Muhammad Usama wrote:
>>>> >>>>>
>>>> >>>>> Hi
>>>> >>>>>
>>>> >>>>> Thanks in advance for the help. If you could share the pgpool-II log
>>>> >>>>> from when the stuck connection happens, that would help us in
>>>> >>>>> identifying and rectifying the problem.
>>>> >>>>>
>>>> >>>>> Thanks
>>>> >>>>> Best regards
>>>> >>>>> Muhammad Usama
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Mon, Feb 8, 2016 at 11:36 AM, Paweł Ufnalewski <archon at foap.com> wrote:
>>>> >>>>>
>>>> >>>>> Hi,
>>>> >>>>>
>>>> >>>>> Just to let you know: I'm having the same problem with version 3.4.4
>>>> >>>>> (the stuck DISCARD ALL appears more slowly than in 3.4.3, I think,
>>>> >>>>> but it still appears). How can I help to fix this problem?
>>>> >>>>>
>>>> >>>>> Best regards,
>>>> >>>>> Paweł Ufnalewski
>>>> >>>>> Infrastructure Architect at Foap.com
>>>> >>>>>
>>>> >>>>> On 2016-02-01 at 08:44, Muhammad Usama wrote:
>>>> >>>>>
>>>> >>>>> Hi Gerhard
>>>> >>>>>
>>>> >>>>> Many thanks for testing and pointing this out. It's unfortunate that
>>>> >>>>> you are still getting the stuck connection issue. If possible, can
>>>> >>>>> you please share the pgpool-II log for the time when this stuck
>>>> >>>>> connection issue happens? I am most interested in seeing exactly
>>>> >>>>> which error message caused the child process to jump to the error
>>>> >>>>> handler, from where the child process proceeded to send DISCARD ALL
>>>> >>>>> to the backend and eventually got stuck. After many tries we are
>>>> >>>>> still not able to reproduce this issue, so the log would be really
>>>> >>>>> helpful in understanding and fixing the problem.
>>>> >>>>>
>>>> >>>>> Best regards
>>>> >>>>> Muhammad Usama
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Sun, Jan 31, 2016 at 9:33 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>>> >>>>>
>>>> >>>>> On 28.01.2016 01:10, Tatsuo Ishii wrote:
>>>> >>>>>
>>>> >>>>> On 21.01.2016 20:52, Muhammad Usama wrote:
>>>> >>>>>
>>>> >>>>> Hi
>>>> >>>>>
>>>> >>>>> I am looking into this issue, and unfortunately, like Ishii-San, I am
>>>> >>>>> also not able to reproduce it. But I found one issue in 3.4 that
>>>> >>>>> might cause the problem. Can you please try the attached patch and see
>>>> >>>>> if it solves the problem? Also, if the problem still persists, it
>>>> >>>>> would be really helpful if you could share the pgpool-II log.
>>>> >>>>>
>>>> >>>>> I looked at the patch, but it includes only logging changes and no
>>>> >>>>> functional changes. Therefore I didn't test it. Do you expect any
>>>> >>>>> behavioral changes to fix it, and why?
>>>> >>>>>
>>>> >>>>> elog() is not only a logging function; it also plays a very
>>>> >>>>> important role in exception handling and error treatment in
>>>> >>>>> pgpool-II. If you are familiar with PostgreSQL internals, you may
>>>> >>>>> have noticed this (elog() was imported from the PostgreSQL source tree).
>>>> >>>>>
>>>> >>>>> I tried version 3.5.0, where the patch is included. It is still not
>>>> >>>>> working. See the backtrace below.
>>>> >>>>>
>>>> >>>>> Reverting to 3.3.7, which works perfectly.
>>>> >>>>>
>>>> >>>>> Ciao,
>>>> >>>>> Gerhard
>>>> >>>>>
>>>> >>>>> (gdb) back
>>>> >>>>> #0  0x00007fd87fdb6d43 in __select_nocancel () from /lib64/libc.so.6
>>>> >>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>>> >>>>> #2  0x0000564471af1976 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:657
>>>> >>>>> #3  0x0000564471b1f67b in pool_read (cp=0x564473dfa610, buf=buf@entry=0x7ffc1d71bf97, len=len@entry=1) at utils/pool_stream.c:162
>>>> >>>>> #4  0x0000564471af8e6e in read_kind_from_backend (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, decided_kind=decided_kind@entry=0x7ffc1d71c397 "E") at protocol/pool_process_query.c:3234
>>>> >>>>> #5  0x0000564471affdc9 in ProcessBackendResponse (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, state=state@entry=0x7ffc1d71c41c, num_fields=num_fields@entry=0x7ffc1d71c41a) at protocol/pool_proto_modules.c:2356
>>>> >>>>> #6  0x0000564471af5b15 in pool_process_query (frontend=0x564473df3e60, backend=0x564473df2e00, reset_request=reset_request@entry=1) at protocol/pool_process_query.c:302
>>>> >>>>> #7  0x0000564471aed98c in backend_cleanup (backend=<optimized out>, frontend_invalid=frontend_invalid@entry=0 '\000', frontend=0x564471e09e40 <child_frontend>) at protocol/child.c:437
>>>> >>>>> #8  0x0000564471af0637 in do_child (fds=fds@entry=0x564473dee030) at protocol/child.c:234
>>>> >>>>> #9  0x0000564471ace107 in fork_a_child (fds=0x564473dee030, id=8) at main/pgpool_main.c:678
>>>> >>>>> #10 0x0000564471aceb6d in reaper () at main/pgpool_main.c:2254
>>>> >>>>> #11 0x0000564471ad322b in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:429
>>>> >>>>> #12 0x0000564471acc7b1 in main (argc=<optimized out>, argv=0x7ffc1d7219e8) at main/main.c:310
>>>> >>>>>
>>>> >>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>>> >>>>> 635                     fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
>>>> >>>>>
>>>> >>>>> (gdb) print fd
>>>> >>>>> $1 = 8
>>>> >>>>> (gdb) print readmask
>>>> >>>>> $2 = {fds_bits = {256, 0 <repeats 15 times>}}
>>>> >>>>> (gdb) print exceptmask
>>>> >>>>> $3 = {fds_bits = {256, 0 <repeats 15 times>}}
>>>> >>>>> (gdb) print timeoutp
>>>> >>>>> $4 = (struct timeval *) 0x0
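
Reading the gdb values above: with fd = 8, the first fd_set word is 1 << 8 = 256, which matches the printed fds_bits, and timeoutp = 0x0 means select() was called with a NULL timeout, so it waits until the peer sends data, closes the connection or an error is raised. The sketch below reproduces both observations in C. It is an illustration under assumptions, not pgpool-II's pool_check_fd(): fd_word_mask(), wait_readable() and demo_silent_peer() are hypothetical names, a pipe stands in for the backend socket, and the bit layout assumed is the one-bit-per-descriptor packing that the gdb output suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Mask of `fd` inside its fd_set word, assuming one bit per descriptor
 * packed into unsigned long words (the layout gdb printed above). */
static unsigned long fd_word_mask(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    return 1UL << (fd % (int) (8 * sizeof(unsigned long)));
}

/*
 * Wait until `fd` is readable. timeout_ms < 0 passes a NULL timeout,
 * so select() blocks until data, EOF or an error arrives.
 * Returns select()'s result: 1 readable, 0 timed out, -1 error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set          readmask;
    fd_set          exceptmask;
    struct timeval  tv;
    struct timeval *timeoutp = NULL;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readmask);
    FD_ZERO(&exceptmask);
    FD_SET(fd, &readmask);
    FD_SET(fd, &exceptmask);

    if (timeout_ms >= 0)
    {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        timeoutp = &tv;
    }
    return select(fd + 1, &readmask, NULL, &exceptmask, timeoutp);
}

/* Hypothetical demo: a pipe stands in for the silent backend socket.
 * Returns 1 if both waits behaved as described, else 0. */
static int demo_silent_peer(void)
{
    int p[2];
    int idle;
    int ready = -1;

    if (pipe(p) != 0)
        return 0;

    idle = wait_readable(p[0], 100);        /* nothing written: times out */
    if (write(p[1], "E", 1) == 1)
        ready = wait_readable(p[0], 100);   /* one byte is now pending */

    close(p[0]);
    close(p[1]);
    return idle == 0 && ready == 1;
}
```

Calling wait_readable() with a negative timeout on the idle pipe would never return, which is the state the backtrace shows. This only illustrates the blocking behaviour; it says nothing about how the patch discussed in this thread resolves the hang.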
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> _______________________________________________
>>>> >>>>> pgpool-general mailing list
>>>> >>>>> pgpool-general at pgpool.net
>>>> >>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>> >
>>>>

