[pgpool-general: 4484] Re: Pgpool - connection hangs in DISCARD ALL

Muhammad Usama m.usama at gmail.com
Tue Feb 23 21:27:51 JST 2016


On Tue, Feb 23, 2016 at 4:16 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> Usama,
>
> Doesn't pgpool-II 3.1 have the same problem?

Sorry, I missed that, and 3_0 as well. I have pushed the same change to
both branches.

Regards
Muhammad Usama


>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>> Hi
>>
>> I have pushed the above fix in all branches from pgpool-II 3.2 onwards.
>> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6afacb1b19603b37e3d005963182258b9f4fca49
>>
>> Thanks again for your help in verifying and testing the fix.
>>
>> Kind regards
>> Muhammad Usama
>>
>>
>> On Sat, Feb 20, 2016 at 1:03 AM, Muhammad Usama <m.usama at gmail.com> wrote:
>>
>>> Hi
>>>
>>> Many thanks for the confirmation. The fix needs to go into all branches, and
>>> I will push it in the morning.
>>>
>>> Regards
>>> Muhammad Usama
>>>
>>> Sent from my iPhone
>>>
>>> > On 19-Feb-2016, at 11:52 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I can confirm that this patch worked for me (I tested the 3.5 version of
>>> > the patch): nearly 2 days without any problem. Can you please add it to the
>>> > git repo and make a new release (3.4, 3.5)?
>>> >
>>> > Thanks.
>>> >
>>> > Ciao,
>>> > Gerhard
>>> >
>>> >
>>> >> On 16.02.2016 11:44, Muhammad Usama wrote:
>>> >> Hi
>>> >>
>>> >> Many thanks for the reply, and it is good news that you are no longer
>>> >> seeing the stuck connection issue after the patch.
>>> >>
>>> >> Thanks
>>> >> Best regards
>>> >> Muhammad Usama
>>> >>
>>> >>
>>> >>> On Fri, Feb 12, 2016 at 9:45 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>> >>> Hmm it looks like it's fine now. Right now I only see these in log:
>>> >>>
>>> >>> 2016-02-12 17:28:12: pid 8838: LOG:  child process with pid: 27299 exits with status 256
>>> >>> 2016-02-12 17:28:12: pid 8838: LOG:  fork a new child process with pid: 6140
>>> >>> 2016-02-12 17:30:42: pid 8838: LOG:  child process with pid: 5571 exits with status 512
>>> >>> 2016-02-12 17:30:42: pid 8838: LOG:  fork a new child process with pid: 6720
>>> >>> 2016-02-12 17:30:43: pid 8838: LOG:  child process with pid: 4444 exits with status 512
>>> >>> 2016-02-12 17:30:43: pid 8838: LOG:  fork a new child process with pid: 6751
>>> >>> 2016-02-12 17:35:42: pid 8838: LOG:  child process with pid: 6140 exits with status 512
>>> >>> 2016-02-12 17:35:42: pid 8838: LOG:  fork a new child process with pid: 7868
>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6751 exits with status 512
>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9018
>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6720 exits with status 512
>>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9019
>>> >>>
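A side note on the status values in the log lines above. Assuming pgpool-II
prints the raw status word it gets back from wait()/waitpid() (an assumption,
nothing in this thread confirms it), 256 and 512 are not exit codes
themselves. On Linux the exit code is stored in the second byte of that word,
so the values would decode to exit codes 1 and 2. A minimal sketch of the
decoding, with function names made up for illustration:

```c
#include <assert.h>
#include <sys/wait.h>

/*
 * Assumption: "exits with status N" logs the raw wait()/waitpid() status,
 * not the exit code. Returns the child's exit code, or -1 if the status
 * does not describe a normal exit (for example, death by a signal).
 */
int exit_code_from_wait_status(int status)
{
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Sanity check against the two values seen in the log (Linux encoding). */
void check_logged_statuses(void)
{
    assert(exit_code_from_wait_status(256) == 1);
    assert(exit_code_from_wait_status(512) == 2);
}
```

If that reading is right, the children in the log above exited normally
rather than being killed by a signal.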
>>> >>> Thank you!
>>> >>>
>>> >>> Best regards,
>>> >>> Paweł Ufnalewski
>>> >>> Infrastructure Architect at Foap.com
>>> >>>
>>> >>> On 2016-02-09 at 14:02, Muhammad Usama wrote:
>>> >>>
>>> >>>> Hi
>>> >>>>
>>> >>>> Many thanks for sharing the pgpool.log. The log you shared does
>>> >>>> contain some error messages, "ERROR: unable to flush data to
>>> >>>> frontend", that have the potential to cause the stuck connection.
>>> >>>> Can you please try the attached patch and see whether it fixes the
>>> >>>> problem? I am attaching patches for both the 3_5 and 3_4 branches;
>>> >>>> please use the one that matches your setup. Hopefully this fixes the
>>> >>>> stuck connection issue.
>>> >>>>
>>> >>>> Kind regards
>>> >>>> Muhammad Usama
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>> On Mon, Feb 8, 2016 at 8:49 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> It looks like it hangs in these places (see attachment). The problem is
>>> >>>>> that the developer responsible for the app has changed something in the
>>> >>>>> code, so connections now close properly from the client side (before
>>> >>>>> that I got a lot of these errors:
>>> >>>>>
>>> >>>>> 2016-02-08 09:33:39: pid 8472: ERROR:  unable to read data from frontend
>>> >>>>> 2016-02-08 09:33:39: pid 8472: DETAIL:  EOF encountered with frontend
>>> >>>>> ).
>>> >>>>>
>>> >>>>> Best regards,
>>> >>>>> Paweł Ufnalewski
>>> >>>>> Infrastructure Architect at Foap.com
>>> >>>>>
>>> >>>>> On 2016-02-08 at 09:00, Muhammad Usama wrote:
>>> >>>>>
>>> >>>>> Hi
>>> >>>>>
>>> >>>>> Thanks in advance for the help. If you could share the pgpool-II log
>>> >>>>> from when the stuck connection happens, that would help us in
>>> >>>>> identifying and rectifying the problem.
>>> >>>>>
>>> >>>>> Thanks
>>> >>>>> Best regards
>>> >>>>> Muhammad Usama
>>> >>>>>
>>> >>>>>
>>> >>>>> On Mon, Feb 8, 2016 at 11:36 AM, Paweł Ufnalewski <archon at foap.com> wrote:
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> Just to let you know: I'm having the same problem with version 3.4.4
>>> >>>>> (DISCARD ALL seems to appear more slowly than in 3.4.3, I think, but it
>>> >>>>> still does). How can I help to fix this problem?
>>> >>>>>
>>> >>>>> Best regards,
>>> >>>>> Paweł Ufnalewski
>>> >>>>> Infrastructure Architect at Foap.com
>>> >>>>>
>>> >>>>> On 2016-02-01 at 08:44, Muhammad Usama wrote:
>>> >>>>>
>>> >>>>> Hi Gerhard
>>> >>>>>
>>> >>>>> Many thanks for testing and pointing this out. It's unfortunate that you
>>> >>>>> are still getting the stuck connection issue. If possible, can you please
>>> >>>>> share the pgpool-II log for the time when this stuck connection issue
>>> >>>>> happens? I am most interested in seeing exactly which error message
>>> >>>>> caused the child process to jump to the error handler, from where the
>>> >>>>> child process proceeded to send DISCARD ALL to the backend and eventually
>>> >>>>> got stuck. Since we have not been able to reproduce this issue after many
>>> >>>>> tries, the log would be really helpful in understanding and fixing the
>>> >>>>> problem.
>>> >>>>>
>>> >>>>> Best regards
>>> >>>>> Muhammad Usama
>>> >>>>>
>>> >>>>>
>>> >>>>> On Sun, Jan 31, 2016 at 9:33 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>> >>>>>
>>> >>>>> On 28.01.2016 01:10, Tatsuo Ishii wrote:
>>> >>>>>
>>> >>>>> On 21.01.2016 20:52, Muhammad Usama wrote:
>>> >>>>>
>>> >>>>> Hi
>>> >>>>>
>>> >>>>> I am looking into this issue, and unfortunately, like Ishii-San, I am
>>> >>>>> also not able to reproduce it. But I found one issue in 3.4 that might
>>> >>>>> cause the problem. Can you please try the attached patch and see whether
>>> >>>>> it solves the problem? Also, if the problem still persists, it would be
>>> >>>>> really helpful if you could share the pgpool-II log.
>>> >>>>>
>>> >>>>> I looked at the patch, but it includes only logging changes and no
>>> >>>>> functional changes. Therefore I didn't test it. Do you expect any
>>> >>>>> behavioral changes to fix it, and why?
>>> >>>>>
>>> >>>>> elog() is not only a logging function; it also plays a very
>>> >>>>> important role in pgpool-II, including exception handling and error
>>> >>>>> treatment. If you are familiar with PostgreSQL internals, you may
>>> >>>>> have noticed this (elog() was imported from the PostgreSQL source tree).
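For readers not familiar with that style of error handling, here is a minimal
sketch of the idea: reporting at ERROR level does not return to the caller but
jumps back to a recovery point set up earlier, so adding or changing such a
call changes control flow, not just the log. This is not pgpool-II or
PostgreSQL code. The names report(), flush_to_frontend() and
serve_one_request() and the level values are made up, and plain
setjmp()/longjmp() is used only to keep the sketch portable.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

#define LOG_LEVEL   1   /* hypothetical levels, not pgpool-II's values */
#define ERROR_LEVEL 2

static jmp_buf recovery_point;  /* where an ERROR report unwinds to */
static int cleanup_runs = 0;    /* counts how often the error path ran */

/* Logs the message; at ERROR level it also jumps to the recovery point. */
void report(int level, const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%s: %s\n", level >= ERROR_LEVEL ? "ERROR" : "LOG", msg);
    if (level >= ERROR_LEVEL)
        longjmp(recovery_point, 1);     /* does not return to the caller */
}

/* Stand-in for a low-level I/O routine that can fail. */
void flush_to_frontend(int frontend_ok)
{
    if (!frontend_ok)
        report(ERROR_LEVEL, "unable to flush data to frontend");
    report(LOG_LEVEL, "flushed");       /* skipped when the ERROR fired */
}

/* Returns 0 on success, 1 if an ERROR was raised and the cleanup path ran. */
int serve_one_request(int frontend_ok)
{
    if (setjmp(recovery_point) != 0)
    {
        /* Error handler: this is where per-connection cleanup would go. */
        cleanup_runs++;
        return 1;
    }
    flush_to_frontend(frontend_ok);
    return 0;
}
```

Calling serve_one_request(1) returns 0, while serve_one_request(0) prints the
ERROR line, skips the rest of flush_to_frontend() and returns 1 from the
handler branch.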
>>> >>>>>
>>> >>>>> Tried version 3.5.0, where the patch is included. Still not working. See
>>> >>>>> the backtrace below.
>>> >>>>>
>>> >>>>> Reverting to 3.3.7, which works perfectly.
>>> >>>>>
>>> >>>>> Ciao,
>>> >>>>> Gerhard
>>> >>>>>
>>> >>>>> (gdb) back
>>> >>>>> #0  0x00007fd87fdb6d43 in __select_nocancel () from /lib64/libc.so.6
>>> >>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>> >>>>> #2  0x0000564471af1976 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:657
>>> >>>>> #3  0x0000564471b1f67b in pool_read (cp=0x564473dfa610, buf=buf@entry=0x7ffc1d71bf97, len=len@entry=1) at utils/pool_stream.c:162
>>> >>>>> #4  0x0000564471af8e6e in read_kind_from_backend (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, decided_kind=decided_kind@entry=0x7ffc1d71c397 "E") at protocol/pool_process_query.c:3234
>>> >>>>> #5  0x0000564471affdc9 in ProcessBackendResponse (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, state=state@entry=0x7ffc1d71c41c, num_fields=num_fields@entry=0x7ffc1d71c41a) at protocol/pool_proto_modules.c:2356
>>> >>>>> #6  0x0000564471af5b15 in pool_process_query (frontend=0x564473df3e60, backend=0x564473df2e00, reset_request=reset_request@entry=1) at protocol/pool_process_query.c:302
>>> >>>>> #7  0x0000564471aed98c in backend_cleanup (backend=<optimized out>, frontend_invalid=frontend_invalid@entry=0 '\000', frontend=0x564471e09e40 <child_frontend>) at protocol/child.c:437
>>> >>>>> #8  0x0000564471af0637 in do_child (fds=fds@entry=0x564473dee030) at protocol/child.c:234
>>> >>>>> #9  0x0000564471ace107 in fork_a_child (fds=0x564473dee030, id=8) at main/pgpool_main.c:678
>>> >>>>> #10 0x0000564471aceb6d in reaper () at main/pgpool_main.c:2254
>>> >>>>> #11 0x0000564471ad322b in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:429
>>> >>>>> #12 0x0000564471acc7b1 in main (argc=<optimized out>, argv=0x7ffc1d7219e8) at main/main.c:310
>>> >>>>>
>>> >>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>> >>>>> 635                     fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
>>> >>>>>
>>> >>>>> (gdb) print fd
>>> >>>>> $1 = 8
>>> >>>>> (gdb) print readmask
>>> >>>>> $2 = {fds_bits = {256, 0 <repeats 15 times>}}
>>> >>>>> (gdb) print exceptmask
>>> >>>>> $3 = {fds_bits = {256, 0 <repeats 15 times>}}
>>> >>>>> (gdb) print timeoutp
>>> >>>>> $4 = (struct timeval *) 0x0
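A note on reading the gdb output above. The value 256 in fds_bits is 1 << 8,
i.e. only fd 8 is set in both masks, and timeoutp is NULL, so select() has no
timeout and will wait until the backend socket becomes readable. The sketch
below shows the difference between a NULL and a bounded timeout on an ordinary
pipe. It is only an illustration and not the fix that was committed. The
function names are made up, and bit_for_fd() assumes the glibc layout where
each fds_bits word is an unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Bit that represents fd inside its fds_bits word (glibc layout assumed). */
unsigned long bit_for_fd(int fd)
{
    return 1UL << (fd % (int)(8 * sizeof(unsigned long)));
}

/*
 * Wait until fd is readable or has an exceptional condition pending.
 * timeout_sec < 0  : NULL timeout, block until something happens
 *                    (the situation in the backtrace, timeoutp == 0x0).
 * timeout_sec >= 0 : give up after that many seconds.
 * Returns 1 if ready, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_sec)
{
    fd_set readmask, exceptmask;
    struct timeval tv;
    struct timeval *timeoutp = NULL;
    int fds;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readmask);
    FD_ZERO(&exceptmask);
    FD_SET(fd, &readmask);
    FD_SET(fd, &exceptmask);

    if (timeout_sec >= 0)
    {
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        timeoutp = &tv;
    }

    fds = select(fd + 1, &readmask, NULL, &exceptmask, timeoutp);
    if (fds < 0)
        return -1;
    return fds > 0 ? 1 : 0;
}

/*
 * Demo on a pipe: with write_first == 0 nothing ever arrives, so a bounded
 * wait returns 0 after one second (a NULL timeout would never return).
 */
int demo_wait_on_pipe(int write_first)
{
    int p[2];
    int result;

    if (pipe(p) != 0)
        return -1;
    if (write_first && write(p[1], "E", 1) != 1)
        result = -1;
    else
        result = wait_readable(p[0], 1);
    close(p[0]);
    close(p[1]);
    return result;
}
```

With these definitions demo_wait_on_pipe(0) comes back with 0 after about one
second, and demo_wait_on_pipe(1) returns 1 immediately.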
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> _______________________________________________
>>> >>>>> pgpool-general mailing list
>>> >>>>> pgpool-general at pgpool.net
>>> >>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> >
>>>

