[pgpool-general: 4485] Re: Pgpool - connection hangs in DISCARD ALL

Jose Baez pepote at gmail.com
Tue Feb 23 21:50:45 JST 2016


Could this error happen if "connection_cache = off"?



On 23 February 2016 at 12:27, Muhammad Usama <m.usama at gmail.com> wrote:

> On Tue, Feb 23, 2016 at 4:16 AM, Tatsuo Ishii <ishii at postgresql.org>
> wrote:
> > Usama,
> >
> > Doesn't pgpool-II 3.1 have the same problem?
>
> Sorry, I missed that, and 3_0 as well. I have pushed the same change to
> both branches.
>
> Regards
> Muhammad Usama
>
>
> >
> > Best regards,
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > English: http://www.sraoss.co.jp/index_en.php
> > Japanese:http://www.sraoss.co.jp
> >
> >> Hi
> >>
> >> I have pushed the above fix in all branches from pgpool-II 3.2 onwards.
> >>
> >> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6afacb1b19603b37e3d005963182258b9f4fca49
> >>
> >> Thanks again for your help in verifying and testing the fix.
> >>
> >> Kind regards
> >> Muhammad Usama
> >>
> >>
> >> On Sat, Feb 20, 2016 at 1:03 AM, Muhammad Usama <m.usama at gmail.com> wrote:
> >>
> >>> Hi
> >>>
> >>> Many thanks for the confirmation, the fix needs to go in all branches and
> >>> I will push it in the morning.
> >>>
> >>> Regards
> >>> Muhammad Usama
> >>>
> >>> Sent from my iPhone
> >>>
> >>> > On 19-Feb-2016, at 11:52 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
> >>> >
> >>> > Hello,
> >>> >
> >>> > Can confirm that this patch worked for me (tested the 3.5 patch
> >>> > version), nearly 2 days without any problem. Can you please add it to the
> >>> > git repo and make a new release (3.4, 3.5)?
> >>> >
> >>> > Thnx.
> >>> >
> >>> > Ciao,
> >>> > Gerhard
> >>> >
> >>> >
> >>> >> On 16.02.2016 11:44, Muhammad Usama wrote:
> >>> >> Hi
> >>> >>
> >>> >> Many thanks for the reply, and good news that you are no longer getting
> >>> >> the stuck connection issue after the patch.
> >>> >>
> >>> >> Thanks
> >>> >> Best regards
> >>> >> Muhammad Usama
> >>> >>
> >>> >>
> >>> >>> On Fri, Feb 12, 2016 at 9:45 PM, Paweł Ufnalewski <archon at foap.com> wrote:
> >>> >>> Hmm, it looks like it's fine now. Right now I only see these in the log:
> >>> >>>
> >>> >>> 2016-02-12 17:28:12: pid 8838: LOG:  child process with pid: 27299 exits with status 256
> >>> >>> 2016-02-12 17:28:12: pid 8838: LOG:  fork a new child process with pid: 6140
> >>> >>> 2016-02-12 17:30:42: pid 8838: LOG:  child process with pid: 5571 exits with status 512
> >>> >>> 2016-02-12 17:30:42: pid 8838: LOG:  fork a new child process with pid: 6720
> >>> >>> 2016-02-12 17:30:43: pid 8838: LOG:  child process with pid: 4444 exits with status 512
> >>> >>> 2016-02-12 17:30:43: pid 8838: LOG:  fork a new child process with pid: 6751
> >>> >>> 2016-02-12 17:35:42: pid 8838: LOG:  child process with pid: 6140 exits with status 512
> >>> >>> 2016-02-12 17:35:42: pid 8838: LOG:  fork a new child process with pid: 7868
> >>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6751 exits with status 512
> >>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9018
> >>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6720 exits with status 512
> >>> >>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9019
> >>> >>>
> >>> >>> Thank you!
> >>> >>>
> >>> >>> Best regards,
> >>> >>> Paweł Ufnalewski
> >>> >>> Infrastructure Architect at Foap.com
> >>> >>>
> >>> >>> On 2016-02-09 at 14:02, Muhammad Usama wrote:
> >>> >>>
> >>> >>>> Hi
> >>> >>>>
> >>> >>>> Many thanks for sharing the pgpool.log. The log you shared does
> >>> >>>> contain some error messages, "ERROR: unable to flush data to
> >>> >>>> frontend", that have the potential to cause the stuck connection.
> >>> >>>> Can you please try out the attached patch to see if it fixes the problem?
> >>> >>>> I am attaching patches for both the 3_5 and 3_4 branches; please use the
> >>> >>>> respective patch as per your setup. Hopefully this should fix the
> >>> >>>> stuck issue.
> >>> >>>>
> >>> >>>> Kind regards
> >>> >>>> Muhammad Usama
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>>> On Mon, Feb 8, 2016 at 8:49 PM, Paweł Ufnalewski <archon at foap.com> wrote:
> >>> >>>>> Hi,
> >>> >>>>>
> >>> >>>>>      It looks like it hangs in these places (see attachment). The problem is
> >>> >>>>> that the developer responsible for the app has changed something in the code, so
> >>> >>>>> connections are now closed properly from the client side (before that, I got a
> >>> >>>>> lot of these errors:
> >>> >>>>>
> >>> >>>>> 2016-02-08 09:33:39: pid 8472: ERROR:  unable to read data from frontend
> >>> >>>>> 2016-02-08 09:33:39: pid 8472: DETAIL:  EOF encountered with frontend
> >>> >>>>> ).
> >>> >>>>>
> >>> >>>>> Best regards,
> >>> >>>>> Paweł Ufnalewski
> >>> >>>>> Infrastructure Architect at Foap.com
> >>> >>>>>
> >>> >>>>> On 2016-02-08 at 09:00, Muhammad Usama wrote:
> >>> >>>>>
> >>> >>>>> Hi
> >>> >>>>>
> >>> >>>>> Thanks in advance for the help. If you could share the pgpool-II log
> >>> >>>>> from when the stuck connection happens, that would help us in identifying and
> >>> >>>>> rectifying the problem.
> >>> >>>>>
> >>> >>>>> Thanks
> >>> >>>>> Best regards
> >>> >>>>> Muhammad Usama
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Mon, Feb 8, 2016 at 11:36 AM, Paweł Ufnalewski <archon at foap.com> wrote:
> >>> >>>>>
> >>> >>>>> Hi,
> >>> >>>>>
> >>> >>>>>      just to let you know - I'm having the same problem with version 3.4.4
> >>> >>>>> (DISCARD ALL appears more slowly than in 3.4.3, I think, but it still does).
> >>> >>>>> How can I help to fix this problem?
> >>> >>>>>
> >>> >>>>> Best regards,
> >>> >>>>> Paweł Ufnalewski
> >>> >>>>> Infrastructure Architect at Foap.com
> >>> >>>>>
> >>> >>>>> On 2016-02-01 at 08:44, Muhammad Usama wrote:
> >>> >>>>>
> >>> >>>>> Hi Gerhard
> >>> >>>>>
> >>> >>>>> Many thanks for testing and pointing this out. It's unfortunate that you are
> >>> >>>>> still getting the stuck connection issue. If possible, can you please
> >>> >>>>> share the pgpool-II log for the time when this stuck connection issue
> >>> >>>>> happens? I am most interested in seeing exactly which error message
> >>> >>>>> caused the child process to jump to the error handler, from where the child
> >>> >>>>> process proceeded to send DISCARD ALL to the backend and eventually got
> >>> >>>>> stuck. Since we have not been able to reproduce this issue after many tries,
> >>> >>>>> the log would be really helpful in understanding and fixing the problem.
> >>> >>>>>
> >>> >>>>> Best regards
> >>> >>>>> Muhammad Usama
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Sun, Jan 31, 2016 at 9:33 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
> >>> >>>>>
> >>> >>>>> On 28.01.2016 01:10, Tatsuo Ishii wrote:
> >>> >>>>>
> >>> >>>>> On 21.01.2016 20:52, Muhammad Usama wrote:
> >>> >>>>>
> >>> >>>>> Hi
> >>> >>>>>
> >>> >>>>> I am looking into this issue, and unfortunately, like Ishii-San, I am
> >>> >>>>> also not able to reproduce it. But I found one issue in 3.4 that might
> >>> >>>>> cause the problem. Can you please try the attached patch to see if it solves
> >>> >>>>> the problem? Also, if the problem still persists, it would be really
> >>> >>>>> helpful if you could share the pgpool-II log.
> >>> >>>>>
> >>> >>>>> I looked at the patch but it includes only logging changes and no
> >>> >>>>> functional changes. Therefore I didn't test it. Do you expect any
> >>> >>>>> behavioral changes to fix it, and why?
> >>> >>>>>
> >>> >>>>> elog() is not only a logging function; it also plays a very
> >>> >>>>> important role in exception handling and error treatment in
> >>> >>>>> pgpool-II. If you are familiar with PostgreSQL internals, you may
> >>> >>>>> have noticed this (elog() was imported from the PostgreSQL source tree).
> >>> >>>>>
> >>> >>>>> Tried version 3.5.0 where the patch is included. Still not working. See
> >>> >>>>> backtrace below.
> >>> >>>>>
> >>> >>>>> Reverting to 3.3.7 which works perfectly.
> >>> >>>>>
> >>> >>>>> Ciao,
> >>> >>>>> Gerhard
> >>> >>>>>
> >>> >>>>> (gdb) back
> >>> >>>>> #0  0x00007fd87fdb6d43 in __select_nocancel () from /lib64/libc.so.6
> >>> >>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
> >>> >>>>> #2  0x0000564471af1976 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:657
> >>> >>>>> #3  0x0000564471b1f67b in pool_read (cp=0x564473dfa610, buf=buf@entry=0x7ffc1d71bf97, len=len@entry=1) at utils/pool_stream.c:162
> >>> >>>>> #4  0x0000564471af8e6e in read_kind_from_backend (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00,
> >>> >>>>>      decided_kind=decided_kind@entry=0x7ffc1d71c397 "E") at protocol/pool_process_query.c:3234
> >>> >>>>> #5  0x0000564471affdc9 in ProcessBackendResponse (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, state=state@entry=0x7ffc1d71c41c,
> >>> >>>>>      num_fields=num_fields@entry=0x7ffc1d71c41a) at protocol/pool_proto_modules.c:2356
> >>> >>>>> #6  0x0000564471af5b15 in pool_process_query (frontend=0x564473df3e60, backend=0x564473df2e00, reset_request=reset_request@entry=1) at protocol/pool_process_query.c:302
> >>> >>>>> #7  0x0000564471aed98c in backend_cleanup (backend=<optimized out>, frontend_invalid=frontend_invalid@entry=0 '\000', frontend=0x564471e09e40 <child_frontend>)
> >>> >>>>>      at protocol/child.c:437
> >>> >>>>> #8  0x0000564471af0637 in do_child (fds=fds@entry=0x564473dee030) at protocol/child.c:234
> >>> >>>>> #9  0x0000564471ace107 in fork_a_child (fds=0x564473dee030, id=8) at main/pgpool_main.c:678
> >>> >>>>> #10 0x0000564471aceb6d in reaper () at main/pgpool_main.c:2254
> >>> >>>>> #11 0x0000564471ad322b in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:429
> >>> >>>>> #12 0x0000564471acc7b1 in main (argc=<optimized out>, argv=0x7ffc1d7219e8) at main/main.c:310
> >>> >>>>>
> >>> >>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
> >>> >>>>> 635                     fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
> >>> >>>>>
> >>> >>>>> (gdb) print fd
> >>> >>>>> $1 = 8
> >>> >>>>> (gdb) print readmask
> >>> >>>>> $2 = {fds_bits = {256, 0 <repeats 15 times>}}
> >>> >>>>> (gdb) print exceptmask
> >>> >>>>> $3 = {fds_bits = {256, 0 <repeats 15 times>}}
> >>> >>>>> (gdb) print timeoutp
> >>> >>>>> $4 = (struct timeval *) 0x0
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> _______________________________________________
> >>> >>>>> pgpool-general mailing list
> >>> >>>>> pgpool-general at pgpool.net
> >>> >>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general