[pgpool-general: 4476] Re: Pgpool - connection hangs in DISCARD ALL

Muhammad Usama m.usama at gmail.com
Sat Feb 20 05:03:30 JST 2016


Hi

Many thanks for the confirmation. The fix needs to go into all branches, and I will push it in the morning.

Regards
Muhammad Usama 


> On 19-Feb-2016, at 11:52 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
> 
> Hello,
> 
> I can confirm that this patch worked for me (tested the 3.5 patch version): nearly 2 days without any problem. Could you please add it to the git repo and make a new release (3.4, 3.5)?
> 
> Thnx.
> 
> Ciao,
> Gerhard
> 
> 
>> On 16.02.2016 11:44, Muhammad Usama wrote:
>> Hi
>> 
>> Many thanks for the reply. It is good news that you are no longer
>> getting the stuck connection issue after the patch.
>> 
>> Thanks
>> Best regards
>> Muhammad Usama
>> 
>> 
>>> On Fri, Feb 12, 2016 at 9:45 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>> Hmm it looks like it's fine now. Right now I only see these in log:
>>> 
>>> 2016-02-12 17:28:12: pid 8838: LOG:  child process with pid: 27299 exits with status 256
>>> 2016-02-12 17:28:12: pid 8838: LOG:  fork a new child process with pid: 6140
>>> 2016-02-12 17:30:42: pid 8838: LOG:  child process with pid: 5571 exits with status 512
>>> 2016-02-12 17:30:42: pid 8838: LOG:  fork a new child process with pid: 6720
>>> 2016-02-12 17:30:43: pid 8838: LOG:  child process with pid: 4444 exits with status 512
>>> 2016-02-12 17:30:43: pid 8838: LOG:  fork a new child process with pid: 6751
>>> 2016-02-12 17:35:42: pid 8838: LOG:  child process with pid: 6140 exits with status 512
>>> 2016-02-12 17:35:42: pid 8838: LOG:  fork a new child process with pid: 7868
>>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6751 exits with status 512
>>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9018
>>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6720 exits with status 512
>>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9019
>>> 
>>> Thank you!
>>> 
>>> Best regards,
>>> Paweł Ufnalewski
>>> Infrastructure Architect at Foap.com
>>> 
>>> On 2016-02-09 at 14:02, Muhammad Usama wrote:
>>> 
>>>> Hi
>>>> 
>>>> Many thanks for sharing the pgpool.log. The log you shared does
>>>> contain some error messages, "ERROR: unable to flush data to
>>>> frontend", that have the potential to cause the stuck connection.
>>>> Can you please try out the attached patch and see if it fixes the
>>>> problem? I am attaching patches for both the 3_5 and 3_4 branches;
>>>> please use the respective patch for your setup. Hopefully this
>>>> should fix the stuck issue.
>>>> 
>>>> Kind regards
>>>> Muhammad Usama
>>>> 
>>>> 
>>>> 
>>>>> On Mon, Feb 8, 2016 at 8:49 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>>>> Hi,
>>>>> 
>>>>>      It looks like it hangs in these places (see attachment). The problem
>>>>> is that the developer responsible for the app has changed something in the
>>>>> code, so connections now close properly from the client side (before that I
>>>>> got a lot of these errors:
>>>>> 
>>>>> 2016-02-08 09:33:39: pid 8472: ERROR:  unable to read data from frontend
>>>>> 2016-02-08 09:33:39: pid 8472: DETAIL:  EOF encountered with frontend
>>>>> ).
>>>>> 
>>>>> Best regards,
>>>>> Paweł Ufnalewski
>>>>> Infrastructure Architect at Foap.com
>>>>> 
>>>>> On 2016-02-08 at 09:00, Muhammad Usama wrote:
>>>>> 
>>>>> Hi
>>>>> 
>>>>> Thanks in advance for the help. If you could share the pgpool-II log
>>>>> from when the stuck connection happens, that would help us in identifying
>>>>> and rectifying the problem.
>>>>> 
>>>>> Thanks
>>>>> Best regards
>>>>> Muhammad Usama
>>>>> 
>>>>> 
>>>>> On Mon, Feb 8, 2016 at 11:36 AM, Paweł Ufnalewski <archon at foap.com>
>>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>      just to let you know - I'm having the same problem with version 3.4.4
>>>>> (DISCARD ALL appears more slowly than in 3.4.3, I think, but it still does).
>>>>> How can I help to fix this problem?
>>>>> 
>>>>> Best regards,
>>>>> Paweł Ufnalewski
>>>>> Infrastructure Architect at Foap.com
>>>>> 
>>>>> On 2016-02-01 at 08:44, Muhammad Usama wrote:
>>>>> 
>>>>> Hi Gerhard
>>>>> 
>>>>> Many thanks for testing and pointing this out. It's unfortunate that you
>>>>> are still getting the stuck connection issue. If possible, can you please
>>>>> share the pgpool-II log for the time when this stuck connection issue
>>>>> happens? I am most interested in seeing exactly which error message
>>>>> caused the child process to jump to the error handler, from where the child
>>>>> process proceeded to send DISCARD ALL to the backend and eventually got
>>>>> stuck. Since after many tries we are not able to reproduce this issue, the
>>>>> log would be really helpful in understanding and fixing the problem.
>>>>> 
>>>>> Best regards
>>>>> Muhammad Usama
>>>>> 
>>>>> 
>>>>> On Sun, Jan 31, 2016 at 9:33 PM, Gerhard Wiesinger <lists at wiesinger.com>
>>>>> wrote:
>>>>> 
>>>>> On 28.01.2016 01:10, Tatsuo Ishii wrote:
>>>>> 
>>>>> On 21.01.2016 20:52, Muhammad Usama wrote:
>>>>> 
>>>>> Hi
>>>>> 
>>>>> I am looking into this issue, and unfortunately, like Ishii-San, I am
>>>>> also not able to reproduce it. But I found one issue in 3.4 that might
>>>>> cause the problem. Can you please try the attached patch and see if it
>>>>> solves the problem? Also, if the problem still persists, it would be really
>>>>> helpful if you could share the pgpool-II log.
>>>>> 
>>>>> I looked at the patch but it includes only logging changes and no
>>>>> functional changes. Therefore I didn't test it. Do you expect any
>>>>> behavioral changes to fix it, and why?
>>>>> 
>>>>> elog() is not only a logging function; it also plays a very
>>>>> important role, including exception handling and error treatment, in
>>>>> pgpool-II. If you are familiar with PostgreSQL internals, you may
>>>>> have noticed this (elog() was imported from the PostgreSQL source tree).
>>>>> 
>>>>> Tried version 3.5.0 where the patch is included. Still not working. See
>>>>> backtrace below.
>>>>> 
>>>>> Reverting to 3.3.7 which works perfectly.
>>>>> 
>>>>> Ciao,
>>>>> Gerhard
>>>>> 
>>>>> (gdb) back
>>>>> #0  0x00007fd87fdb6d43 in __select_nocancel () from /lib64/libc.so.6
>>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>>>> #2  0x0000564471af1976 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:657
>>>>> #3  0x0000564471b1f67b in pool_read (cp=0x564473dfa610, buf=buf@entry=0x7ffc1d71bf97, len=len@entry=1) at utils/pool_stream.c:162
>>>>> #4  0x0000564471af8e6e in read_kind_from_backend (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, decided_kind=decided_kind@entry=0x7ffc1d71c397 "E") at protocol/pool_process_query.c:3234
>>>>> #5  0x0000564471affdc9 in ProcessBackendResponse (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, state=state@entry=0x7ffc1d71c41c, num_fields=num_fields@entry=0x7ffc1d71c41a) at protocol/pool_proto_modules.c:2356
>>>>> #6  0x0000564471af5b15 in pool_process_query (frontend=0x564473df3e60, backend=0x564473df2e00, reset_request=reset_request@entry=1) at protocol/pool_process_query.c:302
>>>>> #7  0x0000564471aed98c in backend_cleanup (backend=<optimized out>, frontend_invalid=frontend_invalid@entry=0 '\000', frontend=0x564471e09e40 <child_frontend>) at protocol/child.c:437
>>>>> #8  0x0000564471af0637 in do_child (fds=fds@entry=0x564473dee030) at protocol/child.c:234
>>>>> #9  0x0000564471ace107 in fork_a_child (fds=0x564473dee030, id=8) at main/pgpool_main.c:678
>>>>> #10 0x0000564471aceb6d in reaper () at main/pgpool_main.c:2254
>>>>> #11 0x0000564471ad322b in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:429
>>>>> #12 0x0000564471acc7b1 in main (argc=<optimized out>, argv=0x7ffc1d7219e8) at main/main.c:310
>>>>> 
>>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>>>> 635                     fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
>>>>> 
>>>>> (gdb) print fd
>>>>> $1 = 8
>>>>> (gdb) print readmask
>>>>> $2 = {fds_bits = {256, 0 <repeats 15 times>}}
>>>>> (gdb) print exceptmask
>>>>> $3 = {fds_bits = {256, 0 <repeats 15 times>}}
>>>>> (gdb) print timeoutp
>>>>> $4 = (struct timeval *) 0x0
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> pgpool-general mailing list
>>>>> pgpool-general at pgpool.net
>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
> 

