[pgpool-general: 4541] Re: Pgpool - connection hangs in DISCARD ALL

Tatsuo Ishii ishii at postgresql.org
Sat Mar 12 18:11:45 JST 2016


Yes, we plan to make minor releases for 3.1 to 3.5 by the end of this
month.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Can you make new releases, at least for the 3.4 and 3.5 branches, to get
> them upstream?
> 
> Thanks.
> 
> Ciao,
> Gerhard
> 
> -- https://www.wiesinger.com/
> 
> 
> On 23.02.2016 13:27, Muhammad Usama wrote:
>> On Tue, Feb 23, 2016 at 4:16 AM, Tatsuo Ishii <ishii at postgresql.org>
>> wrote:
>>> Usama,
>>>
>>> Doesn't pgpool-II 3.1 have the same problem?
>> Sorry, I missed that, and 3_0 as well. I have pushed the same change to
>> both branches.
>>
>> Regards
>> Muhammad Usama
>>
>>
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese:http://www.sraoss.co.jp
>>>
>>>> Hi
>>>>
>>>> I have pushed the above fix in all branches from pgpool-II 3.2
>>>> onwards.
>>>> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=6afacb1b19603b37e3d005963182258b9f4fca49
>>>>
>>>> Thanks again for your help in verifying and testing the fix.
>>>>
>>>> Kind regards
>>>> Muhammad Usama
>>>>
>>>>
>>>> On Sat, Feb 20, 2016 at 1:03 AM, Muhammad Usama <m.usama at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Many thanks for the confirmation. The fix needs to go in all branches,
>>>>> and I will push it in the morning.
>>>>>
>>>>> Regards
>>>>> Muhammad Usama
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 19-Feb-2016, at 11:52 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I can confirm that this patch worked for me (I tested the 3.5 patch
>>>>>> version): nearly 2 days without any problem. Can you please add it to
>>>>>> the git repo and make a new release (3.4, 3.5)?
>>>>>> Thanks.
>>>>>>
>>>>>> Ciao,
>>>>>> Gerhard
>>>>>>
>>>>>>
>>>>>>> On 16.02.2016 11:44, Muhammad Usama wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Many thanks for the reply, and it is good news that you are no longer
>>>>>>> getting the stuck connection issue after the patch.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Best regards
>>>>>>> Muhammad Usama
>>>>>>>
>>>>>>>
>>>>>>>> On Fri, Feb 12, 2016 at 9:45 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>>>>>>> Hmm, it looks like it's fine now. Right now I only see these in the log:
>>>>>>>>
>>>>>>>> 2016-02-12 17:28:12: pid 8838: LOG:  child process with pid: 27299 exits with status 256
>>>>>>>> 2016-02-12 17:28:12: pid 8838: LOG:  fork a new child process with pid: 6140
>>>>>>>> 2016-02-12 17:30:42: pid 8838: LOG:  child process with pid: 5571 exits with status 512
>>>>>>>> 2016-02-12 17:30:42: pid 8838: LOG:  fork a new child process with pid: 6720
>>>>>>>> 2016-02-12 17:30:43: pid 8838: LOG:  child process with pid: 4444 exits with status 512
>>>>>>>> 2016-02-12 17:30:43: pid 8838: LOG:  fork a new child process with pid: 6751
>>>>>>>> 2016-02-12 17:35:42: pid 8838: LOG:  child process with pid: 6140 exits with status 512
>>>>>>>> 2016-02-12 17:35:42: pid 8838: LOG:  fork a new child process with pid: 7868
>>>>>>>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6751 exits with status 512
>>>>>>>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9018
>>>>>>>> 2016-02-12 17:40:42: pid 8838: LOG:  child process with pid: 6720 exits with status 512
>>>>>>>> 2016-02-12 17:40:42: pid 8838: LOG:  fork a new child process with pid: 9019
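
[Editor's note] The status values 256 and 512 in the log above look like raw wait()-style statuses rather than plain exit codes. Assuming that is what pgpool-II prints here (the thread does not confirm it), they can be decoded with the standard macros. The following Python sketch is only an illustration; `describe_wait_status` is a hypothetical helper and not part of pgpool-II.

```python
import os


def describe_wait_status(status: int) -> str:
    """Decode a raw wait()-style status into a readable description.

    Assumption: the number in the log is the raw status from waitpid(),
    not an already-decoded exit code.
    """
    if os.WIFEXITED(status):
        # Normal exit: the exit code lives in the high byte of the status.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Abnormal exit: the low bits carry the terminating signal number.
        return f"killed by signal {os.WTERMSIG(status)}"
    return "neither exited nor signaled"


for raw in (256, 512):
    print(raw, "->", describe_wait_status(raw))
```

On Linux this decodes 256 as exit code 1 and 512 as exit code 2. Under the assumption above, both describe children that called exit() themselves rather than children killed by a signal.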
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Paweł Ufnalewski
>>>>>>>> Infrastructure Architect at Foap.com
>>>>>>>>
>>>>>>>> On 2016-02-09 at 14:02, Muhammad Usama wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> Many thanks for sharing the pgpool.log. The log you shared does
>>>>>>>>> contain some "ERROR: unable to flush data to frontend" messages,
>>>>>>>>> which have the potential to cause the stuck connection.
>>>>>>>>> Can you please try the attached patch to see if it fixes the problem?
>>>>>>>>> I am attaching patches for both the 3_5 and 3_4 branches; please use
>>>>>>>>> the one that matches your setup. Hopefully this will fix the stuck
>>>>>>>>> connection issue.
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>> Muhammad Usama
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Mon, Feb 8, 2016 at 8:49 PM, Paweł Ufnalewski <archon at foap.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> It looks like it hangs in these places (see attachment). The problem
>>>>>>>>>> is that the developer responsible for the app has changed something in
>>>>>>>>>> the code, so connections now close properly from the client side.
>>>>>>>>>> Before that, I got a lot of these errors:
>>>>>>>>>>
>>>>>>>>>> 2016-02-08 09:33:39: pid 8472: ERROR:  unable to read data from frontend
>>>>>>>>>> 2016-02-08 09:33:39: pid 8472: DETAIL:  EOF encountered with frontend
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Paweł Ufnalewski
>>>>>>>>>> Infrastructure Architect at Foap.com
>>>>>>>>>>
>>>>>>>>>> On 2016-02-08 at 09:00, Muhammad Usama wrote:
>>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> Thanks in advance for the help. If you could share the pgpool-II log
>>>>>>>>>> from when the stuck connection happens, that would help us in
>>>>>>>>>> identifying and rectifying the problem.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Best regards
>>>>>>>>>> Muhammad Usama
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 8, 2016 at 11:36 AM, Paweł Ufnalewski <archon at foap.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Just to let you know: I'm having the same problem with version 3.4.4
>>>>>>>>>> (the stuck DISCARD ALL seems to take longer to appear than in 3.4.3,
>>>>>>>>>> I think, but it still does). How can I help to fix this problem?
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Paweł Ufnalewski
>>>>>>>>>> Infrastructure Architect at Foap.com
>>>>>>>>>>
>>>>>>>>>> On 2016-02-01 at 08:44, Muhammad Usama wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Gerhard
>>>>>>>>>>
>>>>>>>>>> Many thanks for testing and pointing this out. It's unfortunate that
>>>>>>>>>> you are still getting the stuck connection issue. If possible, can you
>>>>>>>>>> please share the pgpool-II log for the time when this stuck connection
>>>>>>>>>> issue happens? I am most interested in seeing exactly which error
>>>>>>>>>> message caused the child process to jump to the error handler, from
>>>>>>>>>> where the child process proceeded to send DISCARD ALL to the backend
>>>>>>>>>> and eventually got stuck. Since we have not been able to reproduce this
>>>>>>>>>> issue after many tries, the log would be really helpful in
>>>>>>>>>> understanding and fixing the problem.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Muhammad Usama
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 31, 2016 at 9:33 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 28.01.2016 01:10, Tatsuo Ishii wrote:
>>>>>>>>>>
>>>>>>>>>> On 21.01.2016 20:52, Muhammad Usama wrote:
>>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> I am looking into this issue, and unfortunately, like Ishii-san, I am
>>>>>>>>>> also not able to reproduce it. But I found one issue in 3.4 that might
>>>>>>>>>> cause the problem. Can you please try the attached patch to see if it
>>>>>>>>>> solves the problem? Also, if the problem still persists, it would be
>>>>>>>>>> really helpful if you could share the pgpool-II log.
>>>>>>>>>>
>>>>>>>>>> I looked at the patch, but it includes only logging changes and no
>>>>>>>>>> functional changes. Therefore I didn't test it. Do you expect any
>>>>>>>>>> behavioral changes to fix it, and why?
>>>>>>>>>>
>>>>>>>>>> elog() is not only a logging function; it also plays a very
>>>>>>>>>> important role in exception handling and error treatment in
>>>>>>>>>> pgpool-II. If you are familiar with PostgreSQL internals, you may
>>>>>>>>>> have noticed this (elog() was imported from the PostgreSQL source tree).
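
[Editor's note] To illustrate why a change that appears to touch only logging can still alter behavior: in PostgreSQL, reporting at ERROR level through elog() abandons the current code path and transfers control to an error handler. The Python sketch below is only an analogy that uses exceptions in place of the non-local jump done in C. All names in it (`ElogError`, `plain_log`, `elog_error`, `flush_to_frontend`, `run_session`) are hypothetical and do not come from pgpool-II.

```python
import logging

logger = logging.getLogger("elog_sketch")


class ElogError(Exception):
    """Stands in for the non-local jump an ERROR-level elog() performs."""


def plain_log(message: str) -> None:
    # A pure logging call: execution continues after it returns.
    logger.warning(message)


def elog_error(message: str) -> None:
    # Logs the message, then unwinds to the nearest error handler.
    logger.error(message)
    raise ElogError(message)


def flush_to_frontend(report) -> str:
    # Pretend the write to the client failed and report it.
    report("unable to flush data to frontend")
    return "kept going after the failed flush"


def run_session(report) -> str:
    try:
        return flush_to_frontend(report)
    except ElogError:
        # Cleanup for the session would start from here.
        return "jumped to the error handler"


print(run_session(plain_log))   # kept going after the failed flush
print(run_session(elog_error))  # jumped to the error handler
```

The only difference between the two runs is which reporting function is called, yet the control flow afterwards differs. A patch of this kind can therefore be functional even when it looks like a logging change.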
>>>>>>>>>>
>>>>>>>>>> Tried version 3.5.0 where the patch is included. Still not working.
>>>>>>>>>> See backtrace below.
>>>>>>>>>>
>>>>>>>>>> Reverting to 3.3.7 which works perfectly.
>>>>>>>>>>
>>>>>>>>>> Ciao,
>>>>>>>>>> Gerhard
>>>>>>>>>>
>>>>>>>>>> (gdb) back
>>>>>>>>>> #0  0x00007fd87fdb6d43 in __select_nocancel () from /lib64/libc.so.6
>>>>>>>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>>>>>>>>> #2  0x0000564471af1976 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:657
>>>>>>>>>> #3  0x0000564471b1f67b in pool_read (cp=0x564473dfa610, buf=buf@entry=0x7ffc1d71bf97, len=len@entry=1) at utils/pool_stream.c:162
>>>>>>>>>> #4  0x0000564471af8e6e in read_kind_from_backend (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00,
>>>>>>>>>>       decided_kind=decided_kind@entry=0x7ffc1d71c397 "E") at protocol/pool_process_query.c:3234
>>>>>>>>>> #5  0x0000564471affdc9 in ProcessBackendResponse (frontend=frontend@entry=0x564473df3e60, backend=backend@entry=0x564473df2e00, state=state@entry=0x7ffc1d71c41c,
>>>>>>>>>>       num_fields=num_fields@entry=0x7ffc1d71c41a) at protocol/pool_proto_modules.c:2356
>>>>>>>>>> #6  0x0000564471af5b15 in pool_process_query (frontend=0x564473df3e60, backend=0x564473df2e00, reset_request=reset_request@entry=1) at protocol/pool_process_query.c:302
>>>>>>>>>> #7  0x0000564471aed98c in backend_cleanup (backend=<optimized out>, frontend_invalid=frontend_invalid@entry=0 '\000', frontend=0x564471e09e40 <child_frontend>)
>>>>>>>>>>       at protocol/child.c:437
>>>>>>>>>> #8  0x0000564471af0637 in do_child (fds=fds@entry=0x564473dee030) at protocol/child.c:234
>>>>>>>>>> #9  0x0000564471ace107 in fork_a_child (fds=0x564473dee030, id=8) at main/pgpool_main.c:678
>>>>>>>>>> #10 0x0000564471aceb6d in reaper () at main/pgpool_main.c:2254
>>>>>>>>>> #11 0x0000564471ad322b in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:429
>>>>>>>>>> #12 0x0000564471acc7b1 in main (argc=<optimized out>, argv=0x7ffc1d7219e8) at main/main.c:310
>>>>>>>>>>
>>>>>>>>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp@entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>>>>>>>>> 635                     fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
>>>>>>>>>>
>>>>>>>>>> (gdb) print fd
>>>>>>>>>> $1 = 8
>>>>>>>>>> (gdb) print readmask
>>>>>>>>>> $2 = {fds_bits = {256, 0 <repeats 15 times>}}
>>>>>>>>>> (gdb) print exceptmask
>>>>>>>>>> $3 = {fds_bits = {256, 0 <repeats 15 times>}}
>>>>>>>>>> (gdb) print timeoutp
>>>>>>>>>> $4 = (struct timeval *) 0x0
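
[Editor's note] Reading the gdb values above: `readmask` and `exceptmask` both hold 256, which is bit 8 and so corresponds to `fd = 8`, and `timeoutp` is a null pointer, which makes select() wait with no time limit. If the peer on fd 8 never sends anything, the call never returns, which matches the hang seen in frame #0. The Python sketch below reproduces the same situation on a local socket pair. It is an illustration and not pgpool-II code; `fd_mask` and `wait_readable` are hypothetical helpers, and a finite timeout is used so that the example terminates.

```python
import select
import socket


def fd_mask(fd: int) -> int:
    """Bit that represents descriptor `fd` in the first word of an fd_set."""
    return 1 << fd


def wait_readable(sock: socket.socket, timeout) -> bool:
    """Wait until `sock` is readable or in an exceptional state.

    timeout=None mirrors a NULL timeval pointer in C: block forever.
    """
    readable, _, errored = select.select([sock], [], [sock], timeout)
    return bool(readable or errored)


backend, peer = socket.socketpair()

print(fd_mask(8))                   # 256, the value gdb shows in fds_bits[0]
print(wait_readable(backend, 0.2))  # False: the peer is silent, select() timed out
peer.sendall(b"E")
print(wait_readable(backend, 0.2))  # True: one byte is now waiting to be read

backend.close()
peer.close()
```

With `timeout=None` in place of `0.2`, the first `wait_readable` call would block until the peer sent data or closed the connection.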
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> pgpool-general mailing list
>>>>>>>>>> pgpool-general at pgpool.net
>>>>>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
> 

