[pgpool-general: 4416] Re: Pgpool - connection hangs in DISCARD ALL

Paweł Ufnalewski archon at foap.com
Tue Feb 9 00:49:35 JST 2016


Hi,

     It looks like it hangs in this places (see attachment). Problem is, 
that developer responsible for app has changed something in code, so 
connections now closes properly from client side (before I got a lot of 
these errors:

2016-02-08 09:33:39: pid 8472: ERROR:  unable to read data from frontend
2016-02-08 09:33:39: pid 8472: DETAIL:  EOF encountered with frontend)
.

Best regards,
Paweł Ufnalewski
Infrastructure Architect at Foap.com

W dniu 2016-02-08 o 09:00, Muhammad Usama pisze:
> Hi
>
> Thanks in advance for the help. If you could share the pgpool-II log
> when the stuck connection happens that would help us in identifiny and
> rectifing the problem.
>
> Thanks
> Best regards
> Muhammad Usama
>
>
> On Mon, Feb 8, 2016 at 11:36 AM, Paweł Ufnalewski <archon at foap.com> wrote:
>> Hi,
>>
>>      just to let you know - I'm having the same problem with 3.4.4 version (DISCARD ALL appears slower than in 3.4.3 I think, but it still does). How can I help to fix this problem?
>>
>> Best regards,
>> Paweł Ufnalewski
>> Infrastructure Architect at Foap.com
>>
>> W dniu 2016-02-01 o 08:44, Muhammad Usama pisze:
>>
>> Hi Gerhard
>>
>> Many thanks for testing and pointing this out. It's unfortunate that you are still getting the stuck connection issue. If it is possible can you please share the pgpool-II log for the time when this stuck connection issue happens. I am more interested in seeing which exact error message that caused the child process to jump to error handler from where the child process proceeded to send the DISCARD ALL to backend and eventually got stuck. Since after many tries we are not able to reproduce this issue, so log would be really helpful in understanding and fixing the problem.
>>
>> Best regards
>> Muhammad Usama
>>
>>
>> On Sun, Jan 31, 2016 at 9:33 PM, Gerhard Wiesinger <lists at wiesinger.com> wrote:
>>> On 28.01.2016 01:10, Tatsuo Ishii wrote:
>>>>> On 21.01.2016 20:52, Muhammad Usama wrote:
>>>>>> Hi
>>>>>>
>>>>>> I am looking into this issue. and unfortunately like Ishii-San I am
>>>>>> also not able to reproduce it. But I found one issue in 3.4 that might
>>>>>> cause the problem. Can you please try the attached patch if it solves
>>>>>> the problem. Also, if the problem still persists, it would be really
>>>>>> helpful if you could share the pgpool-II log.
>>>>>>
>>>>> I looked at the patch but it includes only logging changes and no
>>>>> functional changes. Therefore I didn't test it. Do you expect and
>>>>> behavioral changes to fix it, and why?
>>>> elog() is not only a logging function, but also it plays very
>>>> important role including exception handling and error treatments in
>>>> pgpool-II. If you are familiar with PostgreSQL internals, you may
>>>> notice it (elog() was imported from PostgreSQL source tree).
>>>
>>> Tried version 3.5.0 where the patch is included. Still not working. See backtrace below.
>>>
>>> Reverting to 3.3.7 which works perfectly.
>>>
>>> Ciao,
>>> Gerhard
>>>
>>> (gdb) back
>>> #0  0x00007fd87fdb6d43 in __select_nocancel () from /lib64/libc.so.6
>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp at entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>> #2  0x0000564471af1976 in pool_check_fd (cp=cp at entry=0x564473dfa610) at protocol/pool_process_query.c:657
>>> #3  0x0000564471b1f67b in pool_read (cp=0x564473dfa610, buf=buf at entry=0x7ffc1d71bf97, len=len at entry=1) at utils/pool_stream.c:162
>>> #4  0x0000564471af8e6e in read_kind_from_backend (frontend=frontend at entry=0x564473df3e60, backend=backend at entry=0x564473df2e00,
>>>      decided_kind=decided_kind at entry=0x7ffc1d71c397 "E") at protocol/pool_process_query.c:3234
>>> #5  0x0000564471affdc9 in ProcessBackendResponse (frontend=frontend at entry=0x564473df3e60, backend=backend at entry=0x564473df2e00, state=state at entry=0x7ffc1d71c41c,
>>>      num_fields=num_fields at entry=0x7ffc1d71c41a) at protocol/pool_proto_modules.c:2356
>>> #6  0x0000564471af5b15 in pool_process_query (frontend=0x564473df3e60, backend=0x564473df2e00, reset_request=reset_request at entry=1) at protocol/pool_process_query.c:302
>>> #7  0x0000564471aed98c in backend_cleanup (backend=<optimized out>, frontend_invalid=frontend_invalid at entry=0 '\000', frontend=0x564471e09e40 <child_frontend>)
>>>      at protocol/child.c:437
>>> #8  0x0000564471af0637 in do_child (fds=fds at entry=0x564473dee030) at protocol/child.c:234
>>> #9  0x0000564471ace107 in fork_a_child (fds=0x564473dee030, id=8) at main/pgpool_main.c:678
>>> #10 0x0000564471aceb6d in reaper () at main/pgpool_main.c:2254
>>> #11 0x0000564471ad322b in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:429
>>> #12 0x0000564471acc7b1 in main (argc=<optimized out>, argv=0x7ffc1d7219e8) at main/main.c:310
>>>
>>> #1  0x0000564471af16a1 in pool_check_fd (cp=cp at entry=0x564473dfa610) at protocol/pool_process_query.c:635
>>> 635                     fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
>>>
>>> (gdb) print fd
>>> $1 = 8
>>> (gdb) print readmask
>>> $2 = {fds_bits = {256, 0 <repeats 15 times>}}
>>> (gdb) print exceptmask
>>> $3 = {fds_bits = {256, 0 <repeats 15 times>}}
>>> (gdb) print timeoutp
>>> $4 = (struct timeval *) 0x0
>>>
>>>
>>
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20160208/090856d5/attachment.html>
-------------- next part --------------
2016-02-08 13:43:51: pid 11292: LOG:  fork a new child process with pid: 3535
2016-02-08 13:44:49: pid 11292: LOG:  child process with pid: 30366 exits with status 512
2016-02-08 13:44:49: pid 11292: LOG:  fork a new child process with pid: 3865
2016-02-08 13:44:59: pid 11292: LOG:  child process with pid: 12060 exits with status 512
2016-02-08 13:44:59: pid 11292: LOG:  fork a new child process with pid: 3896
2016-02-08 13:45:46: pid 11292: LOG:  child process with pid: 3468 exits with status 512
2016-02-08 13:45:46: pid 11292: LOG:  fork a new child process with pid: 4067
2016-02-08 13:46:16: pid 19421: LOG:  forwarding cancel request to backend
2016-02-08 13:46:16: pid 19421: DETAIL:  canceling backend pid:15418 key: 1266497356
2016-02-08 13:46:17: pid 19421: ERROR:  unable to to flush data to frontend
2016-02-08 13:46:17: pid 19421: DETAIL:  frontend error occured while waiting for backend reply
2016-02-08 13:46:33: pid 11292: LOG:  child process with pid: 13601 exits with status 512
2016-02-08 13:46:33: pid 11292: LOG:  fork a new child process with pid: 4262
2016-02-08 13:47:03: pid 11292: LOG:  child process with pid: 30819 exits with status 512
2016-02-08 13:47:03: pid 11292: LOG:  fork a new child process with pid: 4368
2016-02-08 13:48:04: pid 9526: LOG:  forwarding cancel request to backend
2016-02-08 13:48:04: pid 9526: DETAIL:  canceling backend pid:16000 key: 1780480853
2016-02-08 13:48:05: pid 9526: ERROR:  unable to to flush data to frontend
2016-02-08 13:48:05: pid 9526: DETAIL:  frontend error occured while waiting for backend reply
2016-02-08 13:48:05: pid 9526: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 2 backend pid: 16000 statement: " DISCARD ALL" message: "canceling statement due to user request"
2016-02-08 13:48:05: pid 9526: WARNING:  packet kind of backend 2 ['E'] does not match with master/majority nodes packet kind ['C']
2016-02-08 13:48:05: pid 9526: FATAL:  failed to read kind from backend
2016-02-08 13:48:05: pid 9526: DETAIL:  kind mismatch among backends. Possible last query was: " DISCARD ALL" kind details are: 0[C] 2[E: canceling statement due to user request]
2016-02-08 13:48:05: pid 9526: HINT:  check data consistency among db nodes
2016-02-08 13:48:05: pid 11292: LOG:  child process with pid: 9526 exits with status 512
2016-02-08 13:48:05: pid 11292: LOG:  fork a new child process with pid: 4607
2016-02-08 13:48:59: pid 2802: LOG:  forwarding cancel request to backend
2016-02-08 13:48:59: pid 2802: DETAIL:  canceling backend pid:25637 key: 889830127
2016-02-08 13:49:00: pid 2802: ERROR:  unable to to flush data to frontend
2016-02-08 13:49:00: pid 2802: DETAIL:  frontend error occured while waiting for backend reply
2016-02-08 13:49:00: pid 2802: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 25637 statement: " DISCARD ALL" message: "canceling statement due to user request"
2016-02-08 13:49:00: pid 2802: WARNING:  packet kind of backend 1 ['E'] does not match with master/majority nodes packet kind ['C']
2016-02-08 13:49:00: pid 2802: FATAL:  failed to read kind from backend
2016-02-08 13:49:00: pid 2802: DETAIL:  kind mismatch among backends. Possible last query was: " DISCARD ALL" kind details are: 0[C] 1[E: canceling statement due to user request]
2016-02-08 13:49:00: pid 2802: HINT:  check data consistency among db nodes
2016-02-08 13:49:00: pid 11292: LOG:  child process with pid: 2802 exits with status 512
2016-02-08 13:49:00: pid 11292: LOG:  fork a new child process with pid: 4818
2016-02-08 13:49:13: pid 10886: LOG:  forwarding cancel request to backend
2016-02-08 13:49:13: pid 10886: DETAIL:  canceling backend pid:15204 key: 49693571
2016-02-08 13:49:14: pid 10886: ERROR:  unable to to flush data to frontend
2016-02-08 13:49:14: pid 10886: DETAIL:  frontend error occured while waiting for backend reply
2016-02-08 13:49:14: pid 10886: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 15204 statement: " DISCARD ALL" message: "canceling statement due to user request"
2016-02-08 13:49:14: pid 10886: WARNING:  packet kind of backend 1 ['E'] does not match with master/majority nodes packet kind ['C']
2016-02-08 13:49:14: pid 10886: FATAL:  failed to read kind from backend
2016-02-08 13:49:14: pid 10886: DETAIL:  kind mismatch among backends. Possible last query was: " DISCARD ALL" kind details are: 0[C] 1[E: canceling statement due to user request]
2016-02-08 13:49:14: pid 10886: HINT:  check data consistency among db nodes
2016-02-08 13:49:14: pid 11292: LOG:  child process with pid: 10886 exits with status 512
2016-02-08 13:49:14: pid 11292: LOG:  fork a new child process with pid: 4870
2016-02-08 13:50:42: pid 11292: LOG:  child process with pid: 2856 exits with status 512
2016-02-08 13:50:42: pid 11292: LOG:  fork a new child process with pid: 5292
2016-02-08 13:50:43: pid 11292: LOG:  child process with pid: 4607 exits with status 512
2016-02-08 13:50:43: pid 11292: LOG:  fork a new child process with pid: 5293
2016-02-08 13:50:58: pid 19421: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 2 backend pid: 15418 statement: " DISCARD ALL" message: "canceling statement due to user request"
2016-02-08 13:50:58: pid 19421: WARNING:  packet kind of backend 2 ['E'] does not match with master/majority nodes packet kind ['C']
2016-02-08 13:50:58: pid 19421: FATAL:  failed to read kind from backend
2016-02-08 13:50:58: pid 19421: DETAIL:  kind mismatch among backends. Possible last query was: " DISCARD ALL" kind details are: 0[C] 2[E: canceling statement due to user request]
2016-02-08 13:50:58: pid 19421: HINT:  check data consistency among db nodes
2016-02-08 13:50:58: pid 11292: LOG:  child process with pid: 19421 exits with status 512
2016-02-08 13:50:58: pid 11292: LOG:  fork a new child process with pid: 5354
2016-02-08 16:07:55: pid 1488: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 2 backend pid: 3516 statement: "<NORMAL SELECT STATEMENT DELETED FOR PRIVACY>" message: "canceling statement due to conflict with recovery"
2016-02-08 16:10:43: pid 11292: LOG:  child process with pid: 4469 exits with status 512
2016-02-08 16:10:43: pid 11292: LOG:  fork a new child process with pid: 5623
2016-02-08 16:10:44: pid 11292: LOG:  child process with pid: 4579 exits with status 512
2016-02-08 16:10:44: pid 11292: LOG:  fork a new child process with pid: 5624
2016-02-08 16:15:12: pid 1484: LOG:  forwarding cancel request to backend
2016-02-08 16:15:12: pid 1484: DETAIL:  canceling backend pid:10472 key: 1673287368
2016-02-08 16:15:13: pid 1484: ERROR:  unable to to flush data to frontend
2016-02-08 16:15:13: pid 1484: DETAIL:  frontend error occured while waiting for backend reply
2016-02-08 16:15:13: pid 1484: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 0 backend pid: 10472 statement: " DISCARD ALL" message: "canceling statement due to user request"
2016-02-08 16:15:13: pid 1484: LOG:  SimpleQuery: received query cancel error message from master node. query:  DISCARD ALL



More information about the pgpool-general mailing list