[pgpool-hackers: 3173] Re: Query hang issue with extended protocol

Muhammad Usama m.usama at gmail.com
Fri Dec 7 20:13:29 JST 2018


Hi Ishii-San

Please find the attached the debug log and pgpool.conf

The output of the stack trace, session context and query context is
pasted below

*--STACK-TRACE--*
#0  0x00007f3d5d218c53 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1  0x000000000042e086 in pool_check_fd (cp=cp at entry=0x7f3d5efb43a0) at
protocol/pool_process_query.c:696
#2  0x000000000042e356 in pool_check_fd (cp=cp at entry=0x7f3d5efb43a0) at
protocol/pool_process_query.c:728
#3  0x000000000045fafc in pool_read (cp=0x7f3d5efb43a0,
buf=buf at entry=0x7fffbc4830b5,
len=len at entry=1) at utils/pool_stream.c:166
#4  0x000000000043303b in read_kind_from_backend
(frontend=frontend at entry=0x2314e98,
backend=backend at entry=0x7f3d5efaf750,
decided_kind=decided_kind at entry=0x7fffbc4834ba
"D")
    at protocol/pool_process_query.c:3379
#5  0x000000000043f402 in ProcessBackendResponse
(frontend=frontend at entry=0x2314e98,
backend=backend at entry=0x7f3d5efaf750, state=state at entry=0x7fffbc483524,
    num_fields=num_fields at entry=0x7fffbc483522) at
protocol/pool_proto_modules.c:2663
#6  0x0000000000431ea7 in pool_process_query (frontend=0x2314e98,
backend=0x7f3d5efaf750, reset_request=reset_request at entry=0) at
protocol/pool_process_query.c:320
#7  0x000000000042cd49 in do_child (fds=fds at entry=0x23134b0) at
protocol/child.c:388
#8  0x00000000004090a5 in fork_a_child (fds=0x23134b0, id=0) at
main/pgpool_main.c:659
#9  0x000000000040fb73 in PgpoolMain (discard_status=discard_status at entry=0
'\000', clear_memcache_oidmaps=clear_memcache_oidmaps at entry=0 '\000') at
main/pgpool_main.c:378
#10 0x0000000000407777 in main (argc=<optimized out>, argv=<optimized out>)
at main/main.c:349

*--SESSION CONTEXT--*
(gdb) print *session_context
$4 = {process_context = 0x77c260 <process_context_d>, frontend = 0x2314e98,
backend = 0x7f3d5efaf750, in_progress = 0 '\000',
doing_extended_query_message = 0 '\000',
  need_to_restore_where_to_send = 0 '\000', where_to_send_save = "\001",
'\000' <repeats 126 times>, command_success = 1 '\001', writing_transaction
= 0 '\000',
  failed_transaction = 0 '\000', skip_reading_from_backends = 0 '\000',
ignore_till_sync = 0 '\000', transaction_isolation = POOL_UNKNOWN,
query_context = 0x248d418,
  memory_context = 0x22f2558, uncompleted_message = 0x231aeb8, message_list
= {capacity = 64, size = 63, sent_messages = 0x2351f70},
load_balance_node_id = 0,
  mismatch_ntuples = 0 '\000', ntuples = {0, -1, 0 <repeats 126 times>},
reset_context = 0 '\000', query_cache_array = 0x0, num_selects = 0,
pending_messages = 0x231b020,
  previous_message_exists = 0 '\000', previous_message = {type =
POOL_EXECUTE, contents = 0x24704f0 "SS_13", contents_len = 5, query =
"COMMIT", '\000' <repeats 1017 times>,
    statement = "\000_46", '\000' <repeats 123 times>, portal = '\000'
<repeats 127 times>, is_rows_returned = 0 '\000', not_forward_to_frontend =
0 '\000', node_ids = {0, -1},
    query_context = 0x242ff68}, major = 3, minor = 0}

*--QEURY-TRACE--*
(gdb) print *session_context->query_context
$5 = {original_query = 0x248e0e8 "BEGIN", rewritten_query = 0x0,
original_length = 6, rewritten_length = -1, parse_tree = 0x248e038,
rewritten_parse_tree = 0x0,
  where_to_send = "\001", '\000' <repeats 126 times>,
virtual_master_node_id = 0, query_state = {POOL_UNPARSED <repeats 128
times>}, is_cache_safe = 0 '\000', temp_cache = 0x0,
  is_multi_statement = 0 '\000', dboid = 0, query_w_hex = 0x0,
is_parse_error = 0 '\000', num_original_params = -1,
pg_terminate_backend_conn = 0x0, skip_cache_commit = 0 '\000',
  memory_context = 0x7f3d5efbff20}

Thanks
Best Regards
Muhammad Usama


On Fri, Dec 7, 2018 at 3:49 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> I need:
>
> - exact Pgpool-II version
> - pgpool.conf
> - gdb print command result of *pool_session_context when Pgpool-II hangs
>
> Also a full debug log will be helpful.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Ishii San
> >
> > Thanks for looking into this, As discussed over the call please find the
> > attached pgpool log and stack trace for the stuck issue.
> > I am currently not able to trim down the test case so sorry for throwing
> a
> > huge log file.
> > There is no special configuration on PostgreSQL server side and this
> issue
> > only happens when the load balance node and the primary node are same
> > and as you can see from the stack trace the stuck happens in the
> > read_kind_from_backend() function and the reason for that is
> > the function is trying to read response from the backend node 1(standby)
> > while it has only sent the query to the backend node 0 (primary)
> >
> > --stack-trace--
> > #0  0x00007f1775216c53 in __select_nocancel () at
> > ../sysdeps/unix/syscall-template.S:81
> > #1  0x000000000042def6 in pool_check_fd (cp=cp at entry=0x7f1776fb2328) at
> > protocol/pool_process_query.c:696
> > #2  0x000000000042e1c6 in pool_check_fd (cp=cp at entry=0x7f1776fb2328) at
> > protocol/pool_process_query.c:728
> > #3  0x000000000045f92c in pool_read (cp=0x7f1776fb2328,
> > buf=buf at entry=0x7ffebb5f2345,
> > len=len at entry=1) at utils/pool_stream.c:166
> > #4  0x0000000000432eab in read_kind_from_backend
> > (frontend=frontend at entry=0x2292e88,
> > backend=backend at entry=0x7f1776fad6d8,
> > decided_kind=decided_kind at entry=0x7ffebb5f274a
> > "D")
> >     at protocol/pool_process_query.c:3379
> > #5  0x000000000043f232 in ProcessBackendResponse
> > (frontend=frontend at entry=0x2292e88,
> > backend=backend at entry=0x7f1776fad6d8, state=state at entry=0x7ffebb5f27b4,
> >     num_fields=num_fields at entry=0x7ffebb5f27b2) at
> > protocol/pool_proto_modules.c:2653
> > #6  0x0000000000431d17 in pool_process_query (frontend=0x2292e88,
> > backend=0x7f1776fad6d8, reset_request=reset_request at entry=0) at
> > protocol/pool_process_query.c:320
> > #7  0x000000000042cbb9 in do_child (fds=fds at entry=0x2291370) at
> > protocol/child.c:387
> > #8  0x00000000004090a5 in fork_a_child (fds=0x2291370, id=2) at
> > main/pgpool_main.c:659
> > #9  0x000000000040fb73 in PgpoolMain (discard_status=discard_status at entry
> =0
> > '\000', clear_memcache_oidmaps=clear_memcache_oidmaps at entry=0 '\000') at
> > main/pgpool_main.c:378
> > #10 0x0000000000407777 in main (argc=<optimized out>, argv=<optimized
> out>)
> > at main/main.c:361
> >
> > Thanks
> > Best Regards
> > Muhammad Usama
> >
> >
> >
> > On Thu, Dec 6, 2018 at 3:52 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> > Hi Ishii San
> >> >
> >> > I was investigating the issue faced by one of Pgpool-II user who
> reported
> >> > that after executing a certain transaction (large amount of data
> DELETE /
> >> > INSERT / UPDATE, COMMIT), the next execution of the BEGIN statement
> >> > hangs(using the extended query protocol).
> >> >
> >> > I think the problem is in pool_is_node_to_be_sent_in_current_query()
> >> > function. The issue is in streaming replication mode, when
> >> > pool_is_query_in_progress is not set, the function always returns
> true.
> >> Now
> >> > consider that we have two backend servers 1 primary and one standby.
> And
> >> > for a particular session the primary node gets selected as a load
> balance
> >> > node. So effectively that session should only be sending queries and
> >> > reading responses from primary server alone. But with the current
> >> > implementation of pool_is_node_to_be_sent_in_current_query() the
> >> > VALID_BACKEND macro will return true for standby node as well when the
> >> > query in progress flag is not set(which is wrong when standby node is
> not
> >> > selected for load balancing). and if this happens in let say
> >> > read_kind_from_backend() function we will get a stuck, since the
> >> > read_kind_from_backend() will be trying to read the response from
> standby
> >> > node while we have never sent any query to that node.
> >> >
> >> >
> >> > Can you please take a look at the attached patch, I am not sure if it
> >> will
> >> > have some side effects or not.
> >>
> >> Unfortunately the patch would not help and probably would break many
> >> cases which are working today.
> >>
> >> For the scenario you mentioned is already handled by
> >> pool_pending_message_* functions defined in
> >> context/query_context.c. When a message, for example parse, is sent
> >> from frontend and forwarded to backends, the info is recorded in an
> >> FIFO queue by pool_pending_message_add() along with info which of
> >> backend the message is forwarded to. When the parse message will be
> >> replied back with "parse complete message" from the backend,
> >> read_kind_message_from_backend() can know from which of backend it
> >> should read a message by looking at the head of the queue.  Once the
> >> message is retrieved (in this example parse complete message), an
> >> entry in the queue (in this example parse message) is pulled out and
> >> removed.
> >>
> >> This mechanism usually works well unless backend sends unpredicted
> >> message. Recently one of such a case was reported:
> >> https://www.pgpool.net/mantisbt/view.php?id=448
> >>
> >> In the report PostgreSQL sends an error message caused by
> >> idle_in_transaction_timeout setting which Pgpool-II does not expect.
> >>
> >> Does the user has any uncommon setting in postgresql.conf?
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20181207/73f38698/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool_debug.log.zip
Type: application/zip
Size: 497385 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20181207/73f38698/attachment-0001.zip>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool.conf
Type: application/octet-stream
Size: 35510 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20181207/73f38698/attachment-0001.obj>


More information about the pgpool-hackers mailing list