[pgpool-hackers: 3169] Re: Query hang issue with extended protocol

Thu Dec 6 22:41:48 JST 2018

Hi Ishii San

Thanks for looking into this, As discussed over the call please find the
attached pgpool log and stack trace for the stuck issue.
I am currently not able to trim down the test case so sorry for throwing a
huge log file.
There is no special configuration on PostgreSQL server side and this issue
only happens when the load balance node and the primary node are same
and as you can see from the stack trace the stuck happens in the
read_kind_from_backend() function and the reason for that is
the function is trying to read response from the backend node 1(standby)
while it has only sent the query to the backend node 0 (primary)

--stack-trace--
#0  0x00007f1775216c53 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1  0x000000000042def6 in pool_check_fd (cp=cp at entry=0x7f1776fb2328) at
protocol/pool_process_query.c:696
#2  0x000000000042e1c6 in pool_check_fd (cp=cp at entry=0x7f1776fb2328) at
protocol/pool_process_query.c:728
#3  0x000000000045f92c in pool_read (cp=0x7f1776fb2328,
buf=buf at entry=0x7ffebb5f2345,
len=len at entry=1) at utils/pool_stream.c:166
#4  0x0000000000432eab in read_kind_from_backend
(frontend=frontend at entry=0x2292e88,
backend=backend at entry=0x7f1776fad6d8,
decided_kind=decided_kind at entry=0x7ffebb5f274a
"D")
    at protocol/pool_process_query.c:3379
#5  0x000000000043f232 in ProcessBackendResponse
(frontend=frontend at entry=0x2292e88,
backend=backend at entry=0x7f1776fad6d8, state=state at entry=0x7ffebb5f27b4,
    num_fields=num_fields at entry=0x7ffebb5f27b2) at
protocol/pool_proto_modules.c:2653
#6  0x0000000000431d17 in pool_process_query (frontend=0x2292e88,
backend=0x7f1776fad6d8, reset_request=reset_request at entry=0) at
protocol/pool_process_query.c:320
#7  0x000000000042cbb9 in do_child (fds=fds at entry=0x2291370) at
protocol/child.c:387
#8  0x00000000004090a5 in fork_a_child (fds=0x2291370, id=2) at
main/pgpool_main.c:659
#9  0x000000000040fb73 in PgpoolMain (discard_status=discard_status at entry=0
'\000', clear_memcache_oidmaps=clear_memcache_oidmaps at entry=0 '\000') at
main/pgpool_main.c:378
#10 0x0000000000407777 in main (argc=<optimized out>, argv=<optimized out>)
at main/main.c:361

Thanks
Best Regards
Muhammad Usama

On Thu, Dec 6, 2018 at 3:52 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Ishii San
> >
> > I was investigating the issue faced by one of Pgpool-II user who reported
> > that after executing a certain transaction (large amount of data DELETE /
> > INSERT / UPDATE, COMMIT), the next execution of the BEGIN statement
> > hangs(using the extended query protocol).
> >
> > I think the problem is in pool_is_node_to_be_sent_in_current_query()
> > function. The issue is in streaming replication mode, when
> > pool_is_query_in_progress is not set, the function always returns true.
> Now
> > consider that we have two backend servers 1 primary and one standby. And
> > for a particular session the primary node gets selected as a load balance
> > node. So effectively that session should only be sending queries and
> > reading responses from primary server alone. But with the current
> > implementation of pool_is_node_to_be_sent_in_current_query() the
> > VALID_BACKEND macro will return true for standby node as well when the
> > query in progress flag is not set(which is wrong when standby node is not
> > selected for load balancing). and if this happens in let say
> > read_kind_from_backend() function we will get a stuck, since the
> > read_kind_from_backend() will be trying to read the response from standby
> > node while we have never sent any query to that node.
> >
> >
> > Can you please take a look at the attached patch, I am not sure if it
> will
> > have some side effects or not.
>
> Unfortunately the patch would not help and probably would break many
> cases which are working today.
>
> For the scenario you mentioned is already handled by
> pool_pending_message_* functions defined in
> context/query_context.c. When a message, for example parse, is sent
> from frontend and forwarded to backends, the info is recorded in an
> FIFO queue by pool_pending_message_add() along with info which of
> backend the message is forwarded to. When the parse message will be
> replied back with "parse complete message" from the backend,
> read_kind_message_from_backend() can know from which of backend it
> should read a message by looking at the head of the queue.  Once the
> message is retrieved (in this example parse complete message), an
> entry in the queue (in this example parse message) is pulled out and
> removed.
>
> This mechanism usually works well unless backend sends unpredicted
> message. Recently one of such a case was reported:
> https://www.pgpool.net/mantisbt/view.php?id=448
>
> In the report PostgreSQL sends an error message caused by
> idle_in_transaction_timeout setting which Pgpool-II does not expect.
>
> Does the user has any uncommon setting in postgresql.conf?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20181206/acaec3f9/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool.log.zip
Type: application/zip
Size: 97490 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20181206/acaec3f9/attachment-0001.zip>