[pgpool-hackers: 4352] Re: Proposal: mitigating session disconnection issue while failover

Tatsuo Ishii ishii at sraoss.co.jp
Sat Jul 8 19:16:42 JST 2023


> Background:
> 
> Currently Pgpool-II disconnects client sessions on failover or backend
> error. This is fine when the client actually needs to access the failed
> PostgreSQL backend anyway. But the session is also disconnected during
> failover even when the client does not use the particular backend that
> failed, which is not good. Suppose we have a 3-node streaming
> replication PostgreSQL cluster and the client uses the primary (node 0)
> and standby 1 (node 1), but does not use standby 2 (node 2). Ideally,
> shutting down node 2 should not disconnect the session. However, the
> session is disconnected if it sends a query to Pgpool-II during
> failover. This is necessary because there are a bunch of places in the
> source code that look like this:
> 
> for (i = 0; i < NUM_BACKENDS; i++)
> {
> 	if (!VALID_BACKEND(i))
> 	   continue;
> 	   :
> 	   :
> 
> Here, NUM_BACKENDS represents the number of PostgreSQL backends (3 in
> the example above), and VALID_BACKEND returns true if the backend is
> not in down status. If this code runs during a failover, it may access
> a backend socket that is no longer available, which causes trouble
> including segfaults. So inside VALID_BACKEND we check whether a
> failover is in progress, and if so, the pgpool child process exits and
> the session is disconnected.
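> 
> For illustration, here is a minimal, self-contained sketch of that
> current behavior. All names below (failover_in_progress,
> valid_backend_sketch, node_is_up) are illustrative stand-ins, not the
> actual pool.h definitions:
> 
> #include <stdbool.h>
> #include <stdlib.h>
> 
> extern bool failover_in_progress(void);	/* assumed: reads a shared flag */
> 
> static bool
> valid_backend_sketch(int node_id, const bool *node_is_up)
> {
> 	/*
> 	 * If a failover is detected here, the child process currently just
> 	 * exits, which is what disconnects the client session -- even if
> 	 * the session never touches the failed node.
> 	 */
> 	if (failover_in_progress())
> 		exit(1);
> 
> 	return node_is_up[node_id];
> }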
> 
> Proposal:
> 
> In this proposal I would like to mitigate the issue above in certain
> cases. It does not resolve all cases; some session disconnections will
> remain. In all cases the precondition is that the client does not use
> the backend which is the target of the failover. To make the proposal
> easier to understand, I suppose the session uses only node 0 and/or
> node 1, and the failover target is node 2. Usually, to make this
> possible, we need to set backend_weight2 = 0 or load_balance_mode =
> off.
> 
> case 1:
> 
> Failover on node 2 occurs while the session keeps sending queries to
> node 0 and/or node 1. Change VALID_BACKEND so that it waits for the
> failover to complete. For this purpose a new function
> wait_for_failover_to_finish() is added. It waits for the completion of
> the failover for up to MAX_FAILOVER_WAIT seconds (30 for now). It may
> be better to make the wait time configurable. Thoughts?
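> 
> A rough, self-contained sketch of what such a wait loop could look
> like follows; failover_in_progress() is an assumed helper standing in
> for however the patch actually reads the shared-memory failover state:
> 
> #include <stdbool.h>
> #include <unistd.h>
> 
> #define MAX_FAILOVER_WAIT 30	/* seconds; could become configurable */
> 
> extern bool failover_in_progress(void);	/* assumed helper */
> 
> /* Returns true if the failover finished within the wait limit. */
> static bool
> wait_for_failover_to_finish(void)
> {
> 	int	i;
> 
> 	for (i = 0; i < MAX_FAILOVER_WAIT; i++)
> 	{
> 		if (!failover_in_progress())
> 			return true;
> 		sleep(1);
> 	}
> 	return false;	/* gave up; the caller may still disconnect */
> }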
> 
> case 2:
> 
> Failover on node 2 occurs while the session not only keeps sending
> queries to node 0 and/or node 1, but new sessions are also being
> created. This is much harder than case 1 because there are multiple
> places where the session disconnection could occur:
> 
> - Accepting a new connection from a client. In wait_for_new_connections,
>   call wait_for_failover_to_finish() to wait for the failover to
>   complete.
> 
> - Creating a new connection to a backend. After accepting the
>   connection request from the client and before creating the
>   connection to the backend, call wait_for_failover_to_finish().
> 
> - Fixing a broken socket. pool_get_cp checks whether an existing
>   backend connection is broken. If it is broken, sleep 1 second in the
>   expectation that a failover will happen, then call
>   wait_for_failover_to_finish().
> 
> - Processing the application name. If an application name is included
>   in the startup message, pgpool sends a query like "SET
>   application_name TO foo" to all backend nodes, including node 2,
>   which could cause a write error. To prevent the error, I modified
>   connect_using_existing_connection, which sends the SET command via
>   do_command, so that do_command does not raise an ERROR, by wrapping
>   the call in a TRY/CATCH block (a sketch follows this list).
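> 
> The wrapping is roughly like the following sketch; the do_command()
> arguments are abbreviated and the surrounding code is illustrative
> only:
> 
> 	PG_TRY();
> 	{
> 		/* send "SET application_name TO ..." to this backend */
> 		do_command( ... );
> 	}
> 	PG_CATCH();
> 	{
> 		/*
> 		 * Swallow the write error against the failing node instead
> 		 * of aborting the whole session.
> 		 */
> 		FlushErrorState();
> 	}
> 	PG_END_TRY();
> 
> The point is only that a failed SET against the detaching node no
> longer escalates to a session-terminating ERROR.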
> 
> Note that even with all the fixes above, I was not able to fix some
> cases where pool_write raises an error. pool_write is used to write to
> backend sockets and there are too many call sites to fix them all. For
> now we need to run "pcp_detach_node 2" before shutting node 2 down.
> pcp_detach_node tells all pgpool child processes that node 2 is going
> down, so even if a child process does not notice it and writes to the
> node 2 socket, there will be no error because node 2 is still alive.
> 
> Tests: I have created a new test, 079.failover_session, for this
> feature. The test uses pgbench to generate load, either with a
> continuous session (without the -C option) or with repeated
> connection/disconnection (with the -C option). There are 4 cases in
> the test:
> 
> "=== test1: backend_weight2 = 0 and pgbench without -C option"
> "=== test2: backend_weight2 = 0 and pgbench with -C option"
> "=== test3: load_balance_mode = off and pgbench without -C option"
> "=== test4: load_balance_mode = off and pgbench with -C option"
> 
> test2 and test4 require pcp_detach_node before shutting down node 2.
> 
> Patch:
> 
> Attached is the v1 patch for this feature. Comments and suggestions
> are welcome.

I realized that in the v1 patch the new regression test is numbered
079, which is wrong because numbers beyond 50 are supposed to be
assigned to bug fixes. The attached v2 patch just changes the test
number from 079 to 037.

BTW, I think we need to provide some way to let users find out whether
any session is using the backend node planned to be taken down as its
load balance node, so that users can know whether it is safe to set the
node to down. I propose to add the info to pcp_process_info and SHOW
pool_pools. It would be even better if we added a new function to the
pgpool_adm extension so that users can SELECT the necessary information
from the output of pcp_process_info. However, this should be posted as
a separate patch.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2-0001-Avoid-session-disconnection.patch
Type: text/x-patch
Size: 11072 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20230708/e1de7d89/attachment.bin>

