[pgpool-committers: 9486] pgpool: Mitigate session disconnection issue in failover/failback/backe

Tatsuo Ishii ishii at sraoss.co.jp
Tue Jul 18 10:36:54 JST 2023


Mitigate session disconnection issue in failover/failback/backend error.

Previously Pgpool-II disconnected client sessions in various
cases. This commit tries to avoid some of those cases, especially when
a backend goes down and that backend is neither the primary (or main
node) nor the load balance node.

Suppose we have a 3-node streaming replication PostgreSQL cluster and
the client uses the primary (node 0) and standby 1 (node 1), but does
not use standby 2 (node 2) because node 2 is not the load balance
node.  In this case, ideally, shutting down node 2 should not
disconnect the session. However, the session is disconnected if it is
processing a query during failover.  Session disconnection during
failover has been necessary because there are a bunch of places in the
source code that look something like this:

for (i = 0; i < NUM_BACKENDS; i++)
{
        if (!VALID_BACKEND(i))
           continue;
           :
           :

VALID_BACKEND returns true if the backend is not in down status. If
this code is executed during failover, it may access a backend socket
which is no longer available, causing trouble up to and including a
segfault. So inside VALID_BACKEND, we check whether failover is being
performed, and if so, the pgpool child process exits and the session
disconnects. To avoid this, change VALID_BACKEND so that it waits for
completion of failover. For this purpose a new function,
wait_for_failover_to_finish(), is added. It waits for the completion
of failover for up to MAX_FAILOVER_WAIT seconds (for now it's fixed to
30).  The change above will prevent unnecessary session disconnection
for existing sessions.
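
The bounded wait described above can be sketched roughly as follows.
This is only an illustration, not the code from the commit: the name
wait_for_failover_sketch, the callback types, the one-second polling
interval, the return codes and the fake_* helpers are all assumptions
made here so that the example stands alone. Only MAX_FAILOVER_WAIT and
its value of 30 come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FAILOVER_WAIT 30    /* seconds; fixed to 30 per the commit message */

/* Hypothetical hooks so the sketch does not depend on pgpool internals. */
typedef bool (*failover_check_fn) (void *arg);  /* true while failover runs */
typedef void (*sleep_fn) (unsigned int seconds);

/*
 * Poll until failover finishes or MAX_FAILOVER_WAIT seconds have passed.
 * Return codes (chosen for this sketch only):
 *   0: no failover was in progress
 *   1: failover was in progress and finished within the limit
 *  -1: failover was still in progress when the limit was reached
 */
static int
wait_for_failover_sketch(failover_check_fn in_progress, void *arg,
                         sleep_fn do_sleep)
{
    int         waited;

    assert(in_progress != NULL && do_sleep != NULL);

    if (!in_progress(arg))
        return 0;

    for (waited = 0; waited < MAX_FAILOVER_WAIT; waited++)
    {
        do_sleep(1);
        if (!in_progress(arg))
            return 1;
    }
    return -1;
}

/* Hypothetical test double: reports "in progress" for the first N polls. */
static bool
fake_in_progress(void *arg)
{
    int        *remaining = arg;

    if (*remaining <= 0)
        return false;
    (*remaining)--;
    return true;
}

/* Hypothetical test double: does not actually sleep. */
static void
fake_sleep(unsigned int seconds)
{
    (void) seconds;
}
```

A caller in the style of VALID_BACKEND would run such a wait first and
only then look at the backend status, so that a short failover delays
the session instead of ending it.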

This commit also tries to prevent unnecessary session disconnection
while accepting new sessions. There are multiple places where it could
happen and this commit fixes them:

- accepting new connection from client. In wait_for_new_connections,
  call wait_for_failover_to_finish to wait for completion of
  failover.

- creating new connection to backend. After accepting connection
  request from client and before creating connection to backend, call
  wait_for_failover_to_finish to wait for completion of failover.

- fixing broken socket. pool_get_cp checks whether an existing backend
  connection is broken. If it's broken, it sleeps 1 second in the
  expectation that failover will start, then calls
  wait_for_failover_to_finish().

- processing an application name. If an application name is included
  in a startup message, pgpool sends a query like "SET application_name
  TO foo" to all backend nodes including node 2, which could cause a
  write error. To prevent the error, I modified
  connect_using_existing_connection, which sends the SET command
  using do_command, to wrap the do_command call in a TRY/CATCH block
  so that it does not raise an ERROR.
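
The idea in the last item, catching an error from one node so that the
session survives, can be illustrated with plain setjmp/longjmp. This is
a sketch under assumptions, not the commit's code and not pgpool's
actual TRY/CATCH macros: raise_error, send_set_command,
set_on_all_nodes and error_jump_target are hypothetical stand-ins
invented for this example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: where a raised error unwinds to (NULL means no handler). */
static jmp_buf *error_jump_target = NULL;

/* Hypothetical stand-in for raising an ERROR. */
static void
raise_error(void)
{
    if (error_jump_target != NULL)
        longjmp(*error_jump_target, 1);
    abort();                    /* no handler installed */
}

/* Hypothetical stand-in for do_command(): fails if the node is down. */
static void
send_set_command(bool node_is_up)
{
    if (!node_is_up)
        raise_error();
}

/*
 * Send the command to every node, catching the error from any node that
 * is down instead of letting it propagate. Returns the number of nodes
 * on which the command succeeded.
 */
static int
set_on_all_nodes(const bool *node_is_up, int num_nodes)
{
    volatile int ok = 0;        /* volatile: must survive longjmp */
    int         i;

    assert(node_is_up != NULL);

    for (i = 0; i < num_nodes; i++)
    {
        jmp_buf     local;
        jmp_buf    *saved = error_jump_target;

        error_jump_target = &local;     /* "TRY" */
        if (setjmp(local) == 0)
        {
            send_set_command(node_is_up[i]);
            ok++;
        }
        /* else "CATCH": the error is swallowed and the loop goes on */
        error_jump_target = saved;      /* "END TRY" */
    }
    return ok;
}
```

With three nodes where only the last one is down, set_on_all_nodes
reports two successes and the caller keeps running, which mirrors the
intent of not letting a failed SET on node 2 end the session.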

Note that even with all the fixes above, I was not able to fix some
cases where pool_write raises an error. pool_write is used to write to
the backend socket, and it is called in too many places to fix all of
them. For now we need to run "pcp_detach_node 2" before shutting node
2 down. pcp_detach_node tells all pgpool child processes that node 2
is going down. Even if a child process does not notice it and writes
to the backend 2 socket, there will be no error because node 2 is
still alive.

Finally this commit adds a new regression test case,
037.failover_session.  The test uses pgbench. For each of two
configurations there are 2 cases: a continuous session (without the -C
option) and repeated connection/disconnection (with the -C option). So
there are 4 cases in the test:

"=== test1: backend_weight2 = 0 and pgbench without -C option"
"=== test2: backend_weight2 = 0 and pgbench with -C option"
"=== test3: load_balance_mode = off and pgbench without -C option"
"=== test4: load_balance_mode = off and pgbench with -C option"

test2 and test4 require pcp_detach_node before shutting down node 2.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2023-July/004352.html

Branch
------
master

Details
-------
https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=4aa657e055250da9db9a4c5cde7260e8f24707cb

Modified Files
--------------
src/context/pool_query_context.c                   |  48 +++++++-
src/include/context/pool_query_context.h           |   4 +-
src/protocol/child.c                               |  39 ++++++-
src/protocol/pool_connection_pool.c                |  10 +-
src/protocol/pool_process_query.c                  |   2 +-
.../regression/tests/037.failover_session/test.sh  | 122 +++++++++++++++++++++
6 files changed, 213 insertions(+), 12 deletions(-)


