[pgpool-hackers: 3999] Re: Disabling failover when backend goes down or backend process killed

Tatsuo Ishii ishii at sraoss.co.jp
Mon Aug 23 17:15:24 JST 2021


> Currently, if a backend node is shut down by an admin, Pgpool-II can
> trigger failover when a client is connected to pgpool and one of the
> following conditions is met:
> 
> 1) the backend node is the primary server
> 
> 2) the backend node is not the primary but is the load balance node
> 
> OK, this is fine because the health check will detect that the backend
> node is down and trigger failover anyway.
> 
> The problem is that the same error code is also raised by the backend
> node when a backend process is killed, either by a signal or by the
> pg_terminate_backend() function. This is annoying because failover is
> triggered even though the backend node is actually up and running.
> 
> pgpool now handles pg_terminate_backend() in a more sophisticated way
> to avoid the issue, but this is not perfect. In certain cases (for
> example, when the argument to the function is not a constant) failover
> is still triggered.
> 
> To overcome the problem, I would like to introduce a new switch called
> "enable_failover_on_backend_shutdown" for the upcoming Pgpool-II 4.3.
> If enable_failover_on_backend_shutdown is on, pgpool will behave as it
> does now. If it is off, pgpool will not trigger failover when an admin
> shuts down the backend node or a backend process is killed. Instead,
> the session corresponding to the backend process will be terminated.
> 
> Comments or suggestions are welcome.

The PoC patch for this is attached. I have renamed
"enable_failover_on_backend_shutdown" to
"failover_on_backend_shutdown" for consistency with
"failover_on_backend_error". Documentation is not included, and the
pgpool.conf.sample change covers streaming replication mode only.
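
For illustration, the intended behavior of the switch can be sketched
roughly as below. This is only a model written in Python, not the patch
itself; the names Action and on_backend_shutdown_error are made up for
this sketch, and it assumes the existing preconditions for failover
(the node is the primary or the load balance node) already hold.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the two possible reactions."""
    FAILOVER = "failover"
    TERMINATE_SESSION = "terminate_session"


def on_backend_shutdown_error(failover_on_backend_shutdown: bool) -> Action:
    """Model the reaction to the error code that the backend raises both
    when the node is shut down and when a backend process is killed."""
    if failover_on_backend_shutdown:
        # on: behave as pgpool does today and trigger failover.
        return Action.FAILOVER
    # off: do not fail over; only the affected session is terminated.
    return Action.TERMINATE_SESSION
```

With the switch on, the sketch returns Action.FAILOVER (test case 1
below); with it off, it returns Action.TERMINATE_SESSION (test case 2).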

Test case 1: failover_on_backend_shutdown = on

test=# pgpool show failover_on_backend_shutdown;
 failover_on_backend_shutdown 
------------------------------
 on
(1 row)

test=# show pool_nodes;
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.000000  | primary | primary | 0          | false             | 0                 |                   |                        | 2021-08-23 17:04:10
 1       | /tmp     | 11003 | up     | up        | 1.000000  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2021-08-23 17:04:10
(2 rows)

[kill backend process on node 1]

test=# select 1;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
test=# show pool_nodes;
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.000000  | primary | primary | 0          | true              | 0                 |                   |                        | 2021-08-23 17:04:10
 1       | /tmp     | 11003 | down   | down      | 1.000000  | standby | unknown | 0          | false             | 0                 |                   |                        | 2021-08-23 17:04:56
(2 rows)

Test case 2: failover_on_backend_shutdown = off

test=# pgpool show failover_on_backend_shutdown;
 failover_on_backend_shutdown 
------------------------------
 off
(1 row)

test=# show pool_nodes;
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.000000  | primary | primary | 0          | false             | 0                 |                   |                        | 2021-08-23 17:04:10
 1       | /tmp     | 11003 | up     | up        | 1.000000  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2021-08-23 17:04:10
(2 rows)

[kill backend process on node 1]

test=# select 1;
WARNING:  write on backend 1 failed with error :"Broken pipe"
DETAIL:  while trying to write data from offset: 0 wlen: 5
FATAL:  unable to read data from DB node 1
DETAIL:  EOF encountered with backend
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
test=# show pool_nodes;
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.000000  | primary | primary | 0          | false             | 0                 |                   |                        | 2021-08-23 17:09:09
 1       | /tmp     | 11003 | up     | up        | 1.000000  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2021-08-23 17:09:09
(2 rows)

As you can see, backend node 1 is still up: the session was terminated,
but no failover was triggered.
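
If someone wants to check the result in a script rather than by eye,
something like the following might do. It is a minimal sketch that
assumes the "show pool_nodes" output has been captured as text in
psql's default aligned format; node_status is a hypothetical helper
and is not part of Pgpool-II.

```python
from typing import Dict


def node_status(pool_nodes_output: str) -> Dict[int, str]:
    """Map node_id to the value of the status column, given the text
    that psql prints for 'show pool_nodes'."""
    # Header and data rows contain '|'; the '---+---' separator line,
    # the '(2 rows)' footer and prompt lines do not.
    rows = [ln for ln in pool_nodes_output.splitlines() if "|" in ln]
    header = [col.strip() for col in rows[0].split("|")]
    id_col = header.index("node_id")
    status_col = header.index("status")
    result = {}
    for row in rows[1:]:
        cells = [cell.strip() for cell in row.split("|")]
        result[int(cells[id_col])] = cells[status_col]
    return result
```

For test case 2, node_status(...)[1] would be expected to be "up" after
the backend process is killed; for test case 1 it would be "down".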
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: failover_on_backend_shutdown.diff
Type: text/x-patch
Size: 3590 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20210823/814e7f82/attachment.bin>

