[pgpool-hackers: 3895] Re: Patch: Move auto_failback_interval in to BackendInfo, and update it any time the backend state is set to CON_DOWN

Tatsuo Ishii ishii at sraoss.co.jp
Wed May 5 16:03:28 JST 2021


>> On 27/04/2021, at 10:18 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>> Hi Nathan,
>> 
>>> Hi,
>>> 
>>> Sorry about that! I dragged them from the vscode file list directly to Mail - I suspect that that doesn’t work when using remote editing..!
>>> 
>>> I have attached the files now - does that work?
>> 
>> Yes! I will look into the patches. Hoshiai-san, can you please look
>> into the patches as well because you are the original author of the
>> feature.
> 
> Hi!
> 
> I was wondering if you had time to look at these patches yet? :-)
> 
> No rush - just making sure it doesn’t get missed!

I have just started to look into your patch. I was also able to
reproduce the problem.

1) create a 3-node streaming replication cluster.

pgpool_setup -n 3

Enable auto_failback and set health_check_period to 1 so that
auto_failback runs more aggressively.

auto_failback = on
health_check_period0 = 1
health_check_period1 = 1
health_check_period2 = 1

start the whole system.

2) detach node 0 (which is the primary)

3) node 2 goes down and its PostgreSQL does not start again

psql -p 11000 -c "show pool_nodes" test
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2021-05-05 14:10:38
 1       | /tmp     | 11003 | up     | up        | 0.333333  | primary | primary | 0          | false             | 0                 |                   |                        | 2021-05-05 14:10:25
 2       | /tmp     | 11004 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2021-05-05 14:10:38
(3 rows)

The cause of the problem is a race condition between auto failback
and the follow primary command, as you and Hoshiai-san suggested.
Here are some extracts from pgpool.log.

$ egrep "degeneration|failback" log/pgpool.log|grep -v child
2021-05-05 14:10:22: main pid 28630: LOG:  starting degeneration. shutdown host /tmp(11002)
2021-05-05 14:10:25: main pid 28630: LOG:  starting follow degeneration. shutdown host /tmp(11002)
2021-05-05 14:10:25: main pid 28630: LOG:  starting follow degeneration. shutdown host /tmp(11004)	-- #1
2021-05-05 14:10:25: health_check2 pid 28673: LOG:  request auto failback, node id:2	-- #2
2021-05-05 14:10:25: health_check2 pid 28673: LOG:  received failback request for node_id: 2 from pid [28673]
2021-05-05 14:10:35: main pid 28630: LOG:  failback done. reconnect host /tmp(11004)
2021-05-05 14:10:35: main pid 28630: LOG:  failback done. reconnect host /tmp(11002)	-- #3
2021-05-05 14:10:36: pcp_child pid 29035: LOG:  starting recovering node 2
2021-05-05 14:10:36: pcp_child pid 29035: ERROR:  node recovery failed, node id: 2 is alive	-- #4
2021-05-05 14:10:38: child pid 29070: LOG:  failed to connect to PostgreSQL server by unix domain socket
2021-05-05 14:10:38: child pid 29070: DETAIL:  executing failover on backend
2021-05-05 14:10:38: main pid 28630: LOG:  Pgpool-II parent process has received failover request
2021-05-05 14:10:38: main pid 28630: LOG:  starting degeneration. shutdown host /tmp(11004)	-- #5

1) Follow primary started to shut down node 2. At this point backend
   node 2 was still running.

2) Auto failback found that the backend was still alive and sent a
   failback request for node 2.

3) The pgpool main process reported that node 2 was back. But the
   actual failback had not completed; the follow primary command was
   still working on the node.

4) The follow primary command for node 2 failed because auto failback
   had already set the status of node 2 to "up".

5) Node 2's PostgreSQL was down and the health check detected it.
   Node 2's status became down.
   
So if auto failback had not run at #2, the follow primary command
would have succeeded.
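
To make the interleaving easier to see, here is a much simplified
sketch of the auto failback decision without any guard against follow
primary. This is not pgpool-II's actual code: BACKEND_INFO(), CON_DOWN,
Req_info->follow_primary_count and send_failback_request() are real,
but the function and its control flow are illustrative only.

	/* Simplified illustration only -- not the actual pgpool-II code. */
	static void
	auto_failback_check(int node)
	{
		BackendInfo *bkinfo = &BACKEND_INFO(node);

		/*
		 * The backend answered the health check but pgpool still has it
		 * marked CON_DOWN -- possibly because follow primary has just
		 * shut it down and is about to recover and re-attach it.
		 */
		if (bkinfo->backend_status == CON_DOWN)
		{
			/*
			 * Without also looking at Req_info->follow_primary_count
			 * here, this request can fire while follow primary is still
			 * working on the same node (step #2 in the log above).
			 */
			send_failback_request(node, true, REQ_DETAIL_CONFIRMED);
		}
	}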

BTW, by coincidence a user and I found a similar situation: a
conflicting concurrent run of detach_false_primary and the follow
primary command:

https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html

In the discussion I proposed a patch to prevent the concurrent run of
detach_false_primary and the follow primary command. I think we can
apply the same method to auto_failback as well. Attached is a patch
implementing it on top of the patch I posted here for the master
branch:

https://www.pgpool.net/pipermail/pgpool-general/2021-April/007594.html

This patch actually has a small window between here:

		if (check_failback && !Req_info->switching && slot &&
			Req_info->follow_primary_count == 0)
and here:
				ereport(LOG,
					(errmsg("request auto failback, node id:%d", node)));
				/* get current time to use auto_failback_interval */
				now = time(NULL);
				auto_failback_interval = now + pool_config->auto_failback_interval;

				send_failback_request(node, true, REQ_DETAIL_CONFIRMED);

because follow primary might start right after
Req_info->follow_primary_count has been checked. I think the window is
small and probably harmless in the wild. If you think it's not small
enough, we could take an exclusive lock, as in detach_false_primary,
to plug the window (see the sketch below).
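
For the record, here is a rough sketch of what the locked version
could look like, following the detach_false_primary approach.
pool_semaphore_lock()/pool_semaphore_unlock() exist in pgpool-II, but
the semaphore slot FOLLOW_PRIMARY_SEM is just an assumed name here,
and the fragment reuses the variables from the hunk quoted above.

		if (check_failback && !Req_info->switching && slot)
		{
			/*
			 * Sketch only: take an exclusive lock so that follow primary
			 * cannot start between the follow_primary_count check and
			 * send_failback_request().  FOLLOW_PRIMARY_SEM is an assumed
			 * semaphore slot, not necessarily what the tree defines.
			 */
			pool_semaphore_lock(FOLLOW_PRIMARY_SEM);

			if (Req_info->follow_primary_count == 0)
			{
				ereport(LOG,
						(errmsg("request auto failback, node id:%d", node)));

				/* get current time to use auto_failback_interval */
				now = time(NULL);
				auto_failback_interval = now + pool_config->auto_failback_interval;

				send_failback_request(node, true, REQ_DETAIL_CONFIRMED);
			}
			pool_semaphore_unlock(FOLLOW_PRIMARY_SEM);
		}

Of course the follow primary side would have to take the same
semaphore when it updates follow_primary_count, as in the
detach_false_primary patch.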

We have also found that detach_false_primary should only run on the
leader watchdog node. We should probably consider this for
auto_failback too.
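
If we go that way, the check could be as simple as the following
sketch at the top of the auto failback path. pool_config->use_watchdog
is the real parameter; watchdog_node_is_leader() is a hypothetical
helper standing in for whatever the watchdog code actually exposes for
this test.

	/*
	 * Illustration only.  watchdog_node_is_leader() is a hypothetical
	 * helper; the real watchdog API to query the local node's role may
	 * look different.
	 */
	if (pool_config->use_watchdog && !watchdog_node_is_leader())
	{
		ereport(DEBUG1,
				(errmsg("skipping auto failback check"),
				 errdetail("this node is not the leader watchdog node")));
		return;
	}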

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: auto_failback.diff
Type: text/x-patch
Size: 493 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20210505/edfa1945/attachment.bin>

