0000424: pcp_recovery_node in follow_master_command possibly fails with an error - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000424	Pgpool-II	Bug	public	2018-08-16 16:40	2019-05-21 10:15

Reporter	nagata	Assigned To	t-ishii
Priority	normal	Severity	minor	Reproducibility	have not tried
Status	closed	Resolution	open

Summary	0000424: pcp_recovery_node in follow_master_command possibly fails with an error
Description	One of our clients gets the following error each time follow_master_command calls pcp_recovery_node on a downed standby:

ERROR: failed to process PCP request at the moment
DETAIL: failover is in progress

In the current implementation, pcp_recovery_node cannot be run during failover/failback (this is checked in pcp_process_command()). So it seems that pcp_recovery_node is being called before failover_command completes.

Possible ways to run pcp_recovery_node safely in follow_master_command:

1. Insert a sleep before running pcp_recovery_node.
2. Retry pcp_recovery_node if it fails with this error.

However, can this be handled in Pgpool-II itself? If running pcp_recovery_node in follow_master_command is an expected use case, I think Pgpool-II should provide a safe way to do this. Any ideas on how to resolve this?
Tags	No tags attached.
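
Workaround 2 from the description (retry on the specific error) could be sketched roughly as follows. This is a minimal illustration, not something taken from Pgpool-II or from pgpool_setup: the function names, the retry count, and the delay are all hypothetical, and it assumes that pcp_recovery_node exits with a non-zero status and prints the "failover is in progress" detail when it is rejected.

```python
import subprocess
import time

# Text taken from the DETAIL line quoted in the report.
RETRYABLE_DETAIL = "failover is in progress"


def run_command(argv):
    """Run a command and return (exit_code, stdout + stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def recover_with_retry(argv, attempts=5, delay=2.0,
                       runner=run_command, sleep=time.sleep):
    """Retry argv while it fails with the 'failover is in progress' detail.

    Returns (succeeded, attempts_used). Any other failure is not retried.
    The runner and sleep parameters exist only so the loop can be tested
    without a running Pgpool-II.
    """
    for attempt in range(1, attempts + 1):
        code, output = runner(argv)
        if code == 0:
            return True, attempt
        if RETRYABLE_DETAIL not in output:
            # Different error: retrying is unlikely to help.
            return False, attempt
        if attempt < attempts:
            sleep(delay)
    return False, attempts
```

Here argv would be whatever pcp_recovery_node command line the follow master script already uses. Retrying only on the known message keeps unrelated failures (wrong credentials, unreachable host) from being masked by the loop.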

t-ishii 2018-08-16 17:03 developer ~0002161	You are probably misunderstanding how the follow master command is used. The command should be run after failover is done. Users can issue SQL to Pgpool-II while follow master commands are running. See follow_master_command.sh generated by pgpool_setup.

nagata 2018-08-16 17:34 developer ~0002162	As I understand it, follow_master_command is triggered in failover(), and this happens before Req_info->switching is cleared (= set to false).

t-ishii 2018-08-16 17:39 developer ~0002163	Aren't you missing the next line? if(Req_info->request_queue_tail != Req_info->request_queue_head)

nagata 2018-08-16 17:43 developer ~0002164	Oops, sorry, I missed this. I'll look into this again.

nagata 2018-08-16 18:20 developer ~0002165	OK.

- Req_info->request_queue_tail is incremented when a failover or failback request is registered by register_node_operation_request().
- Req_info->request_queue_head is incremented at the top of the loop in failover(), that is, at the point where processing of each failback/failover request starts.

So (Req_info->request_queue_tail != Req_info->request_queue_head) is true when multiple failover or failback requests have been registered but some of them have not been processed yet. This may happen, for example, if failover is requested two or more times in quick succession, or if the "failback" request is registered by pcp_recovery_node in the first follow_master_command.

I'll report more when we get log messages from the client or when I can reproduce this on my machine.
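
The head/tail behaviour described in the note above can be illustrated with a toy model. This is only a sketch of the two counters as described in this thread, not Pgpool-II's actual code: the class and method names are hypothetical, and the real initial values and surrounding logic in failover() may differ.

```python
class RequestQueueModel:
    """Toy model of the two counters discussed in this issue."""

    def __init__(self):
        self.head = 0  # stands in for Req_info->request_queue_head
        self.tail = 0  # stands in for Req_info->request_queue_tail

    def register_request(self):
        """A failover/failback request is registered: tail advances."""
        self.tail += 1

    def start_next_request(self):
        """Top of the processing loop: head advances for one request."""
        if self.head == self.tail:
            raise RuntimeError("no registered request to process")
        self.head += 1

    def has_unprocessed_requests(self):
        """Mirrors the (tail != head) condition quoted in the thread."""
        return self.tail != self.head


# Two requests registered in quick succession.
queue = RequestQueueModel()
queue.register_request()
queue.register_request()

queue.start_next_request()
after_first = queue.has_unprocessed_requests()   # second request still waiting

queue.start_next_request()
after_second = queue.has_unprocessed_requests()  # nothing left in the queue
```

In this model the condition stays true for as long as a registered request has not yet been picked up, which matches the "registered but not processed yet" case described above.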

t-ishii 2019-02-24 20:21 developer ~0002402	Can we close this issue?

t-ishii 2019-05-21 10:15 developer ~0002609	No response from the reporter for over a month. I am going to close this issue.

Date Modified	Username	Field	Change
2018-08-16 16:40	nagata	New Issue
2018-08-16 17:03	t-ishii	Note Added: 0002161
2018-08-16 17:34	nagata	Note Added: 0002162
2018-08-16 17:39	t-ishii	Note Added: 0002163
2018-08-16 17:43	nagata	Note Added: 0002164
2018-08-16 18:20	nagata	Note Added: 0002165
2019-01-30 10:08	administrator	Assigned To	=> t-ishii
2019-01-30 10:08	administrator	Status	new => assigned
2019-02-04 08:59	t-ishii	Status	assigned => feedback
2019-02-24 20:21	t-ishii	Note Added: 0002402
2019-05-21 10:15	t-ishii	Note Added: 0002609
2019-05-21 10:15	t-ishii	Status	feedback => closed