0000407: health check process doesn't start in some cases

ID	Project	Category	View Status	Date Submitted	Last Update

0000407	Pgpool-II	Bug	public	2018-06-20 19:46	2018-06-25 11:19

Reporter	nagata	Assigned To
Priority	normal	Severity	minor	Reproducibility	sometimes
Status	resolved	Resolution	open
Product Version	3.7.4
Target Version	3.7.5

Summary	0000407: health check process doesn't start in some cases
Description	case 1) When pgpool reads pgpool_status file indicating that a node is down at starting, the health check process corresponding to the node doesn't start. case 2) When watchdog is enabled and the standby pgpool receives backend status information indicating that a node is down from the master pgpool at starting, the health check process corresponding to the node doesn't start. case 3) When a health check process whose corresponding backend is down is killed, this is never restarted. == Currently, the health check process is forked only if the corresponding backend' status is CON_WAIT or CON_UP. Usually (when there is not pgpool_status or -D option is used), the condition is true before starting health check, so all health check processes are started. However, in the cases mentioned above, the condition is false for a certain backend and the corresponding health check process never start. To fix this, we just have to start all health check processes regardless to the backend status in similar to usual cases. In addition, we can also ignore "!switching" condition in repear(), otherwise the health check process killed during failover never restart. I think it is safe because !switching condition is needed only for child processes that may be restarted (or marked to be restarted) in failover(). Another possigle design is to restart the health check process when the corresponding backend is attached or recovered. However, this is more complex than above and I think a simple solution is better. The patch is attached.
Steps To Reproduce	You can always reproduce case 1 and case 3. You can reproduce case 2 by starting two pgpool nodes with a interval after shutdown one of the backend nodes, but it might be hard to reproduce this because it depends on timing.
Tags	No tags attached.

nagata 2018-06-20 19:46 developer	fix_do_health_check_child.patch (914 bytes) fix_do_health_check_child.patch (914 bytes)

pengbo 2018-06-21 09:22 developer ~0002053	Thank you for your patch. I will look into int.

t-ishii 2018-06-21 12:02 developer ~0002054 Last edited: 2018-06-21 12:05	> case 1) > When pgpool reads pgpool_status file indicating that a node is down at starting, the health check process corresponding to the node doesn't start. I don't see this as a problem, rather it's an expected behavior. The health check process never checks the status of node which is already in down status, there's no point to start a health check process for the node. > case 3) > When a health check process whose corresponding backend is down is killed, this is never restarted. Ditto as 0000001. I guess the proper fix would be starting the health check process when the backend node is re-attached.

t-ishii 2018-06-21 13:03 developer	start_health_check_on_failback.diff (766 bytes) start_health_check_on_failback.diff (766 bytes)

t-ishii 2018-06-21 13:03 developer ~0002056	Attached is the patch to start the health check process when the backend node is re-attached.

nagata 2018-06-21 14:18 developer ~0002060	ok. I think this is a simple fix that will resolve the all problems I mentioned because the health check process is restarted after pcp_attach_node. By the way, do we need kill the health check process when a backend node is degenerated? As you said, it is no point to run a health check process for a node in down, although it is harmless even if we don't care of it.

t-ishii 2018-06-21 14:23 developer ~0002061	I think we can safely leave it when a node is degenerated. Of course it's a little bit waste of CPU cycle, but I think an effort to try to eliminate it is not worth the trouble.

nagata 2018-06-21 14:38 developer ~0002062	OK. I agreed. Thanks.

t-ishii 2018-06-22 17:32 developer ~0002070	Fix pushed. BTW, Pgpool-II 3.7.5 does not exist yet.

t-ishii 2018-06-25 11:18 developer ~0002072	> BTW, Pgpool-II 3.7.5 does not exist yet. Description fixed.

Date Modified	Username	Field	Change
2018-06-20 19:46	nagata	New Issue
2018-06-20 19:46	nagata	File Added: fix_do_health_check_child.patch
2018-06-21 09:22	pengbo	Note Added: 0002053
2018-06-21 12:02	t-ishii	Note Added: 0002054
2018-06-21 12:02	t-ishii	Note Edited: 0002054
2018-06-21 12:05	t-ishii	Note Edited: 0002054
2018-06-21 13:03	t-ishii	File Added: start_health_check_on_failback.diff
2018-06-21 13:03	t-ishii	Note Added: 0002056
2018-06-21 14:18	nagata	Note Added: 0002060
2018-06-21 14:23	t-ishii	Note Added: 0002061
2018-06-21 14:38	nagata	Note Added: 0002062
2018-06-22 17:32	t-ishii	Note Added: 0002070
2018-06-25 11:18	t-ishii	Product Version	3.7.5 => 3.7.4
2018-06-25 11:18	t-ishii	Target Version	=> 3.7.5
2018-06-25 11:18	t-ishii	Note Added: 0002072
2018-06-25 11:19	t-ishii	Status	new => resolved

View Issue Details

Activities

Issue History