View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000407 | Pgpool-II | Bug | public | 2018-06-20 19:46 | 2018-06-25 11:19 |
| Reporter | nagata | Assigned To | |||
| Priority | normal | Severity | minor | Reproducibility | sometimes |
| Status | resolved | Resolution | open | ||
| Product Version | 3.7.4 | ||||
| Target Version | 3.7.5 | ||||
| Summary | 0000407: health check process doesn't start in some cases | ||||
| Description | case 1) When pgpool reads a pgpool_status file at startup that indicates a node is down, the health check process for that node doesn't start. case 2) When watchdog is enabled and the standby pgpool receives backend status information at startup from the master pgpool indicating that a node is down, the health check process for that node doesn't start. case 3) When a health check process whose corresponding backend is down is killed, it is never restarted. == Currently, the health check process is forked only if the corresponding backend's status is CON_WAIT or CON_UP. Usually (when there is no pgpool_status file or the -D option is used), the condition is true before health checking starts, so all health check processes are started. However, in the cases mentioned above, the condition is false for a certain backend and the corresponding health check process never starts. To fix this, we just have to start all health check processes regardless of the backend status, as in the usual cases. In addition, we can also ignore the "!switching" condition in reaper(); otherwise a health check process killed during failover is never restarted. I think this is safe because the !switching condition is needed only for child processes that may be restarted (or marked to be restarted) in failover(). Another possible design is to restart the health check process when the corresponding backend is attached or recovered. However, this is more complex than the above, and I think a simple solution is better. The patch is attached. | ||||
| Steps To Reproduce | You can always reproduce case 1 and case 3. You can reproduce case 2 by shutting down one of the backend nodes and then starting two pgpool nodes with an interval between them, but this might be hard to reproduce because it depends on timing. | ||||
| Tags | No tags attached. | ||||
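
The fork condition described in the report can be illustrated with a small sketch. This is not Pgpool-II source code: `CON_WAIT` and `CON_UP` are the status names used in the description, while `CON_DOWN`, `should_fork_reported`, `should_fork_proposed`, and `count_started` are hypothetical names introduced here only to contrast the reported behavior with the proposed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CON_WAIT and CON_UP are named in the report; CON_DOWN and every
 * function below are hypothetical stand-ins, not Pgpool-II code. */
typedef enum { CON_WAIT, CON_UP, CON_DOWN } backend_status;

/* Reported behavior: fork a health check process only when the
 * backend status is CON_WAIT or CON_UP. */
static bool should_fork_reported(backend_status s)
{
    return s == CON_WAIT || s == CON_UP;
}

/* Proposed behavior: fork a health check process for every backend,
 * regardless of its status. */
static bool should_fork_proposed(backend_status s)
{
    (void) s; /* status is deliberately ignored */
    return true;
}

/* Count how many health check processes a given policy would start. */
static size_t count_started(const backend_status *nodes, size_t n,
                            bool (*policy)(backend_status))
{
    size_t started = 0;

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (policy(nodes[i]))
            started++;
    return started;
}
```

With one backend marked down at startup (as in case 1 and case 2), the first policy skips that node, while the second starts a process for every node.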
|
Thank you for your patch. I will look into it. |
|
> case 1) > When pgpool reads pgpool_status file indicating that a node is down at starting, the health check process corresponding to the node doesn't start. I don't see this as a problem; rather, it is expected behavior. The health check process never checks the status of a node that is already down, so there is no point in starting a health check process for that node. > case 3) > When a health check process whose corresponding backend is down is killed, this is never restarted. Ditto as case 1. I guess the proper fix would be to start the health check process when the backend node is re-attached. |
|
Attached is the patch to start the health check process when the backend node is re-attached. |
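
The idea of starting a health check process when a node is re-attached can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the attached patch: `health_check_table`, `on_node_attached`, and `MAX_NODES` are hypothetical names, and a boolean flag stands in for actually forking a process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical upper bound on backend nodes */

/* Hypothetical bookkeeping: which nodes currently have a health
 * check worker, and how many workers were started in total. */
typedef struct {
    bool worker_running[MAX_NODES];
    int  workers_started;
} health_check_table;

/* Called when a backend node is (re-)attached. Starts a worker only
 * if the node has none, so repeated attaches do not create duplicates.
 * Returns true when a new worker was started. */
static bool on_node_attached(health_check_table *t, int node_id)
{
    assert(t != NULL);
    if (node_id < 0 || node_id >= MAX_NODES)
        return false;                 /* unknown node: nothing to do */
    if (t->worker_running[node_id])
        return false;                 /* worker already present */

    t->worker_running[node_id] = true; /* stands in for forking */
    t->workers_started++;
    return true;
}
```

In this sketch, a node that was down at startup has no worker until it is attached, which is when its worker is created.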
|
OK. I think this is a simple fix that will resolve all the problems I mentioned, because the health check process is restarted after pcp_attach_node. By the way, do we need to kill the health check process when a backend node is degenerated? As you said, there is no point in running a health check process for a node that is down, although it is harmless even if we leave it running. |
|
I think we can safely leave it running when a node is degenerated. Of course it is a slight waste of CPU cycles, but I think trying to eliminate it is not worth the trouble. |
|
OK. I agree. Thanks. |
|
Fix pushed. BTW, Pgpool-II 3.7.5 does not exist yet. |
|
> BTW, Pgpool-II 3.7.5 does not exist yet. Description fixed. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2018-06-20 19:46 | nagata | New Issue | |
| 2018-06-20 19:46 | nagata | File Added: fix_do_health_check_child.patch | |
| 2018-06-21 09:22 | pengbo | Note Added: 0002053 | |
| 2018-06-21 12:02 | t-ishii | Note Added: 0002054 | |
| 2018-06-21 12:02 | t-ishii | Note Edited: 0002054 | |
| 2018-06-21 12:05 | t-ishii | Note Edited: 0002054 | |
| 2018-06-21 13:03 | t-ishii | File Added: start_health_check_on_failback.diff | |
| 2018-06-21 13:03 | t-ishii | Note Added: 0002056 | |
| 2018-06-21 14:18 | nagata | Note Added: 0002060 | |
| 2018-06-21 14:23 | t-ishii | Note Added: 0002061 | |
| 2018-06-21 14:38 | nagata | Note Added: 0002062 | |
| 2018-06-22 17:32 | t-ishii | Note Added: 0002070 | |
| 2018-06-25 11:18 | t-ishii | Product Version | 3.7.5 => 3.7.4 |
| 2018-06-25 11:18 | t-ishii | Target Version | => 3.7.5 |
| 2018-06-25 11:18 | t-ishii | Note Added: 0002072 | |
| 2018-06-25 11:19 | t-ishii | Status | new => resolved |