View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000536 | Pgpool-II | Bug | public | 2019-08-09 20:11 | 2020-04-14 15:45 |
| Reporter | gmartinez | Assigned To | hoshiai | ||
| Priority | normal | Severity | minor | Reproducibility | always |
| Status | resolved | Resolution | fixed | ||
| Platform | Linux | OS | CentOS | OS Version | 7 |
| Product Version | 3.7.10 | ||||
| Summary | 0000536: Recovery of standby node in streaming replication mode causes all connections to be dropped | ||||
| Description | When we run pcp_recovery_node on a standby node while in streaming replication, it seems that pgpool drops all connections to the database now, causing errors in the application. We have load balance disabled and only use pgpool for the failover ability. Previously, we were on 3.7.7 and this behavior did not occur. | ||||
| Steps To Reproduce | 1) Stop postgresql on standby node 2) wait for pgpool to notice it's down 3) Start postgresql 4) Run pcp_recovery_node on the downed postgresql 5) Note that all children restarted and the frontend application has dropped connections. | ||||
| Additional Information | Log events: Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: node recovery, node: 4 restarted Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: received failback request for node_id: 4 from pid [624658] Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog received the failover command from local pgpool-II on IPC interface Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is processing the failover command [FAILBACK_REQUEST] received from local pgpool-II on IPC interface Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: The failover request does not need quorum to hold Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: DETAIL: proceeding with the failover Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: HINT: REQ_DETAIL_CONFIRMED Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Pgpool-II parent process has received failover request Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: received the failover indication from Pgpool-II on IPC interface Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is informed of failover start by the main process Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: starting fail back. reconnect host db005(5432) Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Node 0 is not down (status: 2) Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Do not restart children because we are failing back node id 4 host: db005 port: 5432 and we are in streaming replication mode and not all backends were down Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node_repeatedly: waiting for finding a primary node Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 0 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 1 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 2 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: WARNING: failover/failback is in progress Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: DETAIL: executing failover or failback on backend Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: HINT: In a moment you should be able to reconnect to the database Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 3 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: primary node id is 3 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failover: set new primary node: 3 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failover: set new master node: 0 Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: received the failover indication from Pgpool-II on IPC interface Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is informed of failover end by the main process Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failback done. reconnect host db005(5432) Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: start health check process for host db005(5432) Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: start health check process for host db005(5432) Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: recovery done Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 614305: LOG: failback event detected Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 614305: DETAIL: restarting myself Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 598356: LOG: restart request received in pcp child process Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597088: LOG: PCP child 598356 exits with status 0 in failover() Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597088: LOG: fork a new PCP child pid 624833 in failover() Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 617874: LOG: failback event detected Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 617874: DETAIL: restarting myself Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597137: LOG: failback event detected (repeat failback many times) | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
When pgpool execute failover or failoback, alll backend process of pgpool is restarted. But if target is standby and standby is not loadbalance, pgpool is waiting session finished before restarting process. The following log is shown, when the to restart process waited until that session finished: LOG: failback event detected DETAIL: restarting myself I saw this message in your log, so I don't understand the problem that your connection was dropped happend. Could you share all log, when this problem is happened and log_min_messages set to 'DEBUG5' ? |
|
|
I will set the logging to DEBUG5 in our lab and try to reproduce there. |
|
|
Thank you. I wait your trying result. |
|
|
So I was able to reproduce this behavior on my CentOS 6 lab as well and I noticed that shutting down postgresql on a standby node or re-attaching a downed node would also trigger this behavior. I did see errors in the application: Caused by: org.postgresql.util.PSQLException: This connection has been closed. and saw the same behavior I mentioned earlier in the pgpool logs. DEBUG5 logging was extremely verbose (and had query logs!), so I'm not sure how to capture the logs without exposing data while the application is running. |
|
|
Sorry, for the very very late for reply. I investigated your ploblem. In conclusion, this problem(pgpool session is reset during failover/failback) is intended behavior as pgpool community. This behavior has safety for not causing more bigger problem(i.g. segment fault) during failover/failback. This design is mainly for failover, so In the future, I would like to fix that failback(online recovery) will be not caused this behavior. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2019-08-09 20:11 | gmartinez | New Issue | |
| 2019-08-09 20:11 | gmartinez | File Added: pgpool.conf | |
| 2019-08-15 16:32 | t-ishii | Assigned To | => hoshiai |
| 2019-08-15 16:32 | t-ishii | Status | new => assigned |
| 2019-08-15 16:32 | t-ishii | Description Updated | |
| 2019-08-15 16:32 | t-ishii | Steps to Reproduce Updated | |
| 2019-08-15 16:32 | t-ishii | Additional Information Updated | |
| 2019-08-22 14:31 | hoshiai | Note Added: 0002796 | |
| 2019-08-22 14:33 | hoshiai | Note Edited: 0002796 | |
| 2019-08-22 14:33 | hoshiai | Status | assigned => feedback |
| 2019-08-27 07:36 | gmartinez | Note Added: 0002801 | |
| 2019-08-27 07:36 | gmartinez | Status | feedback => assigned |
| 2019-08-27 14:35 | hoshiai | Note Added: 0002803 | |
| 2019-09-21 08:18 | gmartinez | Note Added: 0002873 | |
| 2019-11-27 16:39 | hoshiai | Status | assigned => feedback |
| 2019-11-27 16:39 | hoshiai | Note Added: 0002979 | |
| 2020-04-14 15:45 | hoshiai | Status | feedback => resolved |
| 2020-04-14 15:45 | hoshiai | Resolution | open => fixed |