View Issue Details

IDProjectCategoryView StatusLast Update
0000536Pgpool-IIBugpublic2020-04-14 15:45
ReportergmartinezAssigned Tohoshiai 
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
PlatformLinuxOSCentOSOS Version7
Product Version3.7.10 
Target VersionFixed in Version 
Summary0000536: Recovery of standby node in streaming replication mode causes all connections to be dropped
DescriptionWhen we run pcp_recovery_node on a standby node while in streaming replication, it seems that pgpool drops all connections to the database now, causing errors in the application. We have load balance disabled and only use pgpool for the failover ability.

Previously, we were on 3.7.7 and this behavior did not occur.
Steps To Reproduce1) Stop postgresql on standby node
2) wait for pgpool to notice it's down
3) Start postgresql
4) Run pcp_recovery_node on the downed postgresql
5) Note that all children restarted and the frontend application has dropped connections.
Additional InformationLog events:

Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: node recovery, node: 4 restarted
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: received failback request for node_id: 4 from pid [624658]
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog received the failover command from local pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is processing the failover command [FAILBACK_REQUEST] received from local pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: The failover request does not need quorum to hold
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: DETAIL: proceeding with the failover
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: HINT: REQ_DETAIL_CONFIRMED
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Pgpool-II parent process has received failover request
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598089: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: received the failover indication from Pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597318: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is informed of failover start by the main process
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597971: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597993: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: starting fail back. reconnect host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Node 0 is not down (status: 2)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: Do not restart children because we are failing back node id 4 host: db005 port: 5432 and we are in streaming replication mode and not all backends were down
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node_repeatedly: waiting for finding a primary node
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 0
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 1
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 2
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: WARNING: failover/failback is in progress
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: DETAIL: executing failover or failback on backend
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 598130: HINT: In a moment you should be able to reconnect to the database
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: checking backend no 3
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: find_primary_node: primary node id is 3
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failover: set new primary node: 3
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failover: set new master node: 0
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: new IPC connection received
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: received the failover indication from Pgpool-II on IPC interface
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597090: LOG: watchdog is informed of failover end by the main process
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: failback done. reconnect host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: start health check process for host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 597088: LOG: start health check process for host db005(5432)
Aug 09 10:43:54 db006 cgexec[597088]: 2019-08-09 10:43:54: pid 624658: LOG: recovery done
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 614305: LOG: failback event detected
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 614305: DETAIL: restarting myself
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 598356: LOG: restart request received in pcp child process
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597088: LOG: PCP child 598356 exits with status 0 in failover()
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597088: LOG: fork a new PCP child pid 624833 in failover()
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 617874: LOG: failback event detected
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 617874: DETAIL: restarting myself
Aug 09 10:43:55 db006 cgexec[597088]: 2019-08-09 10:43:55: pid 597137: LOG: failback event detected
(repeat failback many times)
TagsNo tags attached.

Activities

gmartinez

2019-08-09 20:11

reporter  

pgpool.conf (8,041 bytes)

hoshiai

2019-08-22 14:31

developer   ~0002796

Last edited: 2019-08-22 14:33

View 2 revisions

When pgpool execute failover or failoback, alll backend process of pgpool is restarted.
But if target is standby and standby is not loadbalance, pgpool is waiting session finished before restarting process.

The following log is shown, when the to restart process waited until that session finished:
  LOG: failback event detected
  DETAIL: restarting myself

I saw this message in your log, so I don't understand the problem that your connection was dropped happend.
Could you share all log, when this problem is happened and log_min_messages set to 'DEBUG5' ?

gmartinez

2019-08-27 07:36

reporter   ~0002801

I will set the logging to DEBUG5 in our lab and try to reproduce there.

hoshiai

2019-08-27 14:35

developer   ~0002803

Thank you. I wait your trying result.

gmartinez

2019-09-21 08:18

reporter   ~0002873

So I was able to reproduce this behavior on my CentOS 6 lab as well and I noticed that shutting down postgresql on a standby node or re-attaching a downed node would also trigger this behavior. I did see errors in the application: Caused by: org.postgresql.util.PSQLException: This connection has been closed.
and saw the same behavior I mentioned earlier in the pgpool logs.


DEBUG5 logging was extremely verbose (and had query logs!), so I'm not sure how to capture the logs without exposing data while the application is running.

hoshiai

2019-11-27 16:39

developer   ~0002979

Sorry, for the very very late for reply.

I investigated your ploblem.
In conclusion, this problem(pgpool session is reset during failover/failback) is intended behavior as pgpool community. This behavior has safety for not causing more bigger problem(i.g. segment fault) during failover/failback.

This design is mainly for failover, so
In the future, I would like to fix that failback(online recovery) will be not caused this behavior.

Issue History

Date Modified Username Field Change
2019-08-09 20:11 gmartinez New Issue
2019-08-09 20:11 gmartinez File Added: pgpool.conf
2019-08-15 16:32 t-ishii Assigned To => hoshiai
2019-08-15 16:32 t-ishii Status new => assigned
2019-08-15 16:32 t-ishii Description Updated View Revisions
2019-08-15 16:32 t-ishii Steps to Reproduce Updated View Revisions
2019-08-15 16:32 t-ishii Additional Information Updated View Revisions
2019-08-22 14:31 hoshiai Note Added: 0002796
2019-08-22 14:33 hoshiai Note Edited: 0002796 View Revisions
2019-08-22 14:33 hoshiai Status assigned => feedback
2019-08-27 07:36 gmartinez Note Added: 0002801
2019-08-27 07:36 gmartinez Status feedback => assigned
2019-08-27 14:35 hoshiai Note Added: 0002803
2019-09-21 08:18 gmartinez Note Added: 0002873
2019-11-27 16:39 hoshiai Status assigned => feedback
2019-11-27 16:39 hoshiai Note Added: 0002979
2020-04-14 15:45 hoshiai Status feedback => resolved
2020-04-14 15:45 hoshiai Resolution open => fixed