View Issue Details

ID: 0000431
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2018-10-16 15:54
Reporter: nagata
Assigned To: nagata
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 3.6.12
Target Version:
Fixed in Version:
Summary: 0000431: In native replication mode, online-recovery is blocked after a child process exits abnormally while holding an accepted connection.

Description: When native replication mode is used, the 2nd stage script of online-recovery is executed only after all connections have been closed. The connection counter is incremented when a child process accepts a connection and decremented when the session is closed. However, if a child process exits abnormally, for example due to a segfault or kill -9, the counter is never decremented, and this blocks the 2nd stage script forever.
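
The counter behaviour described above can be illustrated with a toy model. This is only a sketch under assumptions, not Pgpool-II code: the class `ConnCounter`, the functions `run_session` and `can_start_second_stage`, and the `crash` flag are all hypothetical names, and a plain Python integer stands in for the shared counter.

```python
class ConnCounter:
    """Toy stand-in for the shared connection counter (hypothetical, not Pgpool-II code)."""

    def __init__(self):
        self.value = 0

    def on_accept(self):
        # A child process accepted a client connection.
        self.value += 1

    def on_session_close(self):
        # The session ended normally, so the child releases its slot.
        self.value -= 1


def run_session(counter, crash=False):
    """Model one client session handled by a child process."""
    counter.on_accept()
    if crash:
        # Models a segfault or kill -9: the process is gone before
        # its cleanup code can run, so the decrement never happens.
        return
    counter.on_session_close()


def can_start_second_stage(counter):
    """The 2nd stage may only start once no connections remain."""
    return counter.value == 0


counter = ConnCounter()

run_session(counter)                      # normal session
ok_after_normal = can_start_second_stage(counter)

run_session(counter, crash=True)          # child dies abnormally
ok_after_crash = can_start_second_stage(counter)
leaked = counter.value                    # stale count left behind
```

After the crashed session the counter stays above zero even though no client is connected any more, which is why a check of this kind can never succeed again without resetting the counter.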

Steps To Reproduce:
1. Configure a native replication cluster with pgpool_setup
2. Stop a backend node
3. Connect to Pgpool-II using psql
4. Kill the child process that psql is connected to
5. Run online recovery using pcp_recovery_node
-> The 2nd stage script is blocked forever
Tags: No tags attached.

Activities

t-ishii

2018-10-10 11:20

developer   ~0002189

You should enable client_idle_limit_in_recovery.

nagata

2018-10-10 11:47

developer   ~0002190

Last edited: 2018-10-10 11:48

Thank you for your quick response.

Yes, enabling client_idle_limit_in_recovery prevents online-recovery from being blocked forever.
However, online-recovery itself then fails with the following error.

 LOG: wait_connection_closed: existing connections did not close in 90 sec.
 ERROR: node recovery failed, waiting connection closed in the other pgpools timeout

I think the only way to make online-recovery work in this situation is to restart Pgpool-II so that Req_info->conn_counter is reset to zero, right?
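
The timeout failure in the log above can be modelled as a bounded wait on the counter. The following is a minimal sketch under assumptions, not the actual Pgpool-II implementation: the function borrows the name `wait_connection_closed` from the log message only for readability, and `get_count`, `poll_sec`, and `FakeClock` are hypothetical. A fake clock is used so the 90 second wait does not really sleep.

```python
import time


def wait_connection_closed(get_count, timeout_sec, poll_sec=1.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll get_count() until it returns 0 or timeout_sec elapses.

    Returns True if all connections closed in time, False on timeout.
    Hypothetical model only; it merely shares a name with the log message.
    """
    deadline = clock() + timeout_sec
    while True:
        if get_count() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_sec)


class FakeClock:
    """Deterministic clock so the example does not actually wait."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# A leaked counter never reaches zero, so the wait always times out.
fake = FakeClock()
stale_result = wait_connection_closed(lambda: 1, 90,
                                      sleep=fake.sleep, clock=fake.now)
elapsed = fake.t

# With no leaked connection the wait succeeds immediately.
fake2 = FakeClock()
clean_result = wait_connection_closed(lambda: 0, 90,
                                      sleep=fake2.sleep, clock=fake2.now)
```

In this model a timeout only converts an endless block into a failure; the stale count itself is untouched, which matches the observation that recovery still fails until the counter is reset.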

t-ishii

2018-10-10 11:56

developer   ~0002191

In any case, if a pgpool child process gets killed abnormally, there's not much Pgpool-II can do. I recommend restarting Pgpool-II.

nagata

2018-10-10 12:14

developer   ~0002192

Last edited: 2018-10-10 12:15

OK. I take it that it is difficult to resolve this by fixing Pgpool-II.

Our client is suffering from segfaults of child processes, and this brought the online-recovery problem to the surface.
So the root problem is the segfault, and I am continuing to investigate it now. I'll report if I find something new.

nagata

2018-10-16 15:53

developer   ~0002203

The segmentation fault I mentioned in this thread is reported at the link below,

http://www.pgpool.net/mantisbt/view.php?id=434

so I'll close this thread. Thanks.

Issue History

Date Modified Username Field Change
2018-10-10 10:59 nagata New Issue
2018-10-10 11:00 nagata Description Updated
2018-10-10 11:20 t-ishii Note Added: 0002189
2018-10-10 11:47 nagata Note Added: 0002190
2018-10-10 11:48 nagata Note Edited: 0002190
2018-10-10 11:56 t-ishii Note Added: 0002191
2018-10-10 12:14 nagata Note Added: 0002192
2018-10-10 12:15 nagata Note Edited: 0002192
2018-10-16 15:53 nagata Note Added: 0002203
2018-10-16 15:54 nagata Assigned To => nagata
2018-10-16 15:54 nagata Status new => closed
2018-10-16 15:54 nagata Resolution open => fixed