[pgpool-hackers: 3213] Re: Deal with recovery failure by an abnormally exiting child process
ishii at sraoss.co.jp
Tue Jan 8 11:16:19 JST 2019
>> In bug 431, it was reported that the second stage of recovery fails if
>> a child process has exited abnormally (typically because of SIGKILL
>> or a segfault). This is because the global connection counter
>> (Req_info->conn_counter) is not decremented when the child process
>> exits abnormally. In general there is nothing we can do about an
>> abnormally exiting process, and we recommend restarting Pgpool-II
>> entirely in this case.
>> However, I found a tricky solution for one particular situation: when
>> client_idle_limit_in_recovery is properly set (i.e.
>> client_idle_limit_in_recovery >= recovery_timeout).
Sorry, this should have been: 0 < client_idle_limit_in_recovery <= recovery_timeout || client_idle_limit_in_recovery == -1
>> The logic is shown in the patch:
>> * recovery_timeout has expired. Before returning with failure status,
>> * let's check whether this was caused by a stale conn_counter. If a child
>> * process exits abnormally (killed by SIGKILL or a segfault, for example),
>> * conn_counter is not decremented at process exit, so it will never
>> * return to 0. This can be detected by checking whether
>> * client_idle_limit_in_recovery is enabled and less than
>> * recovery_timeout, because all clients must have been kicked out by the
>> * time client_idle_limit_in_recovery expires. If so, we should also
>> * reset conn_counter to 0.
>> Should we employ this? Is it too tricky? Comments are welcome.
> Forgot to attach patch.
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php