[pgpool-committers: 3757] Re: pgpool: Fix usage of wait(2) in pgpool main process

Tatsuo Ishii ishii at sraoss.co.jp
Thu Jan 5 07:51:52 JST 2017


> On Wed, Jan 4, 2017 at 11:39 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > Hi Ishii-san,
>> >
>> > I am looking into the issue
>> > http://www.pgpool.net/mantisbt/view.php?id=249, where
>> > pgpool-II sometimes does not perform de-escalation while shutting down.
>> > According to the bug report, the issue started to appear after this commit.
>> >
>> > Although I am not able to reproduce the exact reported issue, it seems
>> > that the changes made by this commit can leave zombie processes behind.
>> >
>> > The commit replaces wait(NULL) with waitpid(..., WNOHANG):
>> >
>> > @@ -1365,8 +1367,10 @@ static RETSIGTYPE exit_handler(int sig)
>> >         POOL_SETMASK(&UnBlockSig);
>> >      do
>> >      {
>> > -        wpid = wait(NULL);
>> > -    }while (wpid > 0 || (wpid == -1 && errno == EINTR));
>> > +               int ret_pid;
>> > +        wpid = waitpid(-1, &ret_pid, WNOHANG);
>> > +    } while (wpid > 0 || (wpid == -1 && errno == EINTR));
>> >
>> > The problem with this logic is that, with waitpid(..., WNOHANG) in place
>> > of wait(NULL), the main process can move on without waiting for all child
>> > processes to finish, especially if some child process takes a little
>> > longer to exit. waitpid() with WNOHANG returns 0 when no child has exited
>> > yet, even though child processes still exist, and a return value of 0
>> > ends the loop.
>> > For example, at system shutdown the watchdog process sometimes takes a
>> > few seconds to execute the de-escalation process before exiting.
>> > Meanwhile, waitpid(..., WNOHANG) in the main process returns 0 and the
>> > pgpool-II main process exits, leaving the watchdog process as a
>> > zombie.
>>
>> You are right. I should not have used WNOHANG here. The line should
>> have been:
>>
>>         wpid = waitpid(-1, &ret_pid, 0);
>>
> 
> Thanks for the confirmation. I have committed this change.

Thanks Usama!
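
To make the difference between the two calls concrete, here is a minimal sketch rather than the actual pgpool-II code. The helper names (spawn_slow_children, reap_all_nohang, reap_all_blocking) are hypothetical and exist only for this illustration; the loop condition is the one from the diff quoted above. The slow children stand in for a watchdog that needs a moment to finish de-escalation.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that each sleep delay_ms
 * milliseconds before exiting. Returns how many were started. */
static int spawn_slow_children(int n, long delay_ms)
{
    int started = 0;

    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();

        if (pid == 0)
        {
            struct timespec ts = { delay_ms / 1000,
                                   (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
            _exit(0);
        }
        if (pid > 0)
            started++;
    }
    return started;
}

/* Loop as in the reverted change: WNOHANG returns 0 as soon as no
 * child has exited yet, so the loop ends while children still run. */
static int reap_all_nohang(void)
{
    int reaped = 0, status;
    pid_t wpid;

    do
    {
        wpid = waitpid(-1, &status, WNOHANG);
        if (wpid > 0)
            reaped++;
    } while (wpid > 0 || (wpid == -1 && errno == EINTR));
    return reaped;
}

/* Corrected loop: options = 0 blocks until a child exits and only
 * stops once waitpid() fails with ECHILD (no children left). */
static int reap_all_blocking(void)
{
    int reaped = 0, status;
    pid_t wpid;

    do
    {
        wpid = waitpid(-1, &status, 0);
        if (wpid > 0)
            reaped++;
    } while (wpid > 0 || (wpid == -1 && errno == EINTR));
    return reaped;
}
```

Calling spawn_slow_children(3, 200) and then reap_all_nohang() right away should normally reap nothing, because the children are still sleeping; a following reap_all_blocking() waits for all three.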

> Regards
> Muhammad Usama
> 
> 
>> > Also, could you share the scenario where you ran into the infinite
>> > wait? There may be some other issue in the code, since according to the
>> > wait() system call documentation it returns -1 when there is no
>> > child process, so in theory the wait() call should not wait
>> > forever.
>>
>> I do not remember clearly, but it may have been the case where a child
>> received a stop signal (SIGSTOP).
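
If a stopped child was indeed the cause (the thread does not confirm this), the hang would be explained by the fact that wait(NULL) behaves like waitpid(-1, NULL, 0), which does not report stopped children and so keeps blocking until the child is continued and exits. The sketch below is only an illustration under that assumption; observe_stopped_child is a hypothetical name, and WUNTRACED is used here only to make the stopped state visible, not as a proposed change to pgpool-II.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent saw the child in the
 * stopped state and then reaped it normally, 0 if not, -1 on error. */
static int observe_stopped_child(void)
{
    int status;
    int saw_stop = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        raise(SIGSTOP);     /* child stops itself */
        _exit(0);           /* reached only after SIGCONT */
    }

    /* WUNTRACED makes waitpid() return for a stopped child; with
     * options = 0 this call would block until the child exits. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        saw_stop = 1;

    /* Resume the child so that it can exit, then reap it. */
    if (kill(pid, SIGCONT) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return (saw_stop && WIFEXITED(status)) ? 1 : 0;
}
```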
>>
>> > On Thu, Jul 7, 2016 at 11:55 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> >
>> >> Fix usage of wait(2) in pgpool main process
>> >>
>> >> Per [pgpool-hackers: 1444]. Here is the copy of the message:
>> >>
>> >> Hi Usama,
>> >>
>> >> I have noticed that the usage of wait(2) in the pgpool main process
>> >> could cause an infinite wait in the system call.
>> >>
>> >>     /* wait for all children to exit */
>> >>     do
>> >>     {
>> >>         wpid = wait(NULL);
>> >>     }while (wpid > 0 || (wpid == -1 && errno == EINTR));
>> >>
>> >> When a child process dies, a SIGCHLD signal is raised and wait(2) learns
>> >> of the event. However, multiple child deaths do not necessarily create
>> >> the same number of SIGCHLD signals as there are dead children, and
>> >> wait(2) could wait for an event which never happens in this case. I
>> >> actually encountered this situation while testing pgpool-II. The solution
>> >> is to use waitpid(2) instead of wait(2).
>> >>
>> >> Branch
>> >> ------
>> >> master
>> >>
>> >> Details
>> >> -------
>> >> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=0d1cdf96feb77de6f1dfc2d46ecd7467325d1f79
>> >>
>> >> Modified Files
>> >> --------------
>> >> src/main/pgpool_main.c | 12 ++++++++----
>> >> 1 file changed, 8 insertions(+), 4 deletions(-)
>> >>
>>

