[pgpool-committers: 3736] Re: pgpool: Fix usage of wait(2) in pgpool main process

Muhammad Usama m.usama at gmail.com
Wed Jan 4 04:04:58 JST 2017


Hi Ishii-San,

I am looking into the issue
http://www.pgpool.net/mantisbt/view.php?id=249, where
pgpool-II sometimes does not perform de-escalation while shutting down. As per
the bug report, the issue started to appear after this commit.

Although I am not able to reproduce the exact reported issue, it seems
that the changes made by this commit can leave zombie processes behind.

The commit replaces wait(NULL) with waitpid(-1, ..., WNOHANG):

@@ -1365,8 +1367,10 @@ static RETSIGTYPE exit_handler(int sig)
        POOL_SETMASK(&UnBlockSig);
     do
     {
-        wpid = wait(NULL);
-    }while (wpid > 0 || (wpid == -1 && errno == EINTR));
+               int ret_pid;
+        wpid = waitpid(-1, &ret_pid, WNOHANG);
+    } while (wpid > 0 || (wpid == -1 && errno == EINTR));

The problem with this logic is that, after replacing wait(NULL) with
waitpid(..., WNOHANG), the main process can move on without waiting for all
child processes to finish, especially if some child process takes a little
longer to exit. With WNOHANG, waitpid() returns 0 when no child has exited
yet, even though child processes still exist, and the loop terminates on
that return value.
For example, at the time of system shutdown, the watchdog process sometimes
takes a few seconds to execute the de-escalation process before exiting.
Meanwhile, in the main process, waitpid(WNOHANG) returns 0 and the pgpool-II
main process exits, leaving the watchdog process as a zombie.
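
To illustrate the difference, here is a minimal sketch, not the actual
pgpool-II code. The helper names (spawn_sleeping_children, reap_nonblocking,
reap_blocking) are made up for this example, and the sleeping children only
stand in for a slow watchdog process. The first loop has the same shape as the
one in the commit, the second is the same loop without WNOHANG.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that each sleep delay_sec seconds
 * and then exit, standing in for a child that is slow to shut down. */
static int spawn_sleeping_children(int n, unsigned int delay_sec)
{
    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;          /* fork failed */
        if (pid == 0)
        {
            sleep(delay_sec);   /* simulate slow shutdown work */
            _exit(0);
        }
    }
    return 0;
}

/* Same loop shape as the commit: with WNOHANG, waitpid() returns 0 while
 * children are still running, so the loop ends early. Returns the number
 * of children actually reaped. */
static int reap_nonblocking(void)
{
    int reaped = 0;
    int status;
    pid_t wpid;

    do
    {
        wpid = waitpid(-1, &status, WNOHANG);
        if (wpid > 0)
            reaped++;
    } while (wpid > 0 || (wpid == -1 && errno == EINTR));
    return reaped;
}

/* Blocking variant: without WNOHANG the loop only ends when waitpid()
 * fails with something other than EINTR, normally ECHILD (no children
 * left). Returns the number of children reaped. */
static int reap_blocking(void)
{
    int reaped = 0;
    int status;
    pid_t wpid;

    do
    {
        wpid = waitpid(-1, &status, 0);
        if (wpid > 0)
            reaped++;
    } while (wpid > 0 || (wpid == -1 && errno == EINTR));
    return reaped;
}
```

If a caller spawns three children with a one second delay and immediately
calls reap_nonblocking(), it should normally reap none of them, whereas
reap_blocking() called afterwards waits for all three.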

Also, could you share the scenario in which you ran into the infinite wait?
There may be some other issue in the code: as per the wait() system call
documentation, it returns -1 when there are no child processes, so
theoretically the wait() call should not cause an infinite wait.


Thanks
Best regards
Muhammad Usama



On Thu, Jul 7, 2016 at 11:55 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Fix usage of wait(2) in pgpool main process
>
> Per [pgpool-hackers: 1444]. Here is the copy of the message:
>
> Hi Usama,
>
> I have noticed that the usage of wait(2) in pgpool main could cause
> infinite wait in the system call.
>
>     /* wait for all children to exit */
>     do
>     {
>         wpid = wait(NULL);
>     }while (wpid > 0 || (wpid == -1 && errno == EINTR));
>
> When a child process dies, a SIGCHLD signal is raised and wait(2) learns
> of the event. However, multiple child deaths do not necessarily create
> exactly as many SIGCHLD signals as there are dead children, and
> wait(2) could wait for an event which never happens in this case. I
> actually encountered this situation while testing pgpool-II. The solution
> is to use waitpid(2) instead of wait(2).
>
> Branch
> ------
> master
>
> Details
> -------
> http://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=0d1cdf96feb77de6f1dfc2d46ecd7467325d1f79
>
> Modified Files
> --------------
> src/main/pgpool_main.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> _______________________________________________
> pgpool-committers mailing list
> pgpool-committers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-committers
>