[pgpool-hackers: 4397] Another timeout error 001

Tatsuo Ishii ishii at sraoss.co.jp
Mon Sep 18 13:21:05 JST 2023


I have looked into this:

From: buildfarm at pgpool.net
Subject: [pgpool-buildfarm: 2938] Pgpool-II buildfarm results CentOS7
Date: Mon, 18 Sep 2023 06:22:33 +0900
Message-ID: <65076e19.0KJyP23oiPU8iNp9%buildfarm at pgpool.net>

> * master  PostgreSQL 11  CentOS7
> testing 001.load_balance...timeout.

From: src/test/regression/log/001.load_balance:

2023-09-16 20:07:42.993: main pid 9301: LOG:  stop request sent to pgpool (pid: 9027). waiting for termination...
.done.
tcp        0      0 0.0.0.0:11000           0.0.0.0:*               LISTEN      9302/pgpool:        
tcp6       0      0 :::11000                :::*                    LISTEN      9302/pgpool:        
tcp        0      0 0.0.0.0:11000           0.0.0.0:*               LISTEN      9302/pgpool:        
tcp6       0      0 :::11000                :::*                    LISTEN      9302/pgpool:
[repeating until timeout]

The process 9302 was the pcp main process (from
src/test/regression/tests/001.load_balance/testdir/log/pgpool.log):
[snip]
2023-09-16 20:07:43.243: pcp_main pid 9184: LOG:  restart request received in pcp child process
2023-09-16 20:07:43.243: pcp_main pid 9184: DEBUG:  shmem_exit(-1): 2 callbacks to make
2023-09-16 20:07:43.244: pcp_main pid 9184: DEBUG:  proc_exit(-1): 0 callbacks to make
2023-09-16 20:07:43.246: main pid 9027: LOG:  PCP child 9184 exits with status 0 in failover()
2023-09-16 20:07:43.246: main pid 9027: LOG:  fork a new PCP child pid 9302 in failover()
2023-09-16 20:07:43.246: main pid 9027: LOG:  SIGINT is member
2023-09-16 20:07:43.247: main pid 9027: LOG:  exit handler called (signal: 2)
2023-09-16 20:07:43.247: main pid 9027: LOG:  shutting down by signal 2
2023-09-16 20:07:43.247: main pid 9027: LOG:  terminating all child processes
[snip]

Here port 11000 is the main pgpool port, which pgpool main listens on.
The pcp process was listening on the pgpool port because it was forked
off by the pgpool main process and inherited the listening socket, so
this is OK. The question is: why did process 9302 not exit? Here is my
guess:

When the pcp process was forked off, it unblocked all signals.

2023-09-16 20:07:43.246: main pid 9027: LOG:  fork a new PCP child pid 9302 in failover()
-----------------------------------------------------------------------------------
static pid_t
pcp_fork_a_child(int *fds, char *pcp_conf_file)
{
	pid_t		pid;

	pid = fork();

	if (pid == 0)
	{
		on_exit_reset();
		SetProcessGlobalVariables(PT_PCP);

		close(pipe_fds[0]);
		close(pipe_fds[1]);

		/* call PCP child main */
		POOL_SETMASK(&UnBlockSig);
:
:
-----------------------------------------------------------------------------------
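To make the suspected window concrete, here is a minimal standalone C sketch. It is not pgpool code: main_exit_handler, pcp_child_handler, fork_unblock_first() and the exit codes are hypothetical. It assumes the child is forked with SIGINT blocked and that a SIGINT lands before the child installs its own handler, which raise() simulates deterministically. Because the child unblocks first, the pending SIGINT goes to the handler inherited from the parent:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit codes telling the parent which handler ran in the child. */
#define RAN_INHERITED_HANDLER 10
#define RAN_CHILD_HANDLER     20

/* Stands in for pgpool main's exit_handler; fork() copies it into the child. */
static void main_exit_handler(int sig)
{
    (void) sig;
    _exit(RAN_INHERITED_HANDLER);
}

/* Stands in for the SIGINT handler the pcp child would install for itself. */
static void pcp_child_handler(int sig)
{
    (void) sig;
    _exit(RAN_CHILD_HANDLER);
}

/*
 * Forks a child that unblocks all signals BEFORE installing its own SIGINT
 * handler (the order suspected above). Returns the child's exit code, or -1.
 */
int fork_unblock_first(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, empty, old_mask;
    int status;

    sa.sa_handler = main_exit_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigemptyset(&empty);

    /* Parent: install the "main" handler and fork with SIGINT blocked. */
    if (sigaction(SIGINT, &sa, &old_sa) != 0)
        return -1;
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    pid_t pid = fork();
    if (pid == 0)
    {
        /* Simulate the parent's SIGINT arriving in the window: it stays pending. */
        raise(SIGINT);
        /* Unblocking delivers the pending SIGINT to the inherited handler here... */
        sigprocmask(SIG_SETMASK, &empty, NULL);
        /* ...so the child's own handler is installed too late (never reached). */
        sa.sa_handler = pcp_child_handler;
        sigaction(SIGINT, &sa, NULL);
        _exit(0);
    }

    int ok = pid > 0 && waitpid(pid, &status, 0) == pid;

    /* Restore the caller's mask and handler. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGINT, &old_sa, NULL);
    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

POSIX guarantees that a pending signal which becomes unblocked is delivered before sigprocmask() returns, so in this sketch the inherited handler always wins.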

Then signal 2 (SIGINT) was sent to the pgpool main process, and exit_handler was called.

2023-09-16 20:07:43.247: main pid 9027: LOG:  exit handler called (signal: 2)

In exit_handler, signal 2 is forwarded to the pcp process.

The pcp process accepted the signal because it was not blocked,
probably before its own handler for signal 2 had been established. As
a result, the exit_handler inherited from pgpool main was called in
the pcp process. But that handler refused to process the signal
(getpid() != mypid) and called proc_exit(0):

-----------------------------------------------------------------------------------
	/*
	 * this could happen in a child process if a signal has been sent before
	 * resetting signal handler
	 */
	if (getpid() != mypid)
	{
		POOL_SETMASK(&UnBlockSig);
		proc_exit(0);
	}
-----------------------------------------------------------------------------------
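A simplified sketch of what this guard does when the inherited handler runs in a forked child. The names are hypothetical, and unlike proc_exit(), the sketch calls _exit(0) directly, so no exit callbacks run:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t mypid;                               /* pid of the "main" process */
static volatile sig_atomic_t shutdown_started = 0;

/* Simplified stand-in for exit_handler: only the main process shuts down. */
static void exit_handler(int sig)
{
    (void) sig;
    if (getpid() != mypid)
        _exit(0);           /* a forked child got here with the inherited handler */
    shutdown_started = 1;   /* the real handler would terminate all children here */
}

/*
 * Forks a child that receives SIGINT while still using the inherited handler.
 * Returns the child's exit code, or -1 on error.
 */
int inherited_handler_in_child(void)
{
    struct sigaction sa, old_sa;
    sigset_t intmask;
    int status;

    mypid = getpid();
    sa.sa_handler = exit_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, &old_sa) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0)
    {
        sigemptyset(&intmask);
        sigaddset(&intmask, SIGINT);
        sigprocmask(SIG_UNBLOCK, &intmask, NULL);
        raise(SIGINT);      /* runs the inherited exit_handler in the child */
        _exit(1);           /* not reached if the guard works */
    }

    int ok = pid > 0 && waitpid(pid, &status, 0) == pid;
    sigaction(SIGINT, &old_sa, NULL);
    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

In this sketch the child exits immediately with status 0 and the main process's shutdown flag is untouched; the real proc_exit(0) would first run the registered callbacks.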

But it seems the pcp process 9302 did not exit even though proc_exit()
was called. Maybe one of the callbacks registered for proc_exit was blocking?

Anyway, I think signals for the pcp process should stay blocked until
its signal handlers are established.
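A minimal standalone C sketch of that ordering (not pgpool code; the handler names, exit codes and fork_handlers_first() are hypothetical). The child inherits a mask with SIGINT blocked, installs its own handler, and only then unblocks, so a SIGINT sent right after fork() always reaches the child's own handler:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit codes telling the parent which handler ran in the child. */
#define RAN_INHERITED_HANDLER 10
#define RAN_CHILD_HANDLER     20

static void main_exit_handler(int sig) { (void) sig; _exit(RAN_INHERITED_HANDLER); }
static void pcp_child_handler(int sig) { (void) sig; _exit(RAN_CHILD_HANDLER); }

/*
 * Forks a child that keeps SIGINT blocked until its own handler is installed.
 * The parent sends SIGINT right after fork(). Returns the child's exit code, or -1.
 */
int fork_handlers_first(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, empty, old_mask;
    int status;

    sa.sa_handler = main_exit_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigemptyset(&empty);

    if (sigaction(SIGINT, &sa, &old_sa) != 0)
        return -1;
    sigprocmask(SIG_BLOCK, &block, &old_mask);   /* the child inherits this mask */

    pid_t pid = fork();
    if (pid == 0)
    {
        /* 1. SIGINT is still blocked: install the child's own handler first. */
        sa.sa_handler = pcp_child_handler;
        sigaction(SIGINT, &sa, NULL);
        /* 2. Only now unblock; a SIGINT pending since fork() is delivered here. */
        sigprocmask(SIG_SETMASK, &empty, NULL);
        /* 3. Otherwise wait for it; the handler terminates the child. */
        for (;;)
            pause();
    }

    int ok = 0;
    if (pid > 0)
    {
        kill(pid, SIGINT);   /* may land before or after the child unblocks */
        ok = waitpid(pid, &status, 0) == pid;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGINT, &old_sa, NULL);
    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Since a blocked signal mask survives fork(), there is no moment in which the parent's handler can run in the child, no matter when the parent's kill() lands.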

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

