[Pgpool-general] TCP connections are *not* closed when a backend timeout

Tatsuo Ishii ishii at sraoss.co.jp
Sun Jul 13 00:41:51 UTC 2008


> Hi,
> 
> I'm working on recovery with pgpool.
> When a backend fails (for example, when the PostgreSQL server shuts
> down), all seems OK: connections are closed and the backend is set as
> down (status 3).
> 
> My problem is with network failures. If I remove the network link
> to the backend, pgpool correctly detects it as down when the health
> check times out, but it does *not* close the TCP connections to the
> remote backend.
> 
> The problem is that when the link comes back and I start
> pcp_recovery_node, the second stage can't proceed because there are
> existing connections to the backend ...
> 
> Is this normal, or is this a bug?
> 
> I'm trying to find out how I can close the connections when the health
> check times out...

Thanks for the report.

In case of a network problem, the underlying TCP/IP stack keeps
retrying, and the only safe way to shut down the connection is to
restart the process. Even if we could safely close the limbo
connection, we would need to pass the limbo connection info from the
parent to the child process (remember that health checking is done in
the parent, while the child holds the connection).
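
To illustrate why restarting the process is enough to drop a connection
that only the child holds, here is a minimal sketch. It is not pgpool
code: the function name is hypothetical, and a local Unix socket pair
stands in for a real backend connection, so it shows only the descriptor
ownership, not TCP retransmission behaviour.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo (not part of pgpool): the child holds one end of a
 * socket pair, the parent has no handle on that end and can only
 * terminate the child.
 * Returns 1 if the parent sees EOF once the child is gone, 0 if it does
 * not, and -1 on a setup error.
 */
int child_exit_closes_connection(void)
{
    int sv[2];
    pid_t pid;
    char buf;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* child: keep only its own end open and idle forever */
        close(sv[0]);
        for (;;)
            pause();
    }

    /* parent: drop its copy, so only the child holds sv[1] */
    close(sv[1]);

    /* the parent cannot close the child's descriptor, only end the child */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);

    /* the kernel closed the child's end at exit, so read() reports EOF */
    n = read(sv[0], &buf, 1);
    close(sv[0]);

    return n == 0 ? 1 : 0;
}
```

The point of the sketch is only that the kernel releases the child's
descriptors when the child exits, which is why no connection info has to
be passed from parent to child if the child is simply restarted.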

This is not a problem if the master goes down, since all children will
restart anyway. I guess you removed the network cable for a node other
than the master. The difference here is that pgpool tries to minimize
restarting children: if the master does not fail, pgpool will not
restart them.

From your report I think this logic is wrong and we need to restart
children in *any* case. Included is a patch for this. Could you
please try it out?
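
The change in behaviour can be sketched as follows. This is a simplified
model, not the code in main.c: the two function names are hypothetical,
and only the condition (master unchanged and no recovery in progress) is
taken from the part of failover_handler visible in the patch.

```c
#include <stdbool.h>

/*
 * Before the patch (as far as the visible condition shows): children are
 * left running when the master node is unchanged and no online recovery
 * is in progress.
 */
bool restart_children_before(int old_master, int new_master, bool in_recovery)
{
    if (old_master == new_master && !in_recovery)
        return false;           /* early return, children keep running */
    return true;                /* fall through to "kill all children" */
}

/*
 * After the patch: the early-return branch is compiled out with
 * #ifdef NOT_USED, so children are restarted in any case.
 */
bool restart_children_after(int old_master, int new_master, bool in_recovery)
{
    (void) old_master;
    (void) new_master;
    (void) in_recovery;
    return true;
}
```

With the case from the report (a non-master node lost, master unchanged,
no recovery running), the first function returns false and the second
returns true, so the children holding the limbo connections get
restarted.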
--
Tatsuo Ishii
SRA OSS, Inc. Japan
-------------- next part --------------
Index: main.c
===================================================================
RCS file: /cvsroot/pgpool/pgpool-II/main.c,v
retrieving revision 1.37
diff -c -r1.37 main.c
*** main.c	22 May 2008 14:36:38 -0000	1.37
--- main.c	13 Jul 2008 00:29:59 -0000
***************
*** 1224,1229 ****
--- 1224,1230 ----
  	{
  		pool_error("failover_handler: no valid DB node found");
  	}
+ #ifdef NOT_USED
  	else
  	{
  		if (Req_info->master_node_id == new_master && *InRecovery == 0)
***************
*** 1256,1262 ****
  			return;
  		}
  	}
! 
  	/* kill all children */
  	for (i = 0; i < pool_config->num_init_children; i++)
  	{
--- 1257,1263 ----
  			return;
  		}
  	}
! #endif
  	/* kill all children */
  	for (i = 0; i < pool_config->num_init_children; i++)
  	{

