[pgpool-general: 4336] pgpool issue on failure of primary.

Tue Jan 19 00:57:29 JST 2016

Hi, 

First, hello to all. 

I have a specific issue with pgpool and I hope you can point me in a 
direction that will allow me to understand and fix it.

We have a postgres system based on streaming replication, with a primary 
server and two standbys. In front of that are two servers running pgpool-II 
3.3.7, doing connection pooling and load balancing, with watchdog enabled. 

The active pgpool server holds the VIP that clients connect to. The three 
postgres servers use Enterprise Failover Manager (EnterpriseDB) to promote one 
of the standby servers if the primary fails (and to send notifications about 
any failures). 

Now, the problem seems to be related to a primary failure. We have observed 
the following case: 

Before failure: multiple clients are connecting to the active pgpool, 
connections are passed to the DB backends fine, all databases are seen as 
active, and one is seen as primary.

After failure of the primary: both remaining database servers are seen as down 
by pgpool, while EFM reports a successful failover and a direct connection to 
the new primary database proves that it is accepting writes. 
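
To make "seen as down by pgpool" concrete: the state pgpool holds for each 
backend can be read with SHOW pool_nodes on the pgpool port, where status 3 
means the node is marked down. Below is a minimal sketch of that check, 
assuming the rows have already been fetched into dictionaries. The hostnames 
db1 to db3 and the helper name nodes_marked_down are made up for illustration 
and are not taken from the actual setup.

```python
# Status codes in the "status" column of pgpool-II's SHOW pool_nodes (3.x):
# 0 = not initialized, 1 = up (no connections yet),
# 2 = up (connections pooled), 3 = down.
POOL_NODE_DOWN = 3


def nodes_marked_down(pool_nodes_rows):
    """Return the hostnames of backends that pgpool has marked as down."""
    return [
        row["hostname"]
        for row in pool_nodes_rows
        if int(row["status"]) == POOL_NODE_DOWN
    ]


# Hypothetical rows, shaped like SHOW pool_nodes output after the failover:
# the failed primary and both surviving servers are all marked down.
rows = [
    {"node_id": "0", "hostname": "db1", "port": "5432", "status": "3"},
    {"node_id": "1", "hostname": "db2", "port": "5432", "status": "3"},
    {"node_id": "2", "hostname": "db3", "port": "5432", "status": "3"},
]

print(nodes_marked_down(rows))
```

If the list still contains the promoted server after EFM has finished, pgpool's 
view and the real state of the cluster disagree, which is the situation 
described here.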

The logs show some errors, for example: 

Cannot accept() new connection. all backends are down

but I'm not sure I can follow what happens there.  

I have tried attaching the backend nodes back using pcp or pgpoolAdmin, but 
nothing seems to work.

Now, stopping pgpool on the active server does the trick, and the pgpool 
server that takes over detects the backend nodes correctly.

In addition, it seems that if the second pgpool server is the master at the 
moment the primary DB server fails, we don't observe this issue. 

Any thoughts on how to get to the bottom of this? 

Regards,
Piotr