[Pgpool-general] Health Check issues

Tatsuo Ishii ishii at sraoss.co.jp
Tue Jul 21 09:48:37 UTC 2009


> > 2. We set all the health check parameters (timeout, interval and user) and
> > then STOPPED (using "kill -STOP") all the postgres processes on the
> > REMOTE. PGPOOL (on the LOCAL) does not detect this and continues
> > running. Using "strace" on the PGPOOL parent process, we could see that
> > the READ from the REMOTE fails with ERESTARTSYS but the WRITE succeeds, and
> > therefore the system does not cut off the failed REMOTE.
> > When we changed the code so that it does NOT ignore the READ
> > failure, all children were killed (both LOCAL and REMOTE).
> > Q: Is the failure to detect the connectivity issue shown by the READ
> > a PGPOOL BUG, or is there a reason for that behavior?
> > 
> > Below is the diff showing the change we introduced to the health check (in  
> > main.c):
> > 
> > [root at lx tmp]# diff pgpool-II-2.2.2.new/main.c pgpool-II-2.2.2/main.c
> > 1438,1445c1438
> > < if (read(fd, &kind, 1) < 0) {
> > < pool_error("health check failed during read. host %s at port %d is down. reason: %s. Perhaps wrong health check user?",
> > < BACKEND_INFO(i).backend_hostname,
> > < BACKEND_INFO(i).backend_port,
> > < strerror(errno));
> > < close(fd);
> > < return i+1;
> > < }
> > ---
> > > read(fd, &kind, 1);
> 
> I don't recall why health_check() ignores the result of read() :-< But it seems you are
> right. Will fix. Thanks.

I have checked in fixes for this. Thanks!
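
For readers of the archive, the idea behind the quoted patch can be sketched
as a small standalone helper. This is only an illustration, not the code that
was checked in: the function name health_check_read() is hypothetical, and it
assumes that both a failed read() (including one interrupted by a signal,
which userspace normally sees as EINTR) and an unexpected end-of-file should
count as a health check failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of pgpool-II): read the one-byte reply
 * from a backend socket and report whether the read worked.
 *
 * Returns 0 if one byte was read into *kind, -1 otherwise.
 */
int health_check_read(int fd, char *kind)
{
    ssize_t n;

    assert(kind != NULL);

    n = read(fd, kind, 1);
    if (n < 0) {
        /* read() itself failed; errno holds the reason. */
        fprintf(stderr, "health check failed during read: %s\n",
                strerror(errno));
        return -1;
    }
    if (n == 0) {
        /* Assumption: EOF means the peer closed the connection. */
        fprintf(stderr, "health check failed: connection closed by peer\n");
        return -1;
    }
    return 0;
}
```

A caller would close the descriptor and mark the backend as down whenever
this returns -1, instead of discarding the return value of read().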
--
Tatsuo Ishii
SRA OSS, Inc. Japan

