3.3.3, Apparent hang

Quentin Hartman qhartman at direwolfdigital.com
Sat Sep 13 00:39:28 JST 2014

I have an application that has a pair of replicating Postgres 9.2 servers
that are accessed via a couple of pgpool 3.3.3 boxes that sit behind an HA proxy
box. The setup generally works well, but after some amount of time under
high load, the pgpool boxes will sometimes stop responding. They seem
healthy at a system level, but if you try to connect to a box in this
state, it will open a connection but never present the expected login
dialog. I've connected strace to an instance in this state, and it doesn't
seem to be doing anything at all. It seems to be watching a couple of
particular inodes for activity, but that's it. A functional but idle
instance shows many more system calls happening. Connecting to child pids
shows even less activity. They do not respond to HUPs or TERMs as far as I
can tell, and the normal "pgpool stop" also does not work. The only way
I've found to stop the processes once in this state is the good ol' kill -9.
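
For anyone who wants to detect this state automatically, here is a rough
sketch of a probe, not something taken from pgpool itself. It assumes the
listener answers the standard PostgreSQL SSLRequest packet with a single
byte ('S' or 'N'), which is how the PostgreSQL wire protocol defines it. A
box in the hung state described above would accept the TCP connection but
never send that byte, so the read times out. The function name, the
timeout value, and the result strings are all made up for illustration.

```python
import socket
import struct

# PostgreSQL SSLRequest: Int32 length (8) followed by Int32 code 80877103.
SSL_REQUEST = struct.pack("!II", 8, 80877103)


def probe(host, port, timeout=5.0):
    """Classify a PostgreSQL-protocol listener.

    Returns 'ok' if it answers the SSLRequest, 'hung' if connecting or
    reading timed out, 'refused' on any other socket error, and
    'unexpected' if it replied with something else or closed the socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SSL_REQUEST)
            reply = sock.recv(1)  # uses the same timeout as the connect
    except socket.timeout:
        # Must be caught before OSError, which it subclasses.
        return "hung"
    except OSError:
        return "refused"
    if reply in (b"S", b"N"):
        return "ok"
    return "unexpected"
```

Called as probe("pgpool-host", 9999), where both the host name and the port
are placeholders for the real deployment, this could be run from cron or
from the proxy's health check to flag a stuck instance.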

The only interesting log traffic I've seen is this:

"ProcessFrontendResponse: failed to read kind from frontend. frontend
abnormally exited"

Is this a known issue? My searches have not turned up anything. If not,
what is the best information I can provide to help track it down? Note that
we've also seen the DISCARD problem that is apparently fixed in 3.3.4, but
this seems to be a different problem. One of the apparent symptoms of this
problem is a large number of sockets stuck in SYN_RECV state, but I'm not
certain it's related.
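
To put a number on the SYN_RECV observation, something like the following
could be run on the pgpool boxes. It is only a sketch: it parses the Linux
/proc/net/tcp table (IPv4 only; IPv6 sockets are in /proc/net/tcp6) and
counts sockets per TCP state. The function name is made up, and port 9999
in the usage part is an assumption based on pgpool's default listen port,
not something confirmed for this setup.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state, optionally for one local port only."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state_hex = fields[1], fields[3]
        # local_addr looks like "0100007F:270F" (hex address, hex port).
        port = int(local_addr.rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state_hex.upper(), state_hex)] += 1
    return counts


if __name__ == "__main__":
    # Assumed port: 9999 is pgpool's default; adjust to the real setting.
    with open("/proc/net/tcp") as f:
        print(count_states(f.read(), local_port=9999))
```

Sampling this periodically on both a healthy and a hung instance would show
whether the SYN_RECV count grows only once the hang has started.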


