[pgpool-general: 1564] pgpool-II-3.2.3 blocking/segfault on failover

David Christensen machack666 at gmail.com
Tue Apr 2 05:18:33 JST 2013

Hi folks,

I'm running into an issue when using pgpool-II-3.2.3 in a multi-node
cluster, compiled from source on a RedHat 6.2 cluster.  (configure line was
just `./configure --with-pgsql=/usr/pgsql-9.1`.)

Everything works fine when pgpool is initially connected.  There are 4
backends in the cluster: one master and 3 standbys set up as Streaming
Replication slaves.  Connections through pgpool work as expected until the
master (backend0) fails over and one of the slaves takes over.  At that
point, attempting to connect to the pgpool port hangs for roughly 1 minute
before finally connecting.  I have also noticed occasional segfaults in
syslog when this happens.  (On reviewing the associated logs, I suspect the
segfaults are a red herring: they appear to come from the rpm-installed
version of pgpool rather than the source build, but they seemed worth
mentioning.)

This behavior also appears in a 2-node cluster with 1 master and 1 standby.

`strace` showed psql blocked in a poll() call while connecting through the
pgpool socket; it hangs on the last line here:

   connect(3, {sa_family=AF_FILE, path="/tmp/.s.PGSQL.5432"}, 110) = 0
   getsockopt(3, SOL_SOCKET, SO_ERROR, [688889114778402816], [4]) = 0
   getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
   poll([{fd=3, events=POLLOUT|POLLERR}], 1, -1) = 1 ([{fd=3,
   sendto(3, "\0\0\0T\0\3\0\0user\0postgres\0database\0p"..., 84,
   poll([{fd=3, events=POLLIN|POLLERR}], 1, -1
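
For anyone wanting to reproduce the hang without psql, here is a minimal
Python sketch (not part of pgpool; the function names are made up for
illustration) that sends the same kind of protocol 3.0 startup packet seen in
the trace and times how long the first reply byte takes.  It assumes the
socket path from the trace; the database name is truncated in the strace
output, so it is left as a parameter.

```python
import socket
import struct
import time

PROTOCOL_3_0 = 196608  # 0x00030000, matches the "\0\3\0\0" bytes in the trace


def build_startup_packet(user, database):
    """Build a PostgreSQL v3 startup message: length, protocol, key/value pairs."""
    params = b""
    for key, value in (("user", user), ("database", database)):
        params += key.encode() + b"\0" + value.encode() + b"\0"
    body = struct.pack("!I", PROTOCOL_3_0) + params + b"\0"
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


def time_first_reply(sock_path, user, database, timeout=120.0):
    """Send a startup packet over a Unix socket; return (first_byte, seconds).

    Raises socket.timeout if nothing arrives within `timeout` seconds.
    The connection is simply closed afterwards (no Terminate message),
    which is acceptable for a one-off probe but not for real clients.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        start = time.monotonic()
        s.sendall(build_startup_packet(user, database))
        first = s.recv(1)  # b"R" = auth request, b"E" = error, b"" = closed
        elapsed = time.monotonic() - start
    return first, elapsed
```

Calling `time_first_reply("/tmp/.s.PGSQL.5432", "postgres", dbname)` with your
own database name before and after a failover should show whether the delay
is in pgpool's reply to the startup packet rather than in psql itself.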

Some additional information which may be relevant/useful: there are many
lines in the syslog log showing what looks like socket/filehandle
exhaustion, e.g.:

   Mar 28 23:58:48 ac-xss02-m pgpool[23389]: connect_inet_domain_socket_by_port: socket() failed: Too many open files
   Mar 28 23:58:48 ac-xss02-m pgpool[23389]: make_persistent_db_connection: connection to <HOST>(5433) failed

I'm not sure if this is indicative of a resource leak somewhere, but that's
kind of what it smells like to me.
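
One way to check the leak theory is to watch the descriptor count of a pgpool
process over time.  A rough sketch follows, assuming a Linux /proc filesystem
and permission to read the target process's entries; the helper names are
hypothetical and nothing here is a pgpool API.

```python
import os


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd, i.e. descriptors currently open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is unlimited or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for the given process id."""
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_soft_nofile(f.read())
    return count_open_fds(pid), limit
```

Sampling `fd_usage(pid)` every few seconds for the pid shown in the log
(23389 above) would show whether the count climbs steadily toward the limit,
which is what a leak would look like, or jumps only around the failover.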

We upgraded from a previous version of pgpool that did not have this
issue (pgpool-II-91-3.1.3-2.rhel6.x86_64.rpm); with that version, following
the new master was almost instantaneous.

I am enclosing syslog output and config for the 2-node cluster with debug=5.

I appreciate any insight anyone has on this matter.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool.conf
Type: application/octet-stream
Size: 19603 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130401/09040217/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-syslog
Type: application/octet-stream
Size: 56469 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130401/09040217/attachment-0003.obj>