[pgpool-general: 1298] Re: pgpool stopped accepting client connections after 1 node hung

Tatsuo Ishii ishii at postgresql.org
Tue Jan 8 08:01:03 JST 2013


> On Mon, Jan 7, 2013 at 12:11 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>> I'm running pgpool-3.2.1 on a Linux-x86_64 server.  Late last night,
>>> one of my 3 postgresql-9.2.2 servers controlled by pgpool hung (not
>>> the master), and pgpool stopped accepting client connections for no
>>> apparent reason.  In the pgpool log at the time of the hang, I see the
>>> following errors generated thousands of times repeatedly:
>>>
>>> 2013-01-05 20:08:47 ERROR: pid 31413: connect_inet_domain_socket:
>>> connect() failed: Connection timed out
>>> 2013-01-05 20:08:47 ERROR: pid 31413: connection to cuda-db5(5432) failed
>>> 2013-01-05 20:08:47 ERROR: pid 31413: new_connection: create_cp() failed
>>> 2013-01-05 20:08:47 ERROR: pid 9476: connect_inet_domain_socket:
>>> connect() failed: Connection timed out
>>> 2013-01-05 20:08:47 ERROR: pid 9476: connection to cuda-db5(5432) failed
>>> 2013-01-05 20:08:47 ERROR: pid 9476: new_connection: create_cp() failed
>>> 2013-01-05 20:08:47 ERROR: pid 7850: connect_inet_domain_socket:
>>> connect() failed: Connection timed out
>>> 2013-01-05 20:08:47 ERROR: pid 7850: connection to cuda-db5(5432) failed
>>> 2013-01-05 20:08:47 ERROR: pid 7850: new_connection: create_cp() failed
>>>
>>> I don't understand why pgpool stopped accepting client connections.
>>> I'd expect that if any single node goes down, pgpool should continue
>>> to work and accept connections, and simply mark the unresponsive node
>>> as unavailable.
>>
>> That is my question too. Do you see this kind of message in the pgpool log?
>>
>>                 degenerate_backend_set: 2 fail over request from pid xxxx
>>
>> If you see this, pgpool should initiate the failover and mark cuda-db5 down.
> 
> Nope, that message was not present at any time.

There is a bug report about connect_inet_domain_socket() in
pgpool-II 3.2 (or higher):
http://www.pgpool.net/mantisbt/view.php?id=46

The error message in that report is the same as yours
(connect_inet_domain_socket: connect() failed: Connection timed out),
and I have created a patch to fix it:
http://www.pgpool.net/mantisbt/file_download.php?file_id=55&type=bug

Can you try it out? I am still investigating why you did not see a
failover, but I think you will want to apply the patch first to avoid
the error.
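
To double-check the log for the failover message, something like the
following might help. It is only a rough sketch: it assumes each log
entry sits on a single line (the excerpts quoted above were wrapped by
the mail client), and the function name summarize_pgpool_log is made up
for illustration. The two patterns are taken from the messages quoted
in this thread.

```python
import re
from collections import Counter

# Patterns based on the log messages quoted in this thread.
CONNECT_FAILED = re.compile(r"connection to (\S+)\((\d+)\) failed")
FAILOVER_REQUEST = re.compile(
    r"degenerate_backend_set: (\d+) fail over request from pid (\d+)"
)


def summarize_pgpool_log(lines):
    """Count failed backend connections and collect failover requests.

    Hypothetical helper, not part of pgpool-II.
    Returns (failures, failover_nodes) where failures maps
    (host, port) to a count and failover_nodes lists backend node ids.
    """
    failures = Counter()
    failover_nodes = []
    for line in lines:
        match = CONNECT_FAILED.search(line)
        if match:
            failures[(match.group(1), int(match.group(2)))] += 1
            continue
        match = FAILOVER_REQUEST.search(line)
        if match:
            failover_nodes.append(int(match.group(1)))
    return failures, failover_nodes


# Sample input: the first two lines come from the report above; the last
# line is an illustrative failover request (timestamp and pid are made up).
sample = [
    "2013-01-05 20:08:47 ERROR: pid 31413: connection to cuda-db5(5432) failed",
    "2013-01-05 20:08:47 ERROR: pid 9476: connection to cuda-db5(5432) failed",
    "2013-01-05 20:08:48 LOG:   pid 9476: degenerate_backend_set: 2 fail over request from pid 9476",
]

failures, failover_nodes = summarize_pgpool_log(sample)
for (host, port), count in failures.items():
    print(f"{host}:{port} failed {count} time(s)")
print("failover requested for nodes:", failover_nodes or "none")
```

To run it against a real log, pass an open file object instead of the
sample list. If the second result is empty while the first shows many
failures for one backend, that matches the situation described in this
thread: connection errors without any failover request.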
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

