[Pgpool-general] health check failed due an Interrupted system call

Mon Mar 1 14:52:42 UTC 2010

Thanks Tatsuo Ishii.

I do not have control over the infrastructure, and the hosting service
we use did not report any network problem or maintenance work this
time. PostgreSQL did not have any problem and was still running. The
most probable cause is therefore the network.

Christophe Philemotte

On Mon, Mar 1, 2010 at 3:28 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> Hi all,
>>
>> Yesterday, one of my backend nodes was degenerated because of a health
>> check failure.
>>
>> 2010-02-28 18:13:41 ERROR: pid 23996: health check failed during read.
>> host X at port 5432 is down. reason: Interrupted system call
>> 2010-02-28 18:14:04 LOG:   pid 23996: set 0 th backend down status
>> 2010-02-28 18:14:04 LOG:   pid 23996: starting degeneration. shutdown
>> host X(5432)
>> 2010-02-28 18:14:07 LOG:   pid 23996: failover_handler: set new master
>> node: 1
>> 2010-02-28 18:14:07 LOG:   pid 23996: failover done. shutdown host X(5432)
>>
>> I've looked in the source code (main.c in pgpool-II 2.3.2.1) to find
>> which system call it is. It seems that it is read().
>>
>> It is the first time I have noticed this kind of error. Usually, it is
>> a connection timeout. So, I'm not confident that I can reproduce it.
>>
>> How can I interpret this error? Is it simply because read() was blocked?
>
> The health check failed because PostgreSQL did not reply within
> health_check_timeout seconds. Please check the PostgreSQL log. If you
> don't find anything strange there, the cause is probably a physical
> network problem. Maybe a switch or hub hardware problem?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>