[pgpool-hackers: 3458] health check timeout does not work in certain cases

Wed Oct 16 15:09:10 JST 2019

I have been playing with health check and found that it does not work in a certain case.

I sent SIGSTOP to the postmaster process of one of the backend nodes
to freeze it. I expected the health check process to detect this when
the health check timer expired. However, the health check process
waits forever here:

(gdb) bt
#0  0x00007f094a7a234e in __libc_read (fd=6,
    buf=buf@entry=0x564a3aa3a2c0 <readbuf>, nbytes=nbytes@entry=1024)
    at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x0000564a3a68dd70 in read (__nbytes=1024, __buf=0x564a3aa3a2c0 <readbuf>,
    __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  pool_read (cp=cp@entry=0x7f094adf2268, buf=buf@entry=0x7fff6c221786,
    len=len@entry=1) at utils/pool_stream.c:194
#3  0x0000564a3a68e101 in pool_read_with_error (cp=0x7f094adf2268,
    buf=buf@entry=0x7fff6c221786, len=len@entry=1,
    err_context=err_context@entry=0x564a3a700c90 "authentication message response type") at utils/pool_stream.c:141
#4  0x0000564a3a649761 in connection_do_auth (cp=cp@entry=0x564a3c22a640,
    password=password@entry=0x564a3c22a5f0 "md5a16f9d87e344969ec59de417447348b3") at auth/pool_auth.c:104
#5  0x0000564a3a6565e8 in make_persistent_db_connection (
    db_node_id=db_node_id@entry=1,
    hostname=hostname@entry=0x7f094ae0b280 "/tmp", port=port@entry=11003,
    dbname=dbname@entry=0x564a3c21a4a8 "postgres",
    user=user@entry=0x564a3c21b7a8 "t-ishii",
    password=password@entry=0x564a3c22a5f0 "md5a16f9d87e344969ec59de417447348b3", retry=0 '\000') at protocol/child.c:1440
#6  0x0000564a3a65670d in make_persistent_db_connection_noerror (
    db_node_id=db_node_id@entry=1,
---Type <return> to continue, or q <return> to quit

Frame #2 is at this line in pool_stream.c:

			readlen = read(cp->fd, readbuf, READBUFSZ);

In fact, read(2) was interrupted once by the alarm signal (SIGALRM),
as expected. But pool_read then called read(2) again, and this time it
got stuck there, because of this code:

			if (errno == EINTR || errno == EAGAIN)
			{
				ereport(DEBUG5,
						(errmsg("read on socket failed with error :\"%s\"", strerror(errno)),
						 errdetail("retrying...")));
				continue;
			}

As far as I remember, read(2) should be retried in every case except
health check, so I would like to propose the attached patch to fix the
issue. Comments are welcome.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: health_check.diff
Type: text/x-patch
Size: 540 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20191016/9b23c989/attachment.bin>