[Pgpool-hackers] retry of health check

Wed Mar 10 04:49:34 UTC 2010

On Wed, Mar 10, 2010 at 10:25 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> Hi,
>>
>> The health_check() in main.c retries the health check from the
>> beginning only when a message other than ErrorResponse arrives from
>> the backend and sending the Terminate message to the backend fails.
>> Why do we need to retry only in that case?
>> The retry with template1 seems useless. Am I missing something?
>
> It turns out that the current behavior seems to be correct, though
> not perfect.
>
> While writing the patch, I thought that after a startup packet is sent
> with a wrong user and/or database, the backend returns E rather than R.
> Here is a strace of psql with a wrong database name:
>
> connect(3, {sa_family=AF_FILE, path="/tmp/.s.PGSQL.5433"}, 110) = 0
> getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> getsockname(3, {sa_family=AF_FILE, path=@}, [2]) = 0
> poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
> rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
> send(3, "\0\0\0#\0\3\0\0user\0t-ishii\0database\0uu"..., 35, 0) = 35
> rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN|POLLHUP}], 1, -1) = 1
> recv(3, "R\0\0\0\10\0\0\0\0E\0\0\0QSFATAL\0C3D000\0Mdat"..., 16384, 0) = 91
>
> As you can see, the backend returns "R" followed by "E". Reading the
> docs, I expected the backend to return "E" immediately, without "R".
> I'm not sure this follows the frontend/backend protocol, but we have
> to live with it.

That depends on what kind of error happens. I guess that 'R' is returned
first if an error occurs before the authentication phase, 'E' otherwise.
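
For reference, here is a minimal Python sketch of how the bytes in the
recv() above split into protocol messages. It assumes the v3 framing of
one type byte followed by a 4-byte big-endian length that counts itself.
The function names and the error message text are hypothetical (the
strace output truncates the real message), and this is not how pgpool
itself reads the stream.

```python
import struct


def split_messages(buf):
    """Split a buffer of backend messages into (type, payload) pairs."""
    messages = []
    pos = 0
    while pos + 5 <= len(buf):
        msg_type = buf[pos:pos + 1].decode("ascii")
        # The length field counts its own 4 bytes but not the type byte.
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        end = pos + 1 + length
        if length < 4 or end > len(buf):
            break  # malformed or incomplete message
        messages.append((msg_type, buf[pos + 5:end]))
        pos = end
    return messages


def parse_error_fields(payload):
    """Decode ErrorResponse fields: a code byte plus a NUL-terminated string."""
    fields = {}
    for part in payload.split(b"\0"):
        if part:
            fields[chr(part[0])] = part[1:].decode("utf-8", "replace")
    return fields


# 'R' with length 8 and auth code 0 (AuthenticationOk), as in the strace.
auth_ok = b"R" + struct.pack("!II", 8, 0)

# 'E' with the severity and SQLSTATE seen in the strace. The message text
# is a placeholder, since the real one is cut off in the strace output.
err_body = b"SFATAL\0" + b"C3D000\0" + b"Mplaceholder message text\0" + b"\0"
err = b"E" + struct.pack("!I", 4 + len(err_body)) + err_body

messages = split_messages(auth_ok + err)
fields = parse_error_fields(messages[1][1])
```

With this buffer, messages holds an 'R' entry followed by an 'E' entry,
so a reader that stops after the first message never sees the error.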

> Anyway, the backend then takes the liberty of disconnecting the session
> after sending the above packet, and pgpool sees a write(2) error. Thus
> the retry using the "template1" database is correct in this case.

I'm not sure if the retry using "template1" database can succeed after
write(2) returns an error. If the retry always fails, it's a redundant
operation, I think.

When write(2) to one of the multiple postgres servers under pgpool fails,
the retry affects not only that server but all of them.
Is this behavior intentional?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center