[Pgpool-hackers] retry of health check

Wed Mar 10 06:56:57 UTC 2010

> On Wed, Mar 10, 2010 at 10:25 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> Hi,
> >>
> >> The health_check() in main.c retries to execute the health check from
> >> the beginning only when the message other than the ErrorResponse
> >> arrives from the backend and it fails in sending the Terminate message
> >> to the backend. Why do we need to retry that only in that case?
> >> The retry with template1 seems useless. Am I missing something?
> >
> > It turns out that the current behavior seems to be correct, though not
> > perfect.
> >
> > While writing the patch, I thought that after a startup packet with a
> > wrong user and/or database is sent, the backend returns E rather than R.
> > Here is a strace of psql with a wrong database name:
> >
> > connect(3, {sa_family=AF_FILE, path="/tmp/.s.PGSQL.5433"}, 110) = 0
> > getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> > getsockname(3, {sa_family=AF_FILE, path=@}, [2]) = 0
> > poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
> > rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
> > send(3, "\0\0\0#\0\3\0\0user\0t-ishii\0database\0uu"..., 35, 0) = 35
> > rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
> > poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN|POLLHUP}], 1, -1) = 1
> > recv(3, "R\0\0\0\10\0\0\0\0E\0\0\0QSFATAL\0C3D000\0Mdat"..., 16384, 0) = 91
> >
> > As you can see, the backend returns "R" followed by "E". From reading the
> > docs, I expected the backend to return "E" immediately, without "R". I'm
> > not sure this follows the frontend/backend protocol, but we have to live
> > with it.
> 
> That depends on what kind of error happens. I guess that 'R' is returned
> first if an error occurs before authentication phase, 'E' otherwise.

I couldn't find any clear rule in the docs regarding the first
response from the backend. It seems it truly depends on the implementation.
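
For reference, the recv() buffer in the strace quoted above can be split by hand. Each version 3 backend message is one type byte followed by a 4-byte big-endian length that counts itself: "R" with length 8 takes 9 bytes, and "E" with length 0x51 (81, shown as "Q") takes 82, which adds up to the 91 bytes recv() returned. Below is a minimal Python sketch of that framing. The helper names and the "M" message text are hypothetical (the strace output is truncated, so the real text is not reproduced), and this is not pgpool code.

```python
import struct


def make_message(mtype, payload):
    """Frame a payload as one v3 message: type byte + int32 length + payload."""
    return mtype + struct.pack("!I", len(payload) + 4) + payload


def split_messages(buf):
    """Split a buffer of complete backend messages into (type, payload) pairs."""
    messages = []
    pos = 0
    while pos + 5 <= len(buf):
        mtype = buf[pos:pos + 1].decode("ascii")
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        end = pos + 1 + length
        if length < 4 or end > len(buf):
            break  # malformed or incomplete message; stop parsing
        messages.append((mtype, buf[pos + 5:end]))
        pos = end
    return messages


def error_fields(payload):
    """Decode ErrorResponse fields: one code byte + string, NUL-terminated."""
    fields = {}
    for part in payload.split(b"\0"):
        if part:
            fields[chr(part[0])] = part[1:].decode("ascii", "replace")
    return fields


# Synthetic reply shaped like the strace: AuthenticationOk, then ErrorResponse.
# The "M" text is a placeholder, not the text the real backend sent.
auth_ok = make_message(b"R", struct.pack("!I", 0))
error = make_message(b"E", b"SFATAL\0C3D000\0Mplaceholder message\0\0")
reply = auth_ok + error

messages = split_messages(reply)
types = [mtype for mtype, _ in messages]
fields = error_fields(messages[1][1])
print(types, fields["S"], fields["C"])
```

SQLSTATE 3D000 is invalid_catalog_name, i.e. the requested database does not exist, so a health check that only looks at the first message type would see "R" and miss the error that follows it.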

> > Anyway, the backend then takes the liberty of disconnecting the session
> > after sending the above packet, and pgpool sees a write(2) error. Thus
> > the retry using the "template1" database is correct in this case.
> 
> I'm not sure if the retry using "template1" database can succeed after
> write(2) returns an error. If the retry always fails, it's a redundant
> operation, I think.

Of course the retry will succeed if the backend has template1 but does not
have the postgres database. I have tested with such an old version of PostgreSQL.
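
To make the case concrete, here is a rough Python sketch of the fallback order discussed in this thread: try "postgres" first, then "template1". The function names and the stub backend are hypothetical stand-ins for illustration only; the real logic is in health_check() in main.c and is not reproduced here.

```python
def check_backend(connect, databases=("postgres", "template1")):
    """Return the first database name that accepts a connection.

    `connect` is a hypothetical callable that raises ConnectionError on
    failure. If every candidate fails, the backend is treated as down.
    """
    last_error = None
    for dbname in databases:
        try:
            connect(dbname)
            return dbname
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next name
    raise ConnectionError("all health check attempts failed") from last_error


def old_backend(dbname):
    """Stub for an old server that has template1 but no postgres database."""
    if dbname != "template1":
        raise ConnectionError('database "%s" does not exist' % dbname)


def dead_backend(dbname):
    """Stub for a server that is really down."""
    raise ConnectionError("connection refused")


used = check_backend(old_backend)

try:
    check_backend(dead_backend)
    dead_detected = False
except ConnectionError:
    dead_detected = True

print(used, dead_detected)
```

With the old-server stub the second attempt succeeds, so the retry is not redundant there; with a backend that is really down both attempts fail and the failure is still detected.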

> When write(2) to one of multiple postgres servers under pgpool fails,
> the retry affects not only that one server but also all of them.
> Is this behavior intentional?

I can't think of any better way to detect backend failure than the
current implementation. Do you have a better idea?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp