[Pgpool-general] "FATAL: invalid frontend message type 4"

Sat Jan 9 05:16:33 UTC 2010

Thanks for analysis.

>  I found out more about this situation. The 0x04 is always read after
> receiving a terminate ('X') message from the frontend, which makes me
> think that the frontend might be sending garbage after finishing the
> session, or somehow reading the length of the terminate message
> (0x00000004) fails in line 1529 of pool_proto_modules.c and leaves the
> 0x04 part in the read buffer (I don't really know how that might
> happen).

One possibility is, pool_read always tries to read READBUFSZ bytes
into read buffer and if client accidentaly sends garbabe following to
X packet, it is likely pgpool reads the garbabe in the next call to
pool_read. I'm going to emit log message in this case and clear read
buffer to guard from buggy client.

One weakness of this theory is, PostgreSQL should see 0x4 pakcet even
if you do not use pgpool, though.

>  As part of the termination of the session pool_process_query() gets
> called, who in turn calls reset_backend() in order to send the commands
> in reset_query_list. reset_backend() returns 1 and so the block of code
> which starts with the "else if" in line 136 of pool_process_query.c gets
> executed, setting the "state" variable with a value of 1. This is
> necessary so that the code in line 471 is executed after we read a
> ReadyForQuery ('Z') message from the backends and reset_backend() gets
> called again and the termination process can continue.
> 
>  The problem is that after the call to reset_backend(), but before
> attempting to read a new message from the backend, pgpool may attempt to
> read a new message from the frontend (lines 307 and 326) and read
> garbage, which is then forwarded to the backends.

I don't believe line 308 will get executed after the call to
reset_backend() and before attempting to read a new message from the
backend. Because at line 180 connection_reuse variable is checked.
(it is set by line 451 in child.c after receiving X packet from
frontend).

For 326, yes it could happen as you mentioned. But your patches for
this part:

+		else if (state == 1)
 		{
 			if (frontend->len > 0 && !in_progress)
 			{

will break pgpool to execute normal query processing, since if this
since following code needs to executed while processing normal
queries(other than reset queries).

>  I attach a patch that makes sure pgpool will not try to read from the
> frontend while the "state" variable is 0 (waiting for a ReadyForQuery
> message).
> 
>  I'm sorry about the long explanation, but I thought it was necessary to
> understand what the patch tries to accomplish. 
> 
> 
>  Regards
> 
> On Sun, 2010-01-03 at 21:11 +0900, Tatsuo Ishii wrote:
> > I looked into this but couldn't find why pgpool-II sent 4 to
> > PostgreSQL. Any test case to reproduce the error please.
> > 
> > In the mean time I start to think that pgpool-II should not trigger
> > failover if failed to read data from backend. Looking into PostgreSQL
> > I found lots of places where FATAL errors are generated and PostgreSQL
> > exits. If it exits, pgpool-II will detect a read error from socket
> > that will cause failover. I feel this behavior is too fragile. I'm
> > looking forward to tweak pool_stream.c so that it does not failover.
> > Comments?
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > 
> > >  pgpool-II version is 2.2.4 compiled against libpq 8.1.15, and postgres
> > > version is 8.1.15.
> > > 
> > >  Both run on Debian etch amd64.
> > > 
> > >  I haven't been able to create a test case yet, but I've been able to
> > > verify that it always happens after a COMMIT (like my previous example)
> > > or a BEGIN.
> > > 
> > > 
> > >  Regards
> > > 
> > > On Wed, 2009-12-30 at 17:41 +0900, Tatsuo Ishii wrote:
> > > > PostgreSQL, pgpool-II version?
> > > > 
> > > > Do you have self contained test case?
> > > > --
> > > > Tatsuo Ishii
> > > > SRA OSS, Inc. Japan
> > > > 
> > > > >  On this same issue, I found that pgpool detected an error just before
> > > > > failing a backend as a response to a COMMIT (this is on a different
> > > > > setup than the one from my previous post). This is the relevant part
> > > > > from the pgpool log:
> > > > > 
> > > > > 
> > > > > 2009-12-26 00:13:04 DEBUG: pid 16329: ReadyForQuery: transaction state:
> > > > > T
> > > > > 2009-12-26 00:13:04 DEBUG: pid 16329: do_command: Query: COMMIT
> > > > > 2009-12-26 00:13:04 DEBUG: pid 16329: wait_for_query_response: waiting
> > > > > for backend 1 completing the query
> > > > > 2009-12-26 00:13:04 DEBUG: pid 16329: detect_error: kind: E
> > > > > 2009-12-26 00:13:04 DEBUG: pid 16329: do_command: kind: E
> > > > > 2009-12-26 00:13:04 ERROR: pid 16329: pool_read: read failed (Connection
> > > > > reset by peer)
> > > > > 2009-12-26 00:13:04 LOG:   pid 16329: notice_backend_error: 1 fail over
> > > > > request from pid 16329
> > > > > 
> > > > > 
> > > > >  I can see the commit on the postgres log of the failed backend,
> > > > > followed by the "invalid frontend message type 4" which closes the
> > > > > session and the connection to the client. That would explain why the
> > > > > pool_read call on pgpool fails, but I still can't figure out why does
> > > > > the backend receive that invalid message.
> > > > > 
> > > > > 
> > > > > 
> > > > >  Regards
> > > > > 
> > > > > 
> > > > > On Thu, Dec 24, 2009 at 11:38 AM, Xavier Noguer <xnoguer at antica.cl>
> > > > > wrote:
> > > > > >
> > > > > >  Hello,
> > > > > >
> > > > > >  I encountered a problem with a pgpool installation. Pgpool failed and
> > > > > > detached a backend, but I don't know the reason because that
> > > > > particular
> > > > > > pgpool server is not writing to its log (problem with cronolog, and I
> > > > > > can't restart pgpool since its a live system).
> > > > > >
> > > > > >  What I did find was a suspicious message in the postgres log of the
> > > > > > failed backend. Here are the relevant lines of the log:
> > > > > >
> > > > > >
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  duration: 39.809 ms  statement:
> > > > > > EXECUTE <unnamed>  [PREPARE:  COMMIT]
> > > > > > 2009-12-23 15:07:20 CLST, sess FATAL:  invalid frontend message type 4
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > > 2009-12-23 15:07:20 CLST, sess LOG:  unexpected EOF on client
> > > > > connection
> > > > > >
> > > > > >
> > > > > >  Since this happened at the same time as the failing of the backend
> > > > > I'm
> > > > > > assuming this condition made postgres close the connection to pgpool
> > > > > and
> > > > > > so triggered the failing and detaching of the backend.
> > > > > >
> > > > > >  Does anybody know what message type 4 is ?
> > > > > >
> > > > > >
> > > > > >  Regards
> > > > > >
> > > > > >
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > Pgpool-general mailing list
> > > > > Pgpool-general at pgfoundry.org
> > > > > http://pgfoundry.org/mailman/listinfo/pgpool-general
> > >