[Pgpool-general] pgpool 2.2.4: DEALLOCATED children

Tatsuo Ishii ishii at sraoss.co.jp
Fri Sep 25 08:00:00 UTC 2009


Xavier,

Thanks for the analysis and the patches! I don't know what 0x0049050000 is
either. Can you send me the log?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

>  Tatsuo,
> 
>  I think we found what the problem was. During the reset of a backend,
> the pgpool process sends a BEGIN command to start a transaction and
> expects to receive a message of kind 'N', 'E', 'C' or 'Z', but in our
> case the backend sends something different (0x0049050000). The
> process interprets part of what it received as the length of the data
> it needs to read from the backend, and so blocks indefinitely
> while waiting to read that much data.
> 
>  I don't know what it is that the backend is sending, but it seems to
> be always the same data (0x0049050000), and the first byte of it is
> not any known message kind ('N', 'E', 'C', etc...).
> 
>  I've attached a patch which aborts the reset operation if what was
> read from the backend is none of the expected message kinds.
> 
>  We also have some logs which might make it easier to understand the
> code flow in case you want to examine them.
> 
>  Cheers
> 
> 
> On Thu, Sep 24, 2009 at 9:41 AM, Xavier Noguer <xnoguer at antica.cl> wrote:
> >  Tatsuo,
> >
> >  Our test case was this: two backends running postgres 8.1, with a few
> > differences between them; the master node always had more
> > records.
> >
> >  We tried to reproduce the effect on our development environment, but
> > it didn't work the first time. I'll try again to see if I can provide
> > you with the necessary database dumps to reproduce it.
> >
> >  Cheers
> >
> > On Thu, Sep 24, 2009 at 4:05 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> Thanks for the investigation.
> >>
> >> But I could not reproduce Agustín's problem. I ran test/jdbc for
> >> testing. If you have a self-contained test case, please let me know. I
> >> would like to know why my patches did not work; that should help me
> >> with future troubleshooting.
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >>
> >>>  Hello Tatsuo,
> >>>
> >>>  I'm working with Agustín Almonte on this same issue, and after trying
> >>> the latest patch you provided we realized that when a DEALLOCATE was
> >>> being sent for a prepared statement, that prepared statement was not
> >>> being taken off prepared_list. This meant that prepared_list was not
> >>> updated and the same DEALLOCATE was sent over and over again.
> >>>
> >>>  Attached you'll find a patch that takes the prepared statement off
> >>> prepared_list after having sent the DEALLOCATE for that prepared
> >>> statement. We tested it and it seems to work fine.
> >>>
> >>>  Cheers
> >>
> >
> 
> --- pool_process_query.c	2009-09-24 01:56:59.000000000 -0400
> +++ pool_process_query.c.new	2009-09-25 03:00:23.000000000 -0400
> @@ -2619,6 +2619,12 @@
>  				return POOL_END;
>  			}
>  			len = ntohl(len) - 4;
> +			
> +			if (kind != 'N' && kind != 'E' && kind != 'C')
> +			{
> +				pool_error("do_command: error, kind is not N, E or C");
> +				return POOL_END;
> +			}
>  			string = pool_read2(backend, len);
>  			if (string == NULL)
>  			{
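
For illustration, the check the patch above adds can be sketched as a
standalone C function. This is only a sketch under stated assumptions, not
pgpool code: parse_header, naive_payload_len, kind_is_expected and the
HDR_* codes are hypothetical names, and the five bytes are assumed to be
laid out as one kind byte followed by a 4-byte big-endian length that
counts itself, which is what "len = ntohl(len) - 4" in the patch context
implies. It shows why validating the kind before trusting the length
avoids waiting on a huge read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

/* Hypothetical result codes for this sketch (not pgpool's POOL_* values). */
enum hdr_status { HDR_OK = 0, HDR_SHORT = -1, HDR_BAD_KIND = -2, HDR_BAD_LEN = -3 };

/* Message kinds the reset code expects, per the description in this thread. */
static int kind_is_expected(char kind)
{
    return kind == 'N' || kind == 'E' || kind == 'C' || kind == 'Z';
}

/* What an unchecked reader would compute: the length field minus its own 4 bytes. */
static uint32_t naive_payload_len(const unsigned char *buf)
{
    uint32_t raw;

    assert(buf != NULL);
    memcpy(&raw, buf + 1, sizeof raw);   /* bytes 1..4 hold the length */
    return ntohl(raw) - 4;
}

/* Parse a 5-byte header, rejecting unexpected kinds before using the length. */
static int parse_header(const unsigned char *buf, size_t n,
                        char *kind, uint32_t *payload_len)
{
    uint32_t raw;

    assert(buf != NULL && kind != NULL && payload_len != NULL);
    if (n < 5)
        return HDR_SHORT;                /* not enough bytes for a header */

    *kind = (char) buf[0];
    if (!kind_is_expected(*kind))
        return HDR_BAD_KIND;             /* abort instead of trusting the length */

    memcpy(&raw, buf + 1, sizeof raw);
    raw = ntohl(raw);
    if (raw < 4)
        return HDR_BAD_LEN;              /* the length must at least cover itself */

    *payload_len = raw - 4;
    return HDR_OK;
}
```

Fed the bytes 00 49 05 00 00, naive_payload_len yields 0x49050000 - 4,
more than a gigabyte that would never arrive, whereas parse_header returns
HDR_BAD_KIND right away because 0x00 is not an expected kind.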

