[Pgpool-general] pgpool 2.2.4: DEALLOCATED children

Xavier Noguer xnoguer at antica.cl
Fri Sep 25 07:35:57 UTC 2009


 Tatsuo,

 I think we found what the problem was. During the reset of a backend,
the pgpool process sends a BEGIN command to start a transaction and
expects to receive a message of kind 'N', 'E', 'C' or 'Z', but in our
case the backend sends something different (0x0049050000). The
process interprets part of what it received as the length of the data
it needs to read from the backend, and so blocks indefinitely
while waiting to read that much data.

 I don't know what the backend is sending, but it always seems to be
the same data (0x0049050000), and its first byte is not any known
message kind ('N', 'E', 'C', etc.).
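
 To make the failure mode concrete, here is a minimal sketch in C of how
those five bytes decode if they are read as a message header, that is,
one kind byte followed by a 4-byte length in network byte order. This
is an illustration only, not pgpool code: the names msg_header,
decode_header, kind_is_expected and observed_bytes are hypothetical,
and treating the five bytes as kind plus length is an assumption based
on the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

/* Hypothetical type for illustration; not part of pgpool. */
typedef struct {
    char    kind;      /* first byte: message kind */
    int32_t body_len;  /* declared length minus the 4 length bytes */
} msg_header;

/* The five bytes we keep seeing from the backend (0x0049050000). */
static const unsigned char observed_bytes[5] = {0x00, 0x49, 0x05, 0x00, 0x00};

/* Decode 1 kind byte + 4-byte big-endian length, as the reset code does. */
static msg_header decode_header(const unsigned char *buf)
{
    msg_header h;
    uint32_t raw;

    assert(buf != NULL);
    h.kind = (char) buf[0];
    memcpy(&raw, buf + 1, sizeof(raw));       /* avoid unaligned access */
    h.body_len = (int32_t) (ntohl(raw) - 4);  /* same arithmetic as len = ntohl(len) - 4 */
    return h;
}

/* Returns 1 if kind is one of the kinds expected after BEGIN, else 0. */
static int kind_is_expected(char kind)
{
    return kind == 'N' || kind == 'E' || kind == 'C' || kind == 'Z';
}
```

 Under that assumption, decode_header(observed_bytes) gives kind 0x00
and a body length of 0x49050000 - 4 = 1225064444 bytes, which would
explain why the process waits forever: it is trying to read more than
1 GB that the backend never sends. Checking kind_is_expected() before
trusting the length avoids the blocking read.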

 I've attached a patch which aborts the reset operation if what was
read from the backend is none of the expected message kinds.

 We also have some logs which might make it easier to understand the
code flow in case you want to examine them.

 Cheers


On Thu, Sep 24, 2009 at 9:41 AM, Xavier Noguer <xnoguer at antica.cl> wrote:
>  Tatsuo,
>
>  Our test case was this: two backends running postgres 8.1, with a few
> differences in data between them, the master node always having more
> records.
>
>  We tried to reproduce the effect in our development environment, but
> it didn't work the first time. I'll try again to see if I can provide
> you with the necessary database dumps to reproduce it.
>
>  Cheers
>
> On Thu, Sep 24, 2009 at 4:05 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> Thanks for the investigation.
>>
>> But I could not reproduce Agustín's problem. I ran test/jdbc for
>> testing. If you have a self-contained test case, please let me know. I
>> would like to know why my patches did not work; that should help me
>> with future troubleshooting.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>>
>>>  Hello Tatsuo,
>>>
>>>  I'm working with Agustín Almonte on this same issue, and after trying
>>> the latest patch you provided we realized that when a DEALLOCATE was
>>> being sent for a prepared statement, that prepared statement was not
>>> being taken off prepared_list. This meant that prepared_list was not
>>> updated and the same DEALLOCATE was sent over and over again.
>>>
>>>  Attached you'll find a patch that takes the prepared statement off
>>> prepared_list after having sent the DEALLOCATE for that prepared
>>> statement. We tested it and it seems to work fine.
>>>
>>>  Cheers
>>
>

--- pool_process_query.c	2009-09-24 01:56:59.000000000 -0400
+++ pool_process_query.c.new	2009-09-25 03:00:23.000000000 -0400
@@ -2619,6 +2619,12 @@
 				return POOL_END;
 			}
 			len = ntohl(len) - 4;
+			
+			if (kind != 'N' && kind != 'E' && kind != 'C')
+			{
+				pool_error("do_command: error, kind is not N, E or C");
+				return POOL_END;
+			}
 			string = pool_read2(backend, len);
 			if (string == NULL)
 			{
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_pgpool_20090925.diff
Type: application/octet-stream
Size: 422 bytes
Desc: not available
URL: <http://pgfoundry.org/pipermail/pgpool-general/attachments/20090925/7fff163b/attachment.obj>

