[Pgpool-general] pgpool 2.2.4: DEALLOCATED children

Fri Sep 25 08:11:11 UTC 2009

Hi Tatsuo,

filtered logs are attached.

Can you validate the patches applied?

Thanks,
Agustín Almonte F.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool_pid11723.log
Type: application/octet-stream
Size: 40815 bytes
Desc: not available
URL: <http://pgfoundry.org/pipermail/pgpool-general/attachments/20090925/b8c2b6f6/attachment-0001.obj>
-------------- next part --------------

On 25-09-2009, at 4:00, Tatsuo Ishii wrote:

> Xavier,
>
> Thanks for analyzing and patches! I don't know what 0x0049050000 is
> either. Can you send me the log?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
>> Tatsuo,
>>
>> I think we found what the problem was. During the reset of a backend,
>> the pgpool process sends a BEGIN command to start a transaction and
>> expects to receive a message of kind 'N', 'E', 'C' or 'Z', but in our
>> case the backend sends something different (0x0049050000). The
>> process interprets part of what it received as the length of the data
>> it needs to read from the backend, and so blocks indefinitely
>> while waiting to read that much data.
>>
>> I don't know what it is that the backend is sending, but it seems to
>> be always the same data (0x0049050000), and the first byte of it is
>> not any known message kind ('N', 'E', 'C', etc...).
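
To illustrate the failure mode described above, here is a minimal sketch in C. It assumes the header layout is one kind byte followed by a 4-byte big-endian length that includes itself, which is what the `len = ntohl(len) - 4` line in the patch suggests. The function names are hypothetical and are not taken from pgpool; the set of accepted kinds follows the message text ('N', 'E', 'C', 'Z').

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

/* Hypothetical helper: trusts the length field without looking at the
 * kind byte, the way the unpatched code path is described above. */
static uint32_t payload_length_unchecked(const unsigned char hdr[5])
{
    uint32_t netlen;

    assert(hdr != NULL);
    memcpy(&netlen, hdr + 1, sizeof(netlen));
    return ntohl(netlen) - 4;   /* the length field counts itself */
}

/* Hypothetical helper: rejects the header unless the kind byte is one
 * of the expected ones. Returns the payload length, or -1 on rejection. */
static long payload_length_checked(const unsigned char hdr[5])
{
    uint32_t netlen, len;
    char kind;

    assert(hdr != NULL);
    kind = (char) hdr[0];
    if (kind != 'N' && kind != 'E' && kind != 'C' && kind != 'Z')
        return -1;              /* unknown kind: do not trust the length */

    memcpy(&netlen, hdr + 1, sizeof(netlen));
    len = ntohl(netlen);
    if (len < 4)
        return -1;              /* length must at least cover itself */
    return (long) (len - 4);
}
```

With the bytes 0x00 0x49 0x05 0x00 0x00, the unchecked version takes 0x49050000 as the length and asks for 1225064444 bytes of payload, which would explain a read that never completes. The checked version rejects the header because 0x00 is not an expected kind.
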
>>
>> I've attached a patch which aborts the reset operation if what was
>> read from the backend is none of the expected message kinds.
>>
>> We also have some logs which might make it easier to understand the
>> code flow in case you want to examine them.
>>
>> Cheers
>>
>>
>> On Thu, Sep 24, 2009 at 9:41 AM, Xavier Noguer <xnoguer at antica.cl> wrote:
>>>  Tatsuo,
>>>
>>>  Our test case was this: two backends running postgres 8.1, with a few
>>> differences between them and the master node always having more
>>> records.
>>>
>>>  We tried to reproduce the effect on our development environment, but
>>> it didn't work the first time. I'll try again to see if I can provide
>>> you with the necessary database dumps to reproduce it.
>>>
>>>  Cheers
>>>
>>> On Thu, Sep 24, 2009 at 4:05 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>> Thanks for the investigation.
>>>>
>>>> But I could not reproduce Agustín's problem. I ran test/jdbc for
>>>> testing. If you have a self-contained test case, please let me know.
>>>> I would like to know why my patches did not work; that should help
>>>> me with future debugging.
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>>
>>>>>  Hello Tatsuo,
>>>>>
>>>>>  I'm working with Agustín Almonte on this same issue, and after
>>>>> trying the latest patch you provided we realized that when a
>>>>> DEALLOCATE was sent for a prepared statement, that prepared
>>>>> statement was not being taken off prepared_list. This meant that
>>>>> prepared_list was not updated and the same DEALLOCATE was sent over
>>>>> and over again.
>>>>>
>>>>>  Attached you'll find a patch that takes the prepared statement off
>>>>> prepared_list after having sent the DEALLOCATE for that prepared
>>>>> statement. We tested it and it seems to work fine.
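
The fix described above can be sketched roughly as follows. This is not the attached patch and not pgpool's actual data structure: `struct stmt_list`, `stmt_list_remove` and `deallocate_all` are hypothetical stand-ins for prepared_list and the reset loop, and the `send` callback stands in for whatever sends the DEALLOCATE to the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STMTS 8

/* Hypothetical stand-in for prepared_list: names of prepared statements
 * that still need a DEALLOCATE. */
struct stmt_list {
    const char *names[MAX_STMTS];
    int cnt;
};

/* Remove one statement name from the list. Returns 1 if it was found
 * and removed, 0 otherwise. */
static int stmt_list_remove(struct stmt_list *l, const char *name)
{
    int i;

    assert(l != NULL && name != NULL);
    for (i = 0; i < l->cnt; i++) {
        if (strcmp(l->names[i], name) == 0) {
            /* close the gap left by the removed entry */
            memmove(&l->names[i], &l->names[i + 1],
                    (size_t) (l->cnt - i - 1) * sizeof(l->names[0]));
            l->cnt--;
            return 1;
        }
    }
    return 0;
}

/* Send a DEALLOCATE for each entry and take it off the list right
 * afterwards, so that no statement is deallocated twice. Returns the
 * number of DEALLOCATEs sent. */
static int deallocate_all(struct stmt_list *l, void (*send)(const char *name))
{
    int sent = 0;

    assert(l != NULL);
    while (l->cnt > 0) {
        const char *name = l->names[0];

        if (send != NULL)
            send(name);             /* e.g. issue "DEALLOCATE <name>" */
        stmt_list_remove(l, name);  /* without this the loop never ends */
        sent++;
    }
    return sent;
}
```

If the `stmt_list_remove` call is left out, `l->cnt` never shrinks and the loop keeps sending the DEALLOCATE for the same first entry, which matches the behaviour reported above.
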
>>>>>
>>>>>  Cheers
>>>>
>>>
>>
>> --- pool_process_query.c	2009-09-24 01:56:59.000000000 -0400
>> +++ pool_process_query.c.new	2009-09-25 03:00:23.000000000 -0400
>> @@ -2619,6 +2619,12 @@
>> 				return POOL_END;
>> 			}
>> 			len = ntohl(len) - 4;
>> +			
>> +			if (kind != 'N' && kind != 'E' && kind != 'C')
>> +			{
>> +				pool_error("do_command: error, kind is not N, E or C");
>> +				return POOL_END;
>> +			}
>> 			string = pool_read2(backend, len);
>> 			if (string == NULL)
>> 			{