[Pgpool-general] failover question

Tatsuo Ishii ishii at sraoss.co.jp
Wed May 11 01:37:44 UTC 2011


It might explain the problem you had. Pgpool-II sends the query cancel
packet to the master (node id = 0) first, then sleeps 1 second before
sending the cancel packet to the other node. The sleep is necessary to
ensure that the second node has enough time to start executing the
command to be canceled (otherwise pgpool-II would have to cancel a
command which has not been executed yet).
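
The ordering described above can be sketched roughly as follows. This is
an illustration only, not pgpool-II's actual source: the names
forward_cancel and send_cancel are hypothetical, and pausing only once
after the master (rather than before every remaining node) is an
assumption for setups with more than two nodes.

```python
import time


def forward_cancel(nodes, send_cancel, delay=1.0, sleep=time.sleep):
    """Forward a cancel request to each backend, master first.

    nodes       -- backend node ids, master (node id 0) first
    send_cancel -- hypothetical callable that delivers the cancel to one node
    delay       -- pause in seconds after the master has been cancelled
    sleep       -- injectable sleep function (lets a caller avoid a real wait)
    Returns the node ids in the order the cancel was sent.
    """
    sent = []
    for index, node_id in enumerate(nodes):
        if index == 1:
            # Assumption: pause once, after the master, so the remaining
            # nodes have had time to start the query being cancelled.
            sleep(delay)
        send_cancel(node_id)
        sent.append(node_id)
    return sent
```

With two nodes this yields: cancel node 0, wait 1 second, cancel node 1.
During that wait the second node keeps running the query normally.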

If the second node is fast enough, it will start sending SELECT results
back to pgpool-II, carried in a long series of "D" packets (one D
packet corresponds to one row), before the cancel packet arrives.  So
pgpool-II sees an "E" packet (caused by the cancel request) from the
master node and a D packet from the second node, which causes a failover.
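
The mismatch itself can be pictured with a small sketch. Again this is
not pgpool-II's code: check_kinds is a hypothetical helper, and the
message text only imitates the shape of the error quoted further down.
It compares the one-character message kind received from each backend
("D" is a data row, "E" is an error response) and reports a mismatch
when they differ.

```python
def check_kinds(kinds):
    """Compare the message kind each backend sent for the same query.

    kinds -- dict mapping node id to a one-character message kind
             ("D" = data row, "E" = error response)
    Returns None when all backends agree, otherwise a description of
    the mismatch listing each node id with its kind.
    """
    if len(set(kinds.values())) <= 1:
        return None
    details = " ".join(
        f"{node}[{kind}]" for node, kind in sorted(kinds.items())
    )
    return "kind mismatch among backends: " + details
```

For the case in this thread, check_kinds({0: "E", 1: "D"}) reports a
mismatch, whereas two nodes both returning "D" (or both "E") would pass.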

Hm. It seems there's no easy way to solve the problem at the
moment. Any idea?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I don't have a problem with your test case either.  However, I don't
> think it is the same case as the one I mentioned.
> Under psql, I don't have FETCH_COUNT set, and when I run a select query
> on a big table it may take several seconds for the results to come
> back.  If I hit ^C before any results come back, I get a failover.
> However, if I have FETCH_COUNT set to 1000, the results always start
> coming back right away; in that case a ^C after some results have been
> displayed does not cause a problem.
> 
> Thanks,
> Gary
>> Yes, I have tried with pgpool-II 3.0.3 and PostgreSQL 9.0.3.  I have
>> created a small function which just idly sleeps for 20 seconds.
>>
>> create function bar() returns int as 'select pg_sleep(20);select 1'
>> language 'sql';
>>
>> Then I call the function and hit ^C from psql session. It worked as
>> expected.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> Thanks for the response.  No, I did not change the pgpool.conf when I
>>> updated postgresql from 8.4 to 9.0.4.
>>> By the way, did you try this in your environment without problems?
>>>
>>> Thanks,
>>> Gary
>>>> I can't think of anything that explains why you have the problem with
>>>> 9.0 but not with 8.4. Have you changed pgpool.conf?
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese: http://www.sraoss.co.jp
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm running pgpool2 3.0.1 with replication mode on two postgresql
>>>>> 9.0.4 db servers (updated from 8.4.4).  I have the 'replicate_select'
>>>>> and 'replication_stop_on_mismatch' both set to 'true'.
>>>>> I just noticed that under psql, if I cancel a long select (before
>>>>> any output is displayed), I get the following message and pgpool
>>>>> fails over.  I don't recall this happening with 8.4.4.  Can anyone
>>>>> explain why and how to avoid this?
>>>>>
>>>>> [sd3ops1.ops1_admin].sd3ops1>   select * from file;
>>>>> Cancel request sent
>>>>> ERROR: kind mismatch among backends. Possible last query was: "select
>>>>> * from file;" kind details are: 0[E: canceling statement due to user
>>>>> request] 1[D]
>>>>> HINT:  check data consistency among db nodes
>>>>> server closed the connection unexpectedly
>>>>>           This probably means the server terminated abnormally
>>>>>           before or while processing the request.
>>>>> The connection to the server was lost. Attempting reset: Succeeded.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Gary
>>>>>

