[pgpool-general: 1525] Re: SOLVED - Failover problem when slave is de-attached

Mon Mar 25 17:21:22 JST 2013

Glad to hear that. Thanks for the report!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi,
> 
> Installing the latest version fixes the problem.
> Thank you.
> 
> Regards,
> Stelios
> 
> -----Original Message-----
> From: Tatsuo Ishii [mailto:ishii at postgresql.org] 
> Sent: 06 February 2013 09:54
> To: Stelios Limnaios
> Cc: pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 1373] Failover problem when slave is de-attached
> 
> Hi,
> 
> Can you please try the latest V3.2_STABLE snapshot?
> 
> I suspect this:
> http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=984265c8a7b69cf41149f1ad10ce73f959d8dfc2
> is related to your problem.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
>> Hi all,
>> 
>> I'm kindly asking for your expertise on a problem we have with PGPool and failover.
>> 
>> Our setup contains two database servers, one running as primary and one as slave.
>> We installed automatic failover and it seems to be working very well when the primary node goes down.
>> The standby server is promoted to primary and our Java application continues working with no problems.
>> 
>> The problem that we have occurs when a slave node goes down.
>> PGPool becomes busy and PGPoolAdmin does not open any pages; it keeps waiting for server responses until it times out.
>> When I try to stop pgpool from the command line on the server, the stop never completes; it hangs indefinitely.
>> If I restart the failed slave, I can see in the postgres logs that it connects to the primary node for replication, but I'm not able to open the PGPoolAdmin status page to click the Return button.
>> PGPool seems to be waiting for something, but I can't work out what it is.
>> 
>> We use pgpool2-V3_2_STABLE, checked out from the repository (4-Oct-2012).
>> 
>> In pgpool.conf we have set
>> failover_command = '/usr/local/etc/failover.sh %d "%h" %p %D %m %M "%H" %P'
>> 
>> and in failover.sh:
>> if [ $failed_node_id = $old_primary_node_id ]; then      # master failed
>>     touch $trigger   # let standby take over
>>     echo "Primary database "$failed_host_name" failed, please check the status of your replication system. Trigger used: "$trigger | mail -s "Ditaweb primary database failed" $admin_email
>> else
>>     echo "Slave database "$failed_host_name" failed, please check the status of your replication system." | mail -s "Ditaweb slave database failed" $admin_email
>> fi
>> 
>> The above script works fine as the 'slave failed' email is sent, and in the case when the primary node goes down failover is executed successfully.
>> I've also attached pgpool.conf in case you need it.
>> 
>> So, I guess the question is: what causes PGPool to behave this way?
>> Is it something that we need to set up in pgpool.conf, such as some kind of timeout?
>> 
>> Thank you in advance for your time,
>> 
>> Regards,
>> Stelios Limnaios
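
The failover.sh excerpt quoted in the original message uses variables (failed_node_id, old_primary_node_id, failed_host_name, trigger, admin_email) without showing where they are set. The following is only a sketch of one way they might be filled in, assuming they are taken from the script's positional arguments in the order given on the quoted command line, and assuming the usual pgpool-II meaning of the placeholders (%d failed node id, %h failed host name, %P old primary node id). The function names and the TRIGGER_FILE default are hypothetical and do not come from the thread.

```shell
#!/bin/sh
# Sketch only, not the poster's actual script.
# Assumed argument order: failover.sh %d "%h" %p %D %m %M "%H" %P

# Print which kind of node failed.
# $1 = failed node id (%d), $2 = old primary node id (%P)
failed_role() {
    if [ "$1" = "$2" ]; then
        echo primary
    else
        echo standby
    fi
}

# Print the notification text.
# $1 = role as printed by failed_role, $2 = failed host name (%h)
failure_message() {
    case "$1" in
        primary) label="Primary" ;;
        *)       label="Slave" ;;
    esac
    echo "$label database $2 failed, please check the status of your replication system."
}

# Hypothetical entry point: call as  handle_failover "$@"
handle_failover() {
    failed_node_id=$1         # %d
    failed_host_name=$2       # %h
    old_primary_node_id=$8    # %P
    # Hypothetical default; the real trigger file path is not given in the thread.
    trigger=${TRIGGER_FILE:-/tmp/trigger_file}

    role=$(failed_role "$failed_node_id" "$old_primary_node_id")
    if [ "$role" = primary ]; then
        touch "$trigger"      # let the standby take over
    fi
    failure_message "$role" "$failed_host_name"
}
```

In a real script the last line would be something like `handle_failover "$@" | mail -s "database failed" "$admin_email"`. Quoting every expansion keeps the comparison in the test from breaking if pgpool passes an empty value for one of the placeholders.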