[pgpool-general: 1373] Failover problem when slave is de-attached

Stelios Limnaios stelios.limnaios at mekon.com
Wed Feb 6 18:34:07 JST 2013


Hi all,

I would appreciate your expertise on a problem we are having with PGPool and failover.

Our setup consists of two database servers, one running as primary and one as slave.
We configured automatic failover, and it works very well when the primary node goes down:
the standby server is promoted to primary and our Java application keeps working without problems.

The problem occurs when a slave node goes down.
PGPool becomes unresponsive: PGPoolAdmin does not open any pages and keeps waiting for a server response until it times out.
When I try to stop pgpool from the command line on the server, it never finishes; the stop command just hangs.
If I restart the failed slave, I can see in the PostgreSQL logs that it connects to the primary node for replication, but I still cannot open the PGPoolAdmin status page to click the Return button.
PGPool seems to be waiting for something, but I can't work out what it is.

We use pgpool2-V3_2_STABLE, checked out from the repository (4-Oct-2012).

In pgpool.conf we have set
failover_command = '/usr/local/etc/failover.sh %d "%h" %p %D %m %M "%H" %P'
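
pgpool replaces each %-placeholder with a value before running the command, so failover.sh receives those values as positional parameters in the order they are written on the line above. A minimal sketch of that mapping, assuming the placeholder meanings from the pgpool-II failover_command documentation; show_failover_args is a hypothetical helper, not part of pgpool:

```shell
#!/bin/sh
# Hypothetical helper: print how pgpool's failover placeholders map to
# positional parameters, in the order used on the command line above.
show_failover_args() {
    echo "failed_node_id=$1"       # %d: ID of the detached node
    echo "failed_host_name=$2"     # %h: host name of the detached node
    echo "failed_port=$3"          # %p: port of the detached node
    echo "failed_db_cluster=$4"    # %D: database cluster path of the detached node
    echo "new_master_id=$5"        # %m: ID of the new master node
    echo "old_master_id=$6"        # %M: ID of the old master node
    echo "new_master_host=$7"      # %H: host name of the new master node
    echo "old_primary_id=$8"       # %P: ID of the old primary node
}
```

Calling it with made-up values such as `show_failover_args 1 db2 5432 /data 0 0 db1 0` shows that the script's primary/slave test compares $1 (%d) with $8 (%P).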

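and in failover.sh:
failed_node_id=$1        # %d: ID of the detached node
failed_host_name=$2      # %h: host name of the detached node
old_primary_node_id=$8   # %P: ID of the old primary node
# $trigger (trigger file path) and $admin_email are set elsewhere in the script (not shown)

if [ "$failed_node_id" = "$old_primary_node_id" ]; then   # primary failed
    touch "$trigger"   # let the standby take over
    echo "Primary database $failed_host_name failed, please check the status of your replication system. Trigger used: $trigger" | mail -s "Ditaweb primary database failed" "$admin_email"
else
    echo "Slave database $failed_host_name failed, please check the status of your replication system." | mail -s "Ditaweb slave database failed" "$admin_email"
fi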
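
The branch condition in the script can be pulled into a small function so it can be checked from a shell without sending mail or touching the trigger file. A minimal sketch; classify_failure is a hypothetical name, and its two arguments are assumed to be the failed node ID (%d) and the old primary node ID (%P):

```shell
#!/bin/sh
# Hypothetical helper mirroring the test in failover.sh.
# $1: ID of the detached node (%d), $2: ID of the old primary node (%P).
# Prints "primary" if the primary itself failed (the standby should be
# promoted), otherwise "standby" (only a notification is needed).
classify_failure() {
    if [ "$1" = "$2" ]; then
        echo "primary"
    else
        echo "standby"
    fi
}
```

Inside failover.sh this could be used as `[ "$(classify_failure "$1" "$8")" = primary ] && touch "$trigger"`, keeping the decision separate from the mail notifications.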

The script itself seems to work: the 'slave failed' email is sent, and when the primary node goes down, failover completes successfully.
I've also attached pgpool.conf in case you need it.

So the question is: what causes PGPool to behave this way?
Is it something we need to set in pgpool.conf, such as a timeout?

Thank you in advance for your time,

Regards,
Stelios Limnaios
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool.conf
Type: application/octet-stream
Size: 25937 bytes
Desc: pgpool.conf
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130206/9738f825/attachment-0001.obj>

