[pgpool-general: 4635] Re: Node failure causes outage

James Sewell james.sewell at lisasoft.com
Mon Apr 11 09:05:09 JST 2016


Hello,

Can anyone comment on whether this is expected behavior?

I can't seem to produce a configuration where a short network outage to the
standby node doesn't result in one of the following happening:

a) No new connections being accepted to the master until the node returns
b) (Eventually) The standby node being marked as status=3 and never being
returned to service

Is there a way round this?

It can be reproduced by setting up pgpool-II, running the following on the
standby, and then trying to connect:

    /sbin/iptables -A INPUT -p tcp --destination-port 5432 -j DROP
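
The reproduction can be scripted. What follows is a minimal POSIX sh sketch, not a tested tool: block_pg, unblock_pg and probe_pgpool are hypothetical helper names, the iptables steps assume root on the standby, and probe_pgpool assumes pgpool-II listens on its default port 9999 unless another port is passed.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing the standby outage.
# Assumptions: block_pg/unblock_pg run as root on the STANDBY;
# probe_pgpool runs on a client, and pgpool listens on port 9999
# (the pgpool-II default) unless a port is given as $2.

PG_PORT=5432

# Silently drop inbound PostgreSQL traffic (the rule from the post).
block_pg() {
    /sbin/iptables -A INPUT -p tcp --destination-port "$PG_PORT" -j DROP
}

# Delete the same rule; -D takes the identical rule specification.
unblock_pg() {
    /sbin/iptables -D INPUT -p tcp --destination-port "$PG_PORT" -j DROP
}

# Run one trivial query through pgpool (never prompting for a password)
# and print the outcome plus elapsed wall-clock seconds.
# $1 = pgpool host, $2 = pgpool port (defaults to 9999).
probe_pgpool() {
    start=$(date +%s)
    if psql -w -h "$1" -p "${2:-9999}" -c 'SELECT 1' >/dev/null 2>&1; then
        outcome=ok
    else
        outcome=failed
    fi
    echo "$outcome after $(( $(date +%s) - start ))s"
}
```

For instance, run probe_pgpool against the pgpool host in a loop on a client, call block_pg on the standby, watch how long each probe takes, and call unblock_pg to restore traffic.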

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099


On Tue, Mar 29, 2016 at 3:36 PM, James Sewell <james.sewell at lisasoft.com>
wrote:

> Hi All,
>
> I'm running pgpool-II version 3.3.4 (tokakiboshi) with two 9.3.1.3 nodes.
>
> I have the following in my config file:
>
> # - Backend Connection Settings -
>
> backend_hostname0 = '10.51.9.125'
> backend_port0 = 5432
> backend_weight0 = 1
> backend_data_directory0 = '/apps/pg_data/data'
> backend_flag0 = 'ALLOW_TO_FAILOVER'
>
> backend_hostname1 = '10.51.10.116'
> backend_port1 = 5432
> backend_weight1 = 1
> backend_data_directory1 = '/apps/pg_data/data'
> backend_flag1 = 'ALLOW_TO_FAILOVER'
>
> ...
>
> master_slave_mode = on
> master_slave_sub_mode = 'stream'
>
> ...
>
> health_check_period = 5
> health_check_timeout = 10
> health_check_max_retries = 5
> health_check_retry_delay = 1
>
> ...
>
> fail_over_on_backend_error = off
>
> This mostly works as expected. I am using a third party clustering
> solution which uses pcp_detach_node for fencing - although I don't think
> that is relevant here.
>
> I get the following node info when up:
>
> node_id |   hostname   | port | status | lb_weight |  role
> ---------+--------------+------+--------+-----------+---------
>  0       | 10.51.9.125  | 5432 | 2      | 0.500000  | primary
>  1       | 10.51.10.116 | 5432 | 3      | 0.500000  | standby
> (2 rows)
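
For reference, the status column uses the codes documented for pcp_node_info in pgpool-II 3.x. A small sketch that decodes them (status_name is a hypothetical helper name):

```shell
#!/bin/sh
# Translate pgpool-II 3.x backend status codes (as reported by
# pcp_node_info and SHOW pool_nodes) into readable text.
# status_name is a hypothetical helper, not part of pgpool.
status_name() {
    case "$1" in
        0) echo "initializing" ;;            # only used during startup
        1) echo "up, no connections yet" ;;
        2) echo "up, connections pooled" ;;
        3) echo "down" ;;
        *) echo "unknown status: $1" ;;
    esac
}
```

For example, status_name 2 prints "up, connections pooled".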
>
> The problem arises when my standby node becomes unavailable, for example
> when I shut it down.
>
> Until all my health checks have failed, I am unable to connect to the
> database:
>
> -bash-4.1$ psql
> psql.bin: -bash-4.1$
>
> My network is pretty temperamental, so I would actually like to lengthen
> the time to failure, but does this mean I get a complete outage every time
> my network flutters?
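
A rough way to size this window from the health-check settings above, assuming every attempt against an unreachable node runs for the full health_check_timeout (as it would when packets are silently dropped), that there is one initial attempt plus health_check_max_retries retries separated by health_check_retry_delay, and that the failure begins just after a check: the bound is health_check_period + (max_retries + 1) * timeout + max_retries * retry_delay. hc_window is a hypothetical helper; treat its result as an estimate, not pgpool's exact timing.

```shell
#!/bin/sh
# Rough upper bound (seconds) on how long a silently unreachable backend
# can go undetected by pgpool-II's health check. Assumes each attempt
# runs for the full timeout and the failure starts just after a check.
# Arguments: period timeout max_retries retry_delay
hc_window() {
    period=$1 timeout=$2 retries=$3 delay=$4
    # Wait for the next check, then 1 initial attempt + N retries,
    # with one retry delay before each retry.
    echo $(( period + (retries + 1) * timeout + retries * delay ))
}
```

With the values configured above, hc_window 5 10 5 1 prints 70, i.e. roughly 70 seconds before pgpool gives up on the standby.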
>
> I could instead turn fail_over_on_backend_error on, but then, although
> failover is immediate, the node never comes back without manual
> intervention once it has been set to status 3.
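
The manual intervention can be scripted. A minimal sketch, assuming the positional pcp argument order used by pgpool-II 3.3 (timeout, host, port, user, password, node id; newer releases use option flags instead, so check your version's manual), the pcp port at its default 9898, pg_isready from the PostgreSQL 9.3 client tools, and placeholder PCP credentials. node_status_field and reattach_if_recovered are hypothetical names. It does not verify that the standby is still replicating from the current primary, which you would want to confirm before re-attaching it.

```shell
#!/bin/sh
# Hypothetical watchdog: re-attach a backend that pgpool-II has marked
# down (status 3) once PostgreSQL on it answers again.
# Assumptions: pgpool-II 3.3-style positional pcp arguments, pcp on
# port 9898, pg_isready installed. Credentials below are placeholders.

PCP_HOST=localhost
PCP_PORT=9898
PCP_USER=pcpadmin   # placeholder
PCP_PASS=secret     # placeholder

# pcp_node_info prints "hostname port status weight"; take field 3.
node_status_field() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# $1 = node id, $2 = backend host, $3 = backend port
reattach_if_recovered() {
    info=$(pcp_node_info 10 "$PCP_HOST" "$PCP_PORT" \
        "$PCP_USER" "$PCP_PASS" "$1") || return 1
    # Nothing to do unless pgpool currently considers the node down.
    [ "$(node_status_field "$info")" = "3" ] || return 0
    # Only re-attach once the backend accepts connections again.
    if pg_isready -q -h "$2" -p "$3"; then
        pcp_attach_node 10 "$PCP_HOST" "$PCP_PORT" \
            "$PCP_USER" "$PCP_PASS" "$1"
    fi
}
```

It could then be run periodically, for example from cron, as reattach_if_recovered 1 10.51.10.116 5432.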
>
> Is there a way of avoiding this behavior and allowing traffic to continue
> to be processed on my functioning master when the standby is in question?
>
> Cheers,
>
> James Sewell,
