[pgpool-general: 6479] Re: pgpool-II does nothing when slave node goes down

Wed Mar 27 19:23:45 JST 2019

Dimitri,
I wrote a complete walk-through of pgpool on CentOS here: http://saule1508.github.io/pgpool/. Near the end you can see the parameters I use for pgpool and also the scripts that I use for failover and for follow_master.

To automate the failover, the important parameters (in my case) are the following (just a copy/paste from the link I refer to above):

failover_command = '/opt/pgpool/scripts/failover.sh  %d %h %P %m %H %R'
# not used, just echo something
failback_command = 'echo failback %d %h %p %D %m %H %M %P'
failover_on_backend_error = 'off'
search_primary_node_timeout = 300
# Mandatory in a 3 nodes set-up
follow_master_command = '/opt/pgpool/scripts/follow_master.sh %d %h %m %p %H %M %P'

# grace period before triggering a failover
health_check_period = 40
health_check_timeout = 10
health_check_user = 'hcuser'
health_check_password = 'hcuser'
health_check_database = 'postgres'
health_check_max_retries = 3
health_check_retry_delay = 1

Question 1: pgpool executes the failover_command when the master fails and also when one of the standbys fails (at least when health checks are used). In the failover script you will usually want to check, based on the arguments, what to do: if the failing host is the current primary then you promote the standby, otherwise the script does nothing.
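To make the check on the arguments concrete, here is a minimal sketch of such a failover script. It is not the script from the link above. The argument order follows the failover_command shown earlier (%d failed node id, %h failed host, %P old primary node id, %m new master node id, %H new master host, %R new master data directory). The function name decide_action, the ssh access as user postgres and the use of pg_ctl promote are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch of a failover script. Argument order matches
# failover_command = '... %d %h %P %m %H %R'

# Print "promote" when the failed node ($1) was the old primary ($2),
# "noop" otherwise. decide_action is a hypothetical helper name.
decide_action() {
    if [ "$1" = "$2" ]; then
        echo "promote"
    else
        echo "noop"
    fi
}

# Run only when pgpool passes all six arguments.
if [ "$#" -ge 6 ]; then
    failed_node_id="$1"
    failed_host="$2"
    old_primary_id="$3"
    new_master_id="$4"
    new_master_host="$5"
    new_master_pgdata="$6"

    if [ "$(decide_action "$failed_node_id" "$old_primary_id")" = "promote" ]; then
        echo "primary node $failed_node_id ($failed_host) failed, promoting node $new_master_id ($new_master_host)"
        # Assumption: passwordless ssh as postgres and pg_ctl on the remote PATH.
        ssh "postgres@${new_master_host}" "pg_ctl -D '${new_master_pgdata}' promote"
    else
        echo "standby node $failed_node_id ($failed_host) failed, nothing to do"
    fi
fi
```

When a standby fails, %d differs from %P, so the script only logs a message and pgpool carries on with detaching the node.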
When I say pgpool tries to reconnect x times, it is because I use the health_check parameters. On the contrary, if health_check_period = 0 then the health check is disabled, and in this case pgpool will not do the failover unless you use the parameter failover_on_backend_error (so you must choose either the health check or failover_on_backend_error, but not both).
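If you want to verify which of the two mechanisms a given pgpool.conf enables, something like the following sketch could help. The helper names conf_value and check_failover_mode are made up for this example. It only recognises the literal value on for failover_on_backend_error, and when a parameter is missing it falls back to what I understand to be the pgpool defaults (health_check_period = 0, failover_on_backend_error = on), so treat that as an assumption too.

```shell
#!/bin/bash
# Print the value of parameter $2 in config file $1 (last assignment wins),
# with quotes, blanks and trailing comments removed.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$1" \
        | tail -n 1 | tr -d "'\" \t"
}

# Report which failover trigger the config file $1 enables:
# health_check, backend_error, both or none.
check_failover_mode() {
    period="$(conf_value "$1" health_check_period)"
    on_error="$(conf_value "$1" failover_on_backend_error)"
    period="${period:-0}"        # assumed default: health check disabled
    on_error="${on_error:-on}"   # assumed default: on
    if [ "$period" != "0" ] && [ "$on_error" = "on" ]; then
        echo "both"
    elif [ "$period" != "0" ]; then
        echo "health_check"
    elif [ "$on_error" = "on" ]; then
        echo "backend_error"
    else
        echo "none"
    fi
}

# Usage: ./check_failover_mode.sh /path/to/pgpool.conf
if [ "$#" -ge 1 ]; then
    check_failover_mode "$1"
fi
```

With the parameters listed above, this should report health_check; a result of both is the combination I advise against.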
Question 2: it may be that without the health check the standby is not detached automatically; I am not sure. In that case you would have to run pcp_detach_node yourself in the failover script.
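A manual detach could look roughly like the sketch below. The PCP host, port 9898 and user pgpool are assumptions, as is a ~/.pcppass entry that lets the -w (no password prompt) option work. The wrapper name detach_node and the DRY_RUN switch are hypothetical and exist only so that the command can be inspected before it is really run.

```shell
#!/bin/bash
# Assumed PCP connection settings; adjust to your installation.
PCP_HOST="${PCP_HOST:-localhost}"
PCP_PORT="${PCP_PORT:-9898}"
PCP_USER="${PCP_USER:-pgpool}"

# Detach backend node $1 from pgpool via PCP.
# With DRY_RUN=1 the command is printed instead of executed.
detach_node() {
    node_id="$1"
    set -- pcp_detach_node -h "$PCP_HOST" -p "$PCP_PORT" -U "$PCP_USER" -w -n "$node_id"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: ./detach_node.sh <node_id>
if [ "$#" -ge 1 ]; then
    detach_node "$1"
fi
```

In a failover script you would call it with the failed node id (%d), and only when the failed node is not the primary.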

Question 3: the parameter failover_on_backend_error is not very intuitive. If you set it to on, the failover is triggered as soon as a child process detects an error on the postgres connection. But if you decide to use the health check then set it to off, otherwise the health check settings (retries and delays) might not be respected.
You must follow in the pgpool logs what is going on. On CentOS pgpool is started via systemd, so the logs are available in the journal:
sudo journalctl --unit pgpool
Depending on the version of CentOS the logs might also be in /var/log/messages; otherwise you might have to configure rsyslog to capture them and store them in /var/log/messages.

Pierre 

    On Wednesday, March 27, 2019, 10:36:23 AM GMT+1, Dmitry Medvedev <dm.dm.medvedev at gmail.com> wrote:  

Thanks for the reply. I have some questions.
Question #1: failover script when the standby fails.
In pgpool.conf there is the parameter
failover_command = '' # Executes this command at failover
But there is no indication of what exactly "fail" means: failure of the master server or failure of a standby server? You wrote "When the standby is stopped, pgpool tries to reconnect x times ... then it does the failover script..." Where in pgpool.conf did you define this script?
Question #2: you wrote "... it detaches the standby". I used exactly the same settings as you wrote, but pgpool is still showing the standby as "up" while it is "down". Maybe there is some parameter which explicitly says that the standby MUST be detached when it fails?
Question #3: you wrote "I also put the parameter failover_on_backend_error = 'off'". Why? If it is "off", then when the master fails no standby is promoted. Or am I wrong?
On Tue, 26 Mar 2019 at 23:16, Pierre Timmermans <ptim007 at yahoo.com> wrote:

Hi Dimitri
Did you set-up the health checks ?
In my pgpool config I have the following parameters related to the health checks:
health_check_period = 10
health_check_timeout = 10
health_check_user = 'hcuser'
health_check_password = 'hcuser'
health_check_database = 'postgres'
health_check_max_retries = 5
health_check_retry_delay = 1
I also put the parameter failover_on_backend_error = 'off'. And I created the user hcuser on the postgres database.
When the standby is stopped, pgpool tries to reconnect x times (depending on the health_check_max_retries parameter), then it runs the failover script (the script does not have to do anything because it is not the primary that is failing) and it detaches the standby. When the standby is started again, pgpool runs the failback script (I do nothing in this script) and then it attaches the standby again.
I believe that if you don't set up the health checks, pgpool does not detach the standby when it fails (not sure if it is as-designed or not).
Pierre 

     On Tuesday, March 26, 2019, 3:58:52 PM GMT+1, Dmitry Medvedev <dm.dm.medvedev at gmail.com> wrote:  

 Hello to everyone.
For a couple of days I have been trying to understand how pgpool-II works. Please tell me the principle of operation when a slave node goes down. I've read tons of manuals, and when the master node in my test cluster goes down (or the master's network interface goes down), pgpool does the failover actions. Everything is OK and works as expected. But when the slave node goes down, pgpool does nothing at all and I receive this answer:
test=# show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-------------+------+--------+-----------+--------+------------+-------------------+-------------------+---------------------
 0       | 172.28.30.6 | 5434 | up     | 0.500000  | master | 3          | true              | 0                 | 2019-03-26 17:04:12
 1       | 172.28.30.7 | 5434 | up     | 0.500000  | slave  | 0          | false             | 0                 | 2019-03-26 17:04:12
(2 rows)

At the same time the slave node is completely down. Not "up"! select * from pg_stat_replication shows (0 rows).
Why does pgpool-II do nothing when a slave node goes down, and how can I change it?
_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general
