[pgpool-general: 4119] Testing pgpool + failover on random servers restarts

Ioana Danes ioanadanes at gmail.com
Sat Oct 17 02:15:58 JST 2015


Hello,

I have set up pgpool (version 3.4) in master/slave mode with 2 PostgreSQL
databases (no load balancing) using streaming replication:

backend_hostname0 = 'voldb1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data01/postgres'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'voldb2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data01/postgres'
backend_flag1 = 'ALLOW_TO_FAILOVER'

connection_cache = on
load_balance_mode = off

master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 10

health_check_period = 40
health_check_timeout = 10
health_check_max_retries = 3
health_check_retry_delay = 1
connect_timeout = 10000

failover_command = '/usr/local/bin/pgpool_failover.sh %P %m %H'
failback_command = '/usr/local/bin/pgpool_failback.sh %d %P %m %H'
fail_over_on_backend_error = on
search_primary_node_timeout = 10
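
For context, the failover script promotes the surviving standby. A minimal
sketch of that logic is below; the ssh invocation and pg_ctl path are
simplified assumptions, not my exact script:

#!/bin/sh
# pgpool_failover.sh (sketch) -- invoked as: failover_command %P %m %H
# %P = old primary node id, %m = new master node id, %H = new master hostname
old_primary_id=$1
new_master_id=$2
new_master_host=$3

# Promote the surviving standby only when the failed node was the primary.
# If a standby fails, %m points back at the primary and nothing needs doing.
if [ "$old_primary_id" != "$new_master_id" ]; then
    ssh postgres@"$new_master_host" \
        "pg_ctl promote -D /data01/postgres"  # path from backend_data_directory0
fi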


I was testing the following scenario:

1. Initial setup:
 node_id |   hostname    | port | status | lb_weight |  role
---------+---------------+------+--------+-----------+---------
 0       | voldb1.ls.cbn | 5432 | 2      | 0.500000  | primary
 1       | voldb2.ls.cbn | 5432 | 2      | 0.500000  | standby
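
For reference, these tables are "show pool_nodes" output run through pgpool
(status 2 = up and attached, 3 = down and detached), obtained with something
like this, assuming the default pgpool port 9999:

psql -h localhost -p 9999 -c 'show pool_nodes' postgres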


2. I stopped postgres on voldb1, and voldb2 became the new primary, so now I
have this:
 node_id |   hostname    | port | status | lb_weight |  role
---------+---------------+------+--------+-----------+---------
 0       | voldb1.ls.cbn | 5432 | 3      | 0.500000  | standby
 1       | voldb2.ls.cbn | 5432 | 2      | 0.500000  | primary

3. I then stopped postgres on voldb2 as well, leaving pgpool with no active
backends (both reported as down):

pcp_node_info 10 localhost 9898 cbn_cluster t000r 0
voldb1.ls.cbn 5432 3 0.500000

pcp_node_info 10 localhost 9898 cbn_cluster t000r 1
voldb2.ls.cbn 5432 3 0.500000

4. I started postgres on both voldb1 and voldb2. Nothing changed, as pgpool
does not search for nodes to reattach on its own.
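
As far as I can tell, a detached node has to be reattached manually with
pcp_attach_node (same arguments as the pcp_node_info calls above), e.g.:

pcp_attach_node 10 localhost 9898 cbn_cluster t000r 0
pcp_attach_node 10 localhost 9898 cbn_cluster t000r 1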

5. I killed pgpool with pkill -9 pgpool (to simulate a server crash/reboot)
and then started pgpool again. This is the state I ended up with:
 node_id |   hostname    | port | status | lb_weight |  role
---------+---------------+------+--------+-----------+---------
 0       | voldb1.ls.cbn | 5432 | 2      | 0.500000  | primary
 1       | voldb2.ls.cbn | 5432 | 2      | 0.500000  | standby


So the old master became the primary again, which is a big deal in my case: I
can't afford any data loss!

How does pgpool determine which server was the primary last? Is there a way I
could overcome this issue? I thought I could update the pgpool status file in
/var/log/pgpool from my scripts once a failover occurs, along the lines of
the sketch below. Would that be the way to go, or are there better ways to
fix this?
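
To make that concrete: if I understand the 3.4 format correctly, pgpool_status
is plain text with one line per backend (up/down/unused, in backend id order),
so after the failover in step 2 my script could rewrite it like this (path and
format are my assumptions from the docs):

# keep voldb1 (backend 0) detached, voldb2 (backend 1) attached
cat > /var/log/pgpool/pgpool_status <<EOF
down
up
EOF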


Thanks a lot,
Ioana Danes