[pgpool-general: 4120] Re: Testing pgpool + failover on random servers restarts

Tatsuo Ishii ishii at postgresql.org
Sat Oct 17 08:05:51 JST 2015


> Hello,
> 
> I have setup pgpool (version 3.4) in master slave mode with 2 postgres
> databases (no load balancing) using streaming replication:
> 
> backend_hostname0 = 'voldb1'
> backend_port0 = 5432
> backend_weight0 = 1
> backend_data_directory0 = '/data01/postgres'
> backend_flag0 = 'ALLOW_TO_FAILOVER'
> 
> backend_hostname1 = 'voldb2'
> backend_port1 = 5432
> backend_weight1 = 1
> backend_data_directory1 = '/data01/postgres'
> backend_flag1 = 'ALLOW_TO_FAILOVER'
> 
> connection_cache = on
> load_balance_mode = off
> 
> master_slave_mode = on
> master_slave_sub_mode = 'stream'
> 
> sr_check_period = 10
> 
> health_check_period = 40
> health_check_timeout = 10
> health_check_max_retries = 3
> health_check_retry_delay = 1
> connect_timeout = 10000
> 
> failover_command = '/usr/local/bin/pgpool_failover.sh %P %m %H'
> failback_command = '/usr/local/bin/pgpool_failback.sh %d %P %m %H'
> fail_over_on_backend_error = on
> search_primary_node_timeout = 10
> 
> 
> I was testing the following scenario:
> 
> 1. Initial setup:
>  node_id |   hostname    | port | status | lb_weight |  role
> ---------+---------------+------+--------+-----------+---------
>  0       | voldb1.ls.cbn | 5432 | 2      | 0.500000  | primary
>  1       | voldb2.ls.cbn | 5432 | 2      | 0.500000  | standby
> 
> 
> 2. I stopped postgres on voldb1 and voldb2 became the new primary so now I
> have this:
>  node_id |   hostname    | port | status | lb_weight |  role
> ---------+---------------+------+--------+-----------+---------
>  0       | voldb1.ls.cbn | 5432 | 3      | 0.500000  | standby
>  1       | voldb2.ls.cbn | 5432 | 2      | 0.500000  | primary
> 
> 3. I stopped postgres on voldb2 and there were no active masters attached
> to pgpool:
> 
> pcp_node_info 10 localhost 9898 cbn_cluster t000r 0
> voldb1.ls.cbn 5432 3 0.500000
> 
> pcp_node_info 10 localhost 9898 cbn_cluster t000r 1
> voldb2.ls.cbn 5432 3 0.500000
> 
> 4. I started postgres on both voldb1 and voldb2. No change, as pgpool does
> not automatically search for the nodes and reattach them.
> 
> 5. I killed pgpool (to simulate a server crash, reboot): pkill -9 pgpool
> and I started pgpool again. Now this is the state I ended up with:
>  node_id |   hostname    | port | status | lb_weight |  role
> ---------+---------------+------+--------+-----------+---------
>  0       | voldb1.ls.cbn | 5432 | 2      | 0.500000  | primary
>  1       | voldb2.ls.cbn | 5432 | 2      | 0.500000  | standby
> 
> 
> So the old master became the master again, which is a big deal in my case;
> I can't afford any kind of data loss!
> 
> How is pgpool determining what server was master last?

See the FAQ:

http://pgpool.net/mediawiki/index.php/FAQ#How_does_pgpool-II_find_the_primary_node.3F
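In short, the FAQ explains that pgpool-II probes each live backend and asks PostgreSQL itself whether it is in recovery; the first backend that answers "no" is treated as the primary. You can run the same check by hand against your two nodes (the user name here is an assumption; substitute whatever role pgpool connects as):

```shell
# "f" means the node is running as primary,
# "t" means it is a standby still in recovery.
psql -h voldb1 -p 5432 -U postgres -tAc 'SELECT pg_is_in_recovery();'
psql -h voldb2 -p 5432 -U postgres -tAc 'SELECT pg_is_in_recovery();'
```

Note that after you restarted postgres on both nodes in step 4/5, both old instances came up willing to answer this probe, which is why the old timeline on voldb1 could be picked up as primary again.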

> Is there a way I could overcome this issue? I thought I could update the
> pgpool status file in /var/log/pgpool in my scripts once a failover
> occurs. Would that be the way to go, or are there better ways to fix this?

The behavior above depends heavily on how you wrote pgpool_failover.sh
and pgpool_failback.sh. It would also help if you showed how you start
pgpool. The key point is whether you use the -D option or not: -D
discards the saved pgpool_status file, so pgpool re-detects the backend
states from scratch instead of restoring the last remembered ones.
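For reference, pgpool-II remembers backend status across restarts in the pgpool_status file under logdir (/var/log/pgpool in your setup); as of 3.4 this is a plain-text file with one status line per backend, in node-id order. A rough illustration of inspecting it and of restarting pgpool without it (the pgpool.conf path is an assumption, adjust to your installation):

```shell
# Show the remembered per-node status; typically one "up"/"down"
# line per backend.
cat /var/log/pgpool/pgpool_status

# Start pgpool discarding the saved status file (-D), forcing a fresh
# detection of which node is primary on startup.
pgpool -D -f /etc/pgpool-II/pgpool.conf
```

Whether discarding the status is safe for you depends on your failover scripts: if the old primary was never rewound or re-based as a standby, a fresh detection can still pick it up as primary again, which is the data-loss scenario you describe.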

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
