[pgpool-general: 218] problem failover in streaming replication

Евгений Селявка evg.selyavka at gmail.com
Thu Feb 9 18:25:37 JST 2012


Dear users,

I have a problem with pgpool-II version 3.1.2 (hatsuiboshi). I have two
databases, node 0 (master) and node 1 (slave), both running PostgreSQL 9.1.2.
The databases run in sync master/slave mode. I tried to fail over to the slave
and fail back to the master several times, and ran into this problem:

First step: I shut down the master (node 0). This is OK:

Feb  9 10:44:54 dbbalancer1 pgpool[2599]: set 0 th backend down status
Feb  9 10:44:54 dbbalancer1 pgpool[2599]: starting degeneration. shutdown
host 192.168.56.21(5432)
Feb  9 10:44:54 dbbalancer1 pgpool[2599]: Restart all children
Feb  9 10:44:54 dbbalancer1 pgpool[2599]: execute command:
/usr/local/etc/pgpool.d/failover.sh 0 192.168.56.21 5432
/var/lib/postgresql/9.1/main 1 0 192.168.56.22 0
Feb  9 10:44:55 dbbalancer1 pgpool[2599]: find_primary_node_repeatedly:
waiting for finding a primary node
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: find_primary_node: primary node
id is 1
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: starting follow degeneration.
shutdown host 192.168.56.21(5432)
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: failover: 1 follow backends have
been degenerated
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: failover: set new primary node: 1
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: failover: set new master node: 1
Feb  9 10:44:58 dbbalancer1 pgpool[6156]: start triggering follow command.
Feb  9 10:44:58 dbbalancer1 pgpool[6156]: execute command:
/usr/local/etc/pgpool.d/follow_master_command.sh 0 192.168.56.21 5432
/var/lib/postgresql/9.1/main 1 192.168.56.22 0 0
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: failover done. shutdown host
192.168.56.21(5432)
Feb  9 10:44:59 dbbalancer1 pgpool[6069]: pcp child process received
restart request
Feb  9 10:44:59 dbbalancer1 pgpool[2599]: PCP child 6069 exits with status
256
Feb  9 10:44:59 dbbalancer1 pgpool[2599]: fork a new PCP child pid 6186

Second step: I power the machine back on and try to recover node 0. This is not OK:

Feb  9 11:17:49 dbbalancer1 pgpool[6239]: starting recovering node 0
Feb  9 11:17:49 dbbalancer1 pgpool[6239]: starting recovery command:
"SELECT pgpool_recovery('copy_base_backup', '192.168.56.21',
'/var/lib/postgresql/9.1/main')"
Feb  9 11:18:25 dbbalancer1 pgpool[6239]: 1st stage is done
Feb  9 11:18:26 dbbalancer1 pgpool[6239]: check_postmaster_started: try to
connect to postmaster on hostname:192.168.56.21 database:postgres
user:postgres (retry 0 times)
Feb  9 11:18:26 dbbalancer1 pgpool[6239]: check_postmaster_started: failed
to connect to postmaster on hostname:192.168.56.21 database:postgres
user:postgres
Feb  9 11:18:29 dbbalancer1 pgpool[6239]: check_postmaster_started: try to
connect to postmaster on hostname:192.168.56.21 database:postgres
user:postgres (retry 1 times)
Feb  9 11:18:29 dbbalancer1 pgpool[6239]: 0 node restarted
Feb  9 11:18:29 dbbalancer1 pgpool[6239]: send_failback_request: fail back
0 th node request from pid 6239
Feb  9 11:18:29 dbbalancer1 pgpool[2599]: starting fail back. reconnect
host 192.168.56.21(5432)
Feb  9 11:18:29 dbbalancer1 pgpool[2599]: execute command:
/usr/local/etc/pgpool.d/failback_command.sh 0 192.168.56.21 5432
/var/lib/postgresql/9.1/main 0 1 192.168.56.21 1
Feb  9 11:18:30 dbbalancer1 pgpool[2599]: Do not restart children because
we are failbacking node id 0 host192.168.56.21 port:5432 and we are in
streaming replication mode
Feb  9 11:18:30 dbbalancer1 pgpool[2599]: find_primary_node_repeatedly:
waiting for finding a primary node
Feb  9 11:18:31 dbbalancer1 pgpool[2599]: find_primary_node: primary node
id is 1
Feb  9 11:18:31 dbbalancer1 pgpool[2599]: failover: set new primary node: 1
Feb  9 11:18:31 dbbalancer1 pgpool[2599]: failover: set new master node: 0
Feb  9 11:18:31 dbbalancer1 pgpool[6193]: worker process received restart
request
Feb  9 11:18:31 dbbalancer1 pgpool[2599]: failback done. reconnect host
192.168.56.21(5432)
Feb  9 11:18:31 dbbalancer1 pgpool[6239]: recovery done
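
To compare the two steps, the node roles that pgpool reports can be pulled out of the log lines above. This is only a minimal sketch: the helper name `roles` and the regular expression are my own, and the log excerpts are copied from the two steps, not produced by any pgpool tool.

```python
import re

# Excerpts copied from the failover (step 1) and failback (step 2) logs.
FAILOVER_LOG = """\
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: failover: set new primary node: 1
Feb  9 10:44:58 dbbalancer1 pgpool[2599]: failover: set new master node: 1
"""
FAILBACK_LOG = """\
Feb  9 11:18:31 dbbalancer1 pgpool[2599]: failover: set new primary node: 1
Feb  9 11:18:31 dbbalancer1 pgpool[2599]: failover: set new master node: 0
"""

# Matches the two "set new ... node" messages and captures role and node id.
ROLE_RE = re.compile(r"failover: set new (primary|master) node: (\d+)")


def roles(log_text):
    """Return {'primary': id, 'master': id} as reported in a log excerpt."""
    return {role: int(node) for role, node in ROLE_RE.findall(log_text)}


print("after failover:", roles(FAILOVER_LOG))
print("after failback:", roles(FAILBACK_LOG))
```

After the failover both roles point at node 1. After the failback the primary is still node 1, but the master node is reported as 0.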

In the call

/usr/local/etc/pgpool.d/failback_command.sh 0 192.168.56.21 5432 /var/lib/postgresql/9.1/main 0 1 192.168.56.21 1

I see the same IP address (192.168.56.21) twice. Why does this happen? On
failover I see the message "failover: set new master node: 1", and the
failback itself appears to complete OK.
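
The positional arguments in the logged call can be matched against the placeholder order configured in failback_command. The following is a minimal sketch: the helper `label_args` is hypothetical, and the placeholder descriptions are my paraphrase of the pgpool-II documentation for failover_command and failback_command, so treat them as assumptions and not as authoritative wording.

```python
# Placeholder meanings, paraphrased from the pgpool-II documentation
# (assumption: check the wording against the docs for your version).
PLACEHOLDERS = {
    "%d": "node id of the detached/attached node",
    "%h": "hostname of the detached/attached node",
    "%p": "port of the detached/attached node",
    "%D": "database cluster directory of that node",
    "%m": "new master node id",
    "%M": "old master node id",
    "%H": "hostname of the new master node",
    "%P": "old primary node id",
}

# failback_command as configured in pgpool.conf, and the call seen in the log.
FAILBACK_TEMPLATE = (
    "/usr/local/etc/pgpool.d/failback_command.sh %d %h %p %D %m %M %H %P"
)
FAILBACK_LOGGED = (
    "/usr/local/etc/pgpool.d/failback_command.sh 0 192.168.56.21 5432 "
    "/var/lib/postgresql/9.1/main 0 1 192.168.56.21 1"
)


def label_args(template, logged):
    """Pair each placeholder in the template with the value pgpool logged."""
    keys = template.split()[1:]    # drop the script path
    values = logged.split()[1:]
    if len(keys) != len(values):
        raise ValueError("template and logged call have different lengths")
    return dict(zip(keys, values))


args = label_args(FAILBACK_TEMPLATE, FAILBACK_LOGGED)
for key, value in args.items():
    print(f"{key} = {value:<30} ({PLACEHOLDERS[key]})")
```

Read this way, %h and %H are both 192.168.56.21 and %m is 0: the node being attached and the node reported as the new master are the same node. As far as I understand, the "master node" in pgpool is not necessarily the streaming replication primary, which would fit with the primary staying at node 1 in the log.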

My pgpool.conf:
listen_addresses = '*'
port = 9999
socket_dir = '/tmp'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = '192.168.56.21'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.1/main'
backend_hostname1 = '192.168.56.22'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.1/main'
enable_pool_hba = off
authentication_timeout = 60
ssl = off
num_init_children = 25
max_pool = 4
child_life_time = 300
child_max_connections = 100
connection_life_time = 0
client_idle_limit = 300
log_destination = 'syslog'
print_timestamp = on
log_connections = off
log_hostname = on
log_statement = on
log_per_node_statement = on
log_standby_delay = 'always'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
debug_level = 1
pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/tmp'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = on
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'sr_check'
sr_check_password = 'sr_check1'
delay_threshold = 100
follow_master_command = '/usr/local/etc/pgpool.d/follow_master_command.sh %d %h %p %D %m %H %M %P'
parallel_mode = off
enable_query_cache = off
pgpool2_hostname = ''
system_db_hostname  = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = ''
health_check_period = 5
health_check_timeout = 20
health_check_user = 'health_check'
health_check_password = 'health_check1'
failover_command = '/usr/local/etc/pgpool.d/failover.sh %d %h %p %D %m %M %H %P'
failback_command = '/usr/local/etc/pgpool.d/failback_command.sh %d %h %p %D %m %M %H %P'
fail_over_on_backend_error = on
recovery_user = 'postgres'
recovery_password = 'postgres'
recovery_1st_stage_command = 'copy_base_backup'
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
relcache_expire = 0




-- 
Best regards, Селявка Евгений