[pgpool-general: 4054] temporarily losing connexion when slave goes down in master-slave mode

Thomas SIMON tsimon at neteven.com
Thu Sep 10 19:29:24 JST 2015


Hi All,

I want to configure pgpool in master-slave mode with two pgpool nodes and a 
virtual IP.
Everything works fine (master-slave, authentication, failover if the master 
goes down), except when I try to simulate the slave going down.


When I shut down the 2nd host (the slave postgresql/pgpool node), I can't run 
any queries on the master pgpool until the master has restarted its children 
(see below). I don't understand why this happens, because neither the 
master pgpool nor the master postgres is impacted; queries should still 
be sent to the master postgresql, which is fine.

This only happens when I shut down the host or unplug the cable. If I stop 
the slave pgpool and slave postgres instead, everything is OK (the slave node 
is removed from the pool by the master pgpool, and there is no period during 
which I can't run queries).


I run these commands:

Connections to the failover IP:

[12:08:52]toms at tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c 
"show pool_nodes;"
  node_id |   hostname   | port | status | lb_weight |  role
---------+--------------+------+--------+-----------+---------
  0       | localhost    | 5432 | 2      | 0.090909  | primary
  1       | pg2.tstvrack | 5432 | 2      | 0.909091  | standby

*** Shutdown pg2.lan ***

[12:09:22]toms at tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c 
"show pool_nodes;"
psql: SSL SYSCALL error: EOF detected
[12:09:31]toms at tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c 
"show pool_nodes;"
psql: SSL SYSCALL error: EOF detected
[12:09:48]toms at tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c 
"show pool_nodes;"
psql: SSL SYSCALL error: EOF detected
[12:10:11]toms at tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c 
"show pool_nodes;"
  node_id |   hostname   | port | status | lb_weight |  role
---------+--------------+------+--------+-----------+---------
  0       | localhost    | 5432 | 2      | 0.090909  | primary
  1       | pg2.tstvrack | 5432 | 3      | 0.909091  | standby
(2 rows)


At the same time, I have the following logs on the pgpool master node:

2015-09-10T12:09:36.013230+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:39.013361+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:42.013493+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:42.149978+02:00 pg1 pgpool[5423]: connection received: 
host=1.2.3.4 port=45070
2015-09-10T12:09:42.209415+02:00 pg1 pgpool[5423]: connection closed. 
retry to create new connection pool.
2015-09-10T12:09:45.013617+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:45.042853+02:00 pg1 pgpool[5359]: 
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:09:45.042868+02:00 pg1 pgpool[5359]: 
make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T12:09:45.042871+02:00 pg1 pgpool[5359]: health check failed. 
1 th host pg2.lan at port 5432 is down
2015-09-10T12:09:45.042874+02:00 pg1 pgpool[5359]: health check retry 
sleep time: 10 second(s)
2015-09-10T12:09:48.013745+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:48.211788+02:00 pg1 pgpool[5423]: 
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:09:48.211801+02:00 pg1 pgpool[5423]: connection to 
pg2.lan(5432) failed
2015-09-10T12:09:48.211805+02:00 pg1 pgpool[5423]: new_connection: 
create_cp() failed
2015-09-10T12:09:48.211808+02:00 pg1 pgpool[5423]: new_connection: do 
not failover because fail_over_on_backend_error is off
2015-09-10T12:09:51.013868+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:54.013997+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:57.014126+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:58.099882+02:00 pg1 pgpool[5359]: 
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:09:58.099899+02:00 pg1 pgpool[5359]: 
make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T12:09:58.099903+02:00 pg1 pgpool[5359]: health check failed. 
1 th host pg2.lan at port 5432 is down
2015-09-10T12:09:58.099906+02:00 pg1 pgpool[5359]: health check retry 
sleep time: 10 second(s)
2015-09-10T12:10:00.014258+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:03.014381+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:06.014502+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:09.014617+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:11.115906+02:00 pg1 pgpool[5359]: 
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:10:11.115921+02:00 pg1 pgpool[5359]: 
make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T12:10:11.115925+02:00 pg1 pgpool[5359]: health check failed. 
1 th host pg2.lan at port 5432 is down
2015-09-10T12:10:11.115928+02:00 pg1 pgpool[5359]: set 1 th backend down 
status
2015-09-10T12:10:11.115931+02:00 pg1 pgpool[5359]: wd_start_interlock: 
start interlocking
2015-09-10T12:10:11.115934+02:00 pg1 pgpool[5359]: 
wd_assume_lock_holder: become a new lock holder
2015-09-10T12:10:11.374369+02:00 pg1 pgpool[5529]: connection received: 
host=1.2.3.4 port=45077
2015-09-10T12:10:11.616811+02:00 pg1 pgpool[5359]: starting 
degeneration. shutdown host pg2.lan(5432)
2015-09-10T12:10:11.616828+02:00 pg1 pgpool[5359]: Restart all children
2015-09-10T12:10:11.616831+02:00 pg1 pgpool[5359]: execute command: 
/apps/scripts/postgres/failover.sh 1 0 localhost /data/postgresql/9.3/main
2015-09-10T12:10:11.630510+02:00 pg1 pgpool: + FALLING_NODE=1
2015-09-10T12:10:11.630639+02:00 pg1 pgpool: + ACTUAL_PRIMARY_NODE=0
2015-09-10T12:10:11.630721+02:00 pg1 pgpool: + NEW_PRIMARY=localhost
2015-09-10T12:10:11.630864+02:00 pg1 pgpool: + 
PGDATA=/data/postgresql/9.3/main
2015-09-10T12:10:11.630978+02:00 pg1 pgpool: + '[' 4 '!=' 4 ']'
2015-09-10T12:10:11.631131+02:00 pg1 pgpool: + '[' 1 = 0 ']'
2015-09-10T12:10:11.631209+02:00 pg1 pgpool: + exit 0
2015-09-10T12:10:11.631454+02:00 pg1 pgpool[5359]: wd_end_interlock: end 
interlocking
2015-09-10T12:10:12.014741+02:00 pg1 pgpool[5403]: 
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:12.136452+02:00 pg1 pgpool[5359]: failover: set new 
primary node: 0
2015-09-10T12:10:12.136470+02:00 pg1 pgpool[5359]: failover: set new 
master node: 0
2015-09-10T12:10:12.136473+02:00 pg1 pgpool[5359]: failover done. 
shutdown host pg2.lan(5432)
2015-09-10T12:10:12.136478+02:00 pg1 pgpool[5487]: worker process 
received restart request
2015-09-10T12:10:12.136912+02:00 pg1 pgpool[5544]: do_child: failback 
event found. restart myself.
2015-09-10T12:10:12.138083+02:00 pg1 pgpool[5546]: do_child: failback 
event found. restart myself.
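
To make the timing in the log above easier to read, here is a small sketch that measures the gap between the three "health check failed" entries. The timestamps are copied from the log; the variable names are only illustrative. The gaps come out at roughly 13 seconds each, which would be consistent with the 10 second retry sleep plus a few seconds for each connection attempt to fail with 'No route to host' (that split is my interpretation, not something the log states directly).

```python
from datetime import datetime

# Timestamps of the "health check failed" lines, copied from the log above.
failed_at = [
    "2015-09-10T12:09:45.042871+02:00",
    "2015-09-10T12:09:58.099903+02:00",
    "2015-09-10T12:10:11.115925+02:00",
]

# Parse the ISO 8601 timestamps (Python 3.7+) and compute successive gaps.
times = [datetime.fromisoformat(t) for t in failed_at]
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

for gap in gaps:
    print(f"{gap:.3f} s between failed health checks")
```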




It seems the problem comes from 'No route to host'.


Below are my pgpool parameters:

name : health_check_period
value: 10
desc : health check period

name : health_check_timeout
value: 5
desc : health check timeout

name : health_check_user
value: pgpool_check_replication
desc : health check user

name : health_check_max_retries
value: 3
desc : health check max retries

name : health_check_retry_delay
value: 10
desc : health check retry delay
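
As a rough sanity check on these values, the sketch below estimates how long the health check needs before it marks a backend as down. It rests on two assumptions that are not confirmed anywhere in this message: that pgpool makes one initial attempt plus health_check_max_retries retries, and that each failed attempt takes about 3 seconds (a guess based on the spacing of the failures in the log). The function name and the attempt_seconds parameter are hypothetical.

```python
def detection_window(max_retries, retry_delay, attempt_seconds):
    """Estimate seconds from the first failed health check attempt
    until the backend is declared down.

    Assumes one initial attempt plus `max_retries` retries, with
    `retry_delay` seconds of sleep between consecutive attempts.
    """
    attempts = max_retries + 1
    return attempts * attempt_seconds + max_retries * retry_delay


# health_check_max_retries = 3, health_check_retry_delay = 10,
# and an assumed 3 s per failed connection attempt.
print(detection_window(3, 10, 3.0))
```

Under these assumptions the estimate is about 42 seconds, which is in the same range as the period during which my psql connections failed (12:09:22 to 12:10:11).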



Where am I going wrong?
Thanks

Thomas


