[pgpool-general: 4054] temporarily losing connection when slave goes down in master-slave mode
Thomas SIMON
tsimon at neteven.com
Thu Sep 10 19:29:24 JST 2015
Hi All,
I want to configure pgpool in master-slave mode with two pgpool nodes
and a virtual IP.
Everything works fine (master-slave, authentication, failover if the
master goes down), except when I simulate the slave going down.
When I shut down the second host (the slave postgresql/pgpool node), I
can't run any queries on the master pgpool until the master has
restarted its children (see below). I don't understand why this happens,
because neither the master pgpool nor the master postgres is impacted;
queries should still be sent to the master postgresql, which is fine.
This only happens when I shut down the host or unplug the cable. If I
stop the slave pgpool and slave postgres, everything is OK (the slave
node is removed from the pool by the master pgpool, and there is no
period during which I can't run queries).
I run these commands:
Connections to the failover IP:
[12:08:52]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c
"show pool_nodes;"
node_id | hostname | port | status | lb_weight | role
---------+--------------+------+--------+-----------+---------
0 | localhost | 5432 | 2 | 0.090909 | primary
1 | pg2.tstvrack | 5432 | 2 | 0.909091 | standby
*** Shutdown pg2.lan ***
[12:09:22]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c
"show pool_nodes;"
psql: SSL SYSCALL error: EOF detected
[12:09:31]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c
"show pool_nodes;"
psql: SSL SYSCALL error: EOF detected
[12:09:48]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c
"show pool_nodes;"
psql: SSL SYSCALL error: EOF detected
[12:10:11]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c
"show pool_nodes;"
node_id | hostname | port | status | lb_weight | role
---------+--------------+------+--------+-----------+---------
0 | localhost | 5432 | 2 | 0.090909 | primary
1 | pg2.tstvrack | 5432 | 3 | 0.909091 | standby
(2 rows)
At the same time, I have the following logs on the pgpool master node:
2015-09-10T12:09:36.013230+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:39.013361+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:42.013493+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:42.149978+02:00 pg1 pgpool[5423]: connection received:
host=1.2.3.4 port=45070
2015-09-10T12:09:42.209415+02:00 pg1 pgpool[5423]: connection closed.
retry to create new connection pool.
2015-09-10T12:09:45.013617+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:45.042853+02:00 pg1 pgpool[5359]:
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:09:45.042868+02:00 pg1 pgpool[5359]:
make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T12:09:45.042871+02:00 pg1 pgpool[5359]: health check failed.
1 th host pg2.lan at port 5432 is down
2015-09-10T12:09:45.042874+02:00 pg1 pgpool[5359]: health check retry
sleep time: 10 second(s)
2015-09-10T12:09:48.013745+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:48.211788+02:00 pg1 pgpool[5423]:
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:09:48.211801+02:00 pg1 pgpool[5423]: connection to
pg2.lan(5432) failed
2015-09-10T12:09:48.211805+02:00 pg1 pgpool[5423]: new_connection:
create_cp() failed
2015-09-10T12:09:48.211808+02:00 pg1 pgpool[5423]: new_connection: do
not failover because fail_over_on_backend_error is off
2015-09-10T12:09:51.013868+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:54.013997+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:57.014126+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:09:58.099882+02:00 pg1 pgpool[5359]:
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:09:58.099899+02:00 pg1 pgpool[5359]:
make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T12:09:58.099903+02:00 pg1 pgpool[5359]: health check failed.
1 th host pg2.lan at port 5432 is down
2015-09-10T12:09:58.099906+02:00 pg1 pgpool[5359]: health check retry
sleep time: 10 second(s)
2015-09-10T12:10:00.014258+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:03.014381+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:06.014502+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:09.014617+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:11.115906+02:00 pg1 pgpool[5359]:
connect_inet_domain_socket: getsockopt() detected error: No route to host
2015-09-10T12:10:11.115921+02:00 pg1 pgpool[5359]:
make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T12:10:11.115925+02:00 pg1 pgpool[5359]: health check failed.
1 th host pg2.lan at port 5432 is down
2015-09-10T12:10:11.115928+02:00 pg1 pgpool[5359]: set 1 th backend down
status
2015-09-10T12:10:11.115931+02:00 pg1 pgpool[5359]: wd_start_interlock:
start interlocking
2015-09-10T12:10:11.115934+02:00 pg1 pgpool[5359]:
wd_assume_lock_holder: become a new lock holder
2015-09-10T12:10:11.374369+02:00 pg1 pgpool[5529]: connection received:
host=1.2.3.4 port=45077
2015-09-10T12:10:11.616811+02:00 pg1 pgpool[5359]: starting
degeneration. shutdown host pg2.lan(5432)
2015-09-10T12:10:11.616828+02:00 pg1 pgpool[5359]: Restart all children
2015-09-10T12:10:11.616831+02:00 pg1 pgpool[5359]: execute command:
/apps/scripts/postgres/failover.sh 1 0 localhost /data/postgresql/9.3/main
2015-09-10T12:10:11.630510+02:00 pg1 pgpool: + FALLING_NODE=1
2015-09-10T12:10:11.630639+02:00 pg1 pgpool: + ACTUAL_PRIMARY_NODE=0
2015-09-10T12:10:11.630721+02:00 pg1 pgpool: + NEW_PRIMARY=localhost
2015-09-10T12:10:11.630864+02:00 pg1 pgpool: +
PGDATA=/data/postgresql/9.3/main
2015-09-10T12:10:11.630978+02:00 pg1 pgpool: + '[' 4 '!=' 4 ']'
2015-09-10T12:10:11.631131+02:00 pg1 pgpool: + '[' 1 = 0 ']'
2015-09-10T12:10:11.631209+02:00 pg1 pgpool: + exit 0
2015-09-10T12:10:11.631454+02:00 pg1 pgpool[5359]: wd_end_interlock: end
interlocking
2015-09-10T12:10:12.014741+02:00 pg1 pgpool[5403]:
check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
2015-09-10T12:10:12.136452+02:00 pg1 pgpool[5359]: failover: set new
primary node: 0
2015-09-10T12:10:12.136470+02:00 pg1 pgpool[5359]: failover: set new
master node: 0
2015-09-10T12:10:12.136473+02:00 pg1 pgpool[5359]: failover done.
shutdown host pg2.lan(5432)
2015-09-10T12:10:12.136478+02:00 pg1 pgpool[5487]: worker process
received restart request
2015-09-10T12:10:12.136912+02:00 pg1 pgpool[5544]: do_child: failback
event found. restart myself.
2015-09-10T12:10:12.138083+02:00 pg1 pgpool[5546]: do_child: failback
event found. restart myself.
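To put numbers on the delay, here is a minimal Python sketch that measures the gaps between the "health check failed" lines of process 5359. The timestamps are copied from the log above; the variable names are only illustrative.

```python
from datetime import datetime

# Timestamps of the "health check failed" lines (pgpool[5359]) from the log.
failures = [
    "2015-09-10T12:09:45.042871+02:00",
    "2015-09-10T12:09:58.099903+02:00",
    "2015-09-10T12:10:11.115925+02:00",
]
# Timestamp of the "failover done." line.
failover_done = "2015-09-10T12:10:12.136473+02:00"

# fromisoformat accepts this syslog timestamp layout on Python 3.7+.
times = [datetime.fromisoformat(ts) for ts in failures]

# Seconds between consecutive failed health checks.
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Seconds from the first failure shown in the log to "failover done.".
total = (datetime.fromisoformat(failover_done) - times[0]).total_seconds()

print("gaps between failed checks:", gaps)
print("first logged failure to failover done:", total)
```

Each gap comes out at about 13 seconds: the 10-second retry sleep plus roughly 3 seconds for the connection attempt to end with "No route to host".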
The problem seems to come from 'No route to host'.
Below are my pgpool parameters:
name : health_check_period
value: 10
desc : health check period
name : health_check_timeout
value: 5
desc : health check timeout
name : health_check_user
value: pgpool_check_replication
desc : health check user
name : health_check_max_retries
value: 3
desc : health check max retries
name : health_check_retry_delay
value: 10
desc : health check retry delay
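As a rough estimate of how long these settings can delay detection of a dead backend, here is a minimal Python sketch. It is only a model under my own assumptions, not pgpool's actual scheduling: I assume one initial check plus health_check_max_retries retries, each attempt bounded by health_check_timeout, with health_check_retry_delay between attempts and up to one health_check_period before the first check. The function name is hypothetical.

```python
def worst_case_detection_seconds(period, timeout, max_retries, retry_delay):
    """Rough upper bound on the time before a backend is marked down.

    Assumed model (not taken from the pgpool source code):
    wait up to one period, then 1 + max_retries attempts of at most
    `timeout` seconds each, separated by `retry_delay` seconds.
    """
    attempts = max_retries + 1
    return period + attempts * timeout + max_retries * retry_delay


# Values from the parameter list above.
estimate = worst_case_detection_seconds(
    period=10, timeout=5, max_retries=3, retry_delay=10
)
print(estimate)
```

With the values above the model gives 60 seconds, which is the same order of magnitude as the roughly 50 seconds (about 12:09:22 to 12:10:11) during which my psql commands failed.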
Where am I going wrong?
Thanks
Thomas