[pgpool-general: 4055] Re: temporarily losing connexion when slave goes down in master-slave mode

Thomas SIMON tsimon at neteven.com
Thu Sep 10 21:59:32 JST 2015


I should add that this occurs only when load balancing is enabled.

Thomas

On 10/09/2015 12:29, Thomas SIMON wrote:
> Hi All,
>
> I want to configure pgpool in master-slave mode, with two pgpool nodes and 
> a virtual IP.
> Everything works fine (master-slave replication, authentication, failover 
> if the master goes down), except when I try to simulate the slave going down.
>
>
> When I shut down the second host (the slave postgresql/pgpool node), I can't 
> run any queries through the master pgpool until it has restarted its 
> children (see below). I don't understand why this happens, because 
> neither the master pgpool nor the master postgres is affected; queries 
> should still be sent to the master postgresql, which is fine.
>
> This only happens when I shut down the host or unplug the cable. If I stop 
> the slave pgpool and slave postgres cleanly, everything is OK (the slave 
> node is removed from the pool by the master pgpool, and there is no window 
> where I can't run queries).
>
>
> I run the following commands:
>
> Connecting to the failover IP:
>
> [12:08:52]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user 
> -c "show pool_nodes;"
>  node_id |   hostname   | port | status | lb_weight |  role
> ---------+--------------+------+--------+-----------+---------
>  0       | localhost    | 5432 | 2      | 0.090909  | primary
>  1       | pg2.tstvrack | 5432 | 2      | 0.909091  | standby
>
> *** Shutdown pg2.lan ***
>
> [12:09:22]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user 
> -c "show pool_nodes;"
> psql: SSL SYSCALL error: EOF detected
> [12:09:31]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user 
> -c "show pool_nodes;"
> psql: SSL SYSCALL error: EOF detected
> [12:09:48]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user 
> -c "show pool_nodes;"
> psql: SSL SYSCALL error: EOF detected
> [12:10:11]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user 
> -c "show pool_nodes;"
>  node_id |   hostname   | port | status | lb_weight |  role
> ---------+--------------+------+--------+-----------+---------
>  0       | localhost    | 5432 | 2      | 0.090909  | primary
>  1       | pg2.tstvrack | 5432 | 3      | 0.909091  | standby
> (2 rows)
>
>
> At the same time, I get the following logs on the pgpool master node:
>
> 2015-09-10T12:09:36.013230+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:39.013361+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:42.013493+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:42.149978+02:00 pg1 pgpool[5423]: connection 
> received: host=1.2.3.4 port=45070
> 2015-09-10T12:09:42.209415+02:00 pg1 pgpool[5423]: connection closed. 
> retry to create new connection pool.
> 2015-09-10T12:09:45.013617+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:45.042853+02:00 pg1 pgpool[5359]: 
> connect_inet_domain_socket: getsockopt() detected error: No route to host
> 2015-09-10T12:09:45.042868+02:00 pg1 pgpool[5359]: 
> make_persistent_db_connection: connection to pg2.lan(5432) failed
> 2015-09-10T12:09:45.042871+02:00 pg1 pgpool[5359]: health check 
> failed. 1 th host pg2.lan at port 5432 is down
> 2015-09-10T12:09:45.042874+02:00 pg1 pgpool[5359]: health check retry 
> sleep time: 10 second(s)
> 2015-09-10T12:09:48.013745+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:48.211788+02:00 pg1 pgpool[5423]: 
> connect_inet_domain_socket: getsockopt() detected error: No route to host
> 2015-09-10T12:09:48.211801+02:00 pg1 pgpool[5423]: connection to 
> pg2.lan(5432) failed
> 2015-09-10T12:09:48.211805+02:00 pg1 pgpool[5423]: new_connection: 
> create_cp() failed
> 2015-09-10T12:09:48.211808+02:00 pg1 pgpool[5423]: new_connection: do 
> not failover because fail_over_on_backend_error is off
> 2015-09-10T12:09:51.013868+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:54.013997+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:57.014126+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:09:58.099882+02:00 pg1 pgpool[5359]: 
> connect_inet_domain_socket: getsockopt() detected error: No route to host
> 2015-09-10T12:09:58.099899+02:00 pg1 pgpool[5359]: 
> make_persistent_db_connection: connection to pg2.lan(5432) failed
> 2015-09-10T12:09:58.099903+02:00 pg1 pgpool[5359]: health check 
> failed. 1 th host pg2.lan at port 5432 is down
> 2015-09-10T12:09:58.099906+02:00 pg1 pgpool[5359]: health check retry 
> sleep time: 10 second(s)
> 2015-09-10T12:10:00.014258+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:10:03.014381+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:10:06.014502+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:10:09.014617+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:10:11.115906+02:00 pg1 pgpool[5359]: 
> connect_inet_domain_socket: getsockopt() detected error: No route to host
> 2015-09-10T12:10:11.115921+02:00 pg1 pgpool[5359]: 
> make_persistent_db_connection: connection to pg2.lan(5432) failed
> 2015-09-10T12:10:11.115925+02:00 pg1 pgpool[5359]: health check 
> failed. 1 th host pg2.lan at port 5432 is down
> 2015-09-10T12:10:11.115928+02:00 pg1 pgpool[5359]: set 1 th backend 
> down status
> 2015-09-10T12:10:11.115931+02:00 pg1 pgpool[5359]: wd_start_interlock: 
> start interlocking
> 2015-09-10T12:10:11.115934+02:00 pg1 pgpool[5359]: 
> wd_assume_lock_holder: become a new lock holder
> 2015-09-10T12:10:11.374369+02:00 pg1 pgpool[5529]: connection 
> received: host=1.2.3.4 port=45077
> 2015-09-10T12:10:11.616811+02:00 pg1 pgpool[5359]: starting 
> degeneration. shutdown host pg2.lan(5432)
> 2015-09-10T12:10:11.616828+02:00 pg1 pgpool[5359]: Restart all children
> 2015-09-10T12:10:11.616831+02:00 pg1 pgpool[5359]: execute command: 
> /apps/scripts/postgres/failover.sh 1 0 localhost 
> /data/postgresql/9.3/main
> 2015-09-10T12:10:11.630510+02:00 pg1 pgpool: + FALLING_NODE=1
> 2015-09-10T12:10:11.630639+02:00 pg1 pgpool: + ACTUAL_PRIMARY_NODE=0
> 2015-09-10T12:10:11.630721+02:00 pg1 pgpool: + NEW_PRIMARY=localhost
> 2015-09-10T12:10:11.630864+02:00 pg1 pgpool: + 
> PGDATA=/data/postgresql/9.3/main
> 2015-09-10T12:10:11.630978+02:00 pg1 pgpool: + '[' 4 '!=' 4 ']'
> 2015-09-10T12:10:11.631131+02:00 pg1 pgpool: + '[' 1 = 0 ']'
> 2015-09-10T12:10:11.631209+02:00 pg1 pgpool: + exit 0
> 2015-09-10T12:10:11.631454+02:00 pg1 pgpool[5359]: wd_end_interlock: 
> end interlocking
> 2015-09-10T12:10:12.014741+02:00 pg1 pgpool[5403]: 
> check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
> 2015-09-10T12:10:12.136452+02:00 pg1 pgpool[5359]: failover: set new 
> primary node: 0
> 2015-09-10T12:10:12.136470+02:00 pg1 pgpool[5359]: failover: set new 
> master node: 0
> 2015-09-10T12:10:12.136473+02:00 pg1 pgpool[5359]: failover done. 
> shutdown host pg2.lan(5432)
> 2015-09-10T12:10:12.136478+02:00 pg1 pgpool[5487]: worker process 
> received restart request
> 2015-09-10T12:10:12.136912+02:00 pg1 pgpool[5544]: do_child: failback 
> event found. restart myself.
> 2015-09-10T12:10:12.138083+02:00 pg1 pgpool[5546]: do_child: failback 
> event found. restart myself.
>
>
>
>
> The problem seems to come from 'No route to host'.
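That timing difference can be reproduced outside pgpool. A clean service stop makes connect() fail immediately with "connection refused", while a powered-off or unplugged host leaves connect() hanging until the network layer gives up, which is exactly the window in which pgpool children block. A minimal sketch using bash's /dev/tcp (port 1 on 127.0.0.1 is assumed closed; 192.0.2.1 is the reserved TEST-NET address, chosen here because it is guaranteed unroutable):

```shell
# Closed port on a live host: fails instantly ("connection refused"),
# the behaviour seen when pgpool/postgres are stopped cleanly.
timeout 5 bash -c 'exec 3<>/dev/tcp/127.0.0.1/1' 2>/dev/null
echo "closed port: exit $? (immediate)"

# Unroutable host: typically hangs until the 5 s timeout (exit 124),
# the behaviour seen when the host is shut down or the cable unplugged.
timeout 5 bash -c 'exec 3<>/dev/tcp/192.0.2.1/5432' 2>/dev/null
echo "dead host: exit $? (only after the timeout)"
```

The first probe returns in milliseconds; the second burns the whole timeout, just as each health-check probe to pg2.lan does in the log above.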
>
>
> Below are my pgpool parameters:
>
> name : health_check_period
> value: 10
> desc : health check period
>
> name : health_check_timeout
> value: 5
> desc : health check timeout
>
> name : health_check_user
> value: pgpool_check_replication
> desc : health check user
>
> name : health_check_max_retries
> value: 3
> desc : health check max retries
>
> name : health_check_retry_delay
> value: 10
> desc : health check retry delay
>
>
>
> Where am I going wrong?
> Thanks
>
> Thomas
>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general


