[pgpool-general: 4056] Re: temporarily losing connexion when slave goes down in master-slave mode

Thomas SIMON tsimon at neteven.com
Thu Sep 10 23:54:05 JST 2015


I see the same behavior just by stopping PostgreSQL on the slave node:

2015-09-10T16:52:06.372504+02:00 pg1 pgpool[3809]: connect_inet_domain_socket: getsockopt() detected error: Connection refused
2015-09-10T16:52:06.372521+02:00 pg1 pgpool[3809]: make_persistent_db_connection: connection to pg2.lan(5432) failed
2015-09-10T16:52:06.372524+02:00 pg1 pgpool[3809]: health check failed. 1 th host pg2.lan at port 5432 is down
2015-09-10T16:52:06.372527+02:00 pg1 pgpool[3809]: health check retry sleep time: 10 second(s)
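
As a rough sanity check on the timing, the health check settings quoted further down (health_check_max_retries = 3, health_check_retry_delay = 10) can be turned into an estimate of how long pgpool keeps retrying before it marks the node down. The Python sketch below rests on assumptions rather than on pgpool's source code: one initial attempt plus max_retries retries, one retry_delay sleep before each retry, and a fixed time for each attempt to fail. The function name detection_window and the attempt_time parameter are hypothetical, chosen only for illustration.

```python
def detection_window(max_retries, retry_delay, attempt_time):
    """Estimate the seconds between the start of the first failing health
    check and the moment the node is declared down.

    Assumptions (not taken from pgpool's code):
      - there is one initial attempt plus `max_retries` retries;
      - pgpool sleeps `retry_delay` seconds before each retry;
      - every attempt takes `attempt_time` seconds to fail.
    """
    attempts = 1 + max_retries
    return attempts * attempt_time + max_retries * retry_delay


if __name__ == "__main__":
    # Values from the quoted configuration: 3 retries, 10 s between retries.
    # attempt_time is a guess: ~3 s for "No route to host", ~0 s for
    # "Connection refused".
    for label, attempt_time in (("no route to host", 3), ("connection refused", 0)):
        print(label, detection_window(3, 10, attempt_time), "seconds")
```

With about 3 seconds per attempt (the "No route to host" health check failures in the quoted log are roughly 13 seconds apart, i.e. 10 seconds of sleep plus about 3 seconds per attempt) this gives about 42 seconds. With an immediate "Connection refused" it gives about 30 seconds.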

Thomas

On 10/09/2015 14:59, Thomas SIMON wrote:
> I should add that this occurs only when load balancing is enabled.
>
> Thomas
>
> On 10/09/2015 12:29, Thomas SIMON wrote:
>> Hi All,
>>
>> I want to configure pgpool in master-slave mode with two pgpool
>> instances and a virtual IP.
>> Everything works fine (master-slave, authentication, failover if the
>> master goes down), except when I simulate the slave going down.
>>
>>
>> When I shut down the second host (the slave PostgreSQL/pgpool node), I
>> can't run any queries through the master pgpool until it has restarted
>> its child processes (see below). I don't understand why this happens,
>> because neither the master pgpool nor the master PostgreSQL is affected;
>> queries should still be sent to the master PostgreSQL, which is fine.
>>
>> This only happens when I shut down the host or unplug its cable. If I
>> stop the slave pgpool and the slave PostgreSQL instead, everything is OK
>> (the master pgpool removes the slave node from the pool, and there is no
>> period during which I can't run queries).
>>
>>
>> I run these commands:
>>
>> Connections to the failover IP:
>>
>> [12:08:52]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c "show pool_nodes;"
>>  node_id |   hostname   | port | status | lb_weight |  role
>> ---------+--------------+------+--------+-----------+---------
>>  0       | localhost    | 5432 | 2      | 0.090909  | primary
>>  1       | pg2.tstvrack | 5432 | 2      | 0.909091  | standby
>>
>> *** Shutdown pg2.lan ***
>>
>> [12:09:22]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c "show pool_nodes;"
>> psql: SSL SYSCALL error: EOF detected
>> [12:09:31]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c "show pool_nodes;"
>> psql: SSL SYSCALL error: EOF detected
>> [12:09:48]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c "show pool_nodes;"
>> psql: SSL SYSCALL error: EOF detected
>> [12:10:11]toms@tomdesk:~$ psql -p 9999 -h ipv.lan intranet -U my_user -c "show pool_nodes;"
>>  node_id |   hostname   | port | status | lb_weight |  role
>> ---------+--------------+------+--------+-----------+---------
>>  0       | localhost    | 5432 | 2      | 0.090909  | primary
>>  1       | pg2.tstvrack | 5432 | 3      | 0.909091  | standby
>> (2 rows)
>>
>>
>> At the same time, I see the following logs on the master pgpool node:
>>
>> 2015-09-10T12:09:36.013230+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:39.013361+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:42.013493+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:42.149978+02:00 pg1 pgpool[5423]: connection received: host=1.2.3.4 port=45070
>> 2015-09-10T12:09:42.209415+02:00 pg1 pgpool[5423]: connection closed. retry to create new connection pool.
>> 2015-09-10T12:09:45.013617+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:45.042853+02:00 pg1 pgpool[5359]: connect_inet_domain_socket: getsockopt() detected error: No route to host
>> 2015-09-10T12:09:45.042868+02:00 pg1 pgpool[5359]: make_persistent_db_connection: connection to pg2.lan(5432) failed
>> 2015-09-10T12:09:45.042871+02:00 pg1 pgpool[5359]: health check failed. 1 th host pg2.lan at port 5432 is down
>> 2015-09-10T12:09:45.042874+02:00 pg1 pgpool[5359]: health check retry sleep time: 10 second(s)
>> 2015-09-10T12:09:48.013745+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:48.211788+02:00 pg1 pgpool[5423]: connect_inet_domain_socket: getsockopt() detected error: No route to host
>> 2015-09-10T12:09:48.211801+02:00 pg1 pgpool[5423]: connection to pg2.lan(5432) failed
>> 2015-09-10T12:09:48.211805+02:00 pg1 pgpool[5423]: new_connection: create_cp() failed
>> 2015-09-10T12:09:48.211808+02:00 pg1 pgpool[5423]: new_connection: do not failover because fail_over_on_backend_error is off
>> 2015-09-10T12:09:51.013868+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:54.013997+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:57.014126+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:09:58.099882+02:00 pg1 pgpool[5359]: connect_inet_domain_socket: getsockopt() detected error: No route to host
>> 2015-09-10T12:09:58.099899+02:00 pg1 pgpool[5359]: make_persistent_db_connection: connection to pg2.lan(5432) failed
>> 2015-09-10T12:09:58.099903+02:00 pg1 pgpool[5359]: health check failed. 1 th host pg2.lan at port 5432 is down
>> 2015-09-10T12:09:58.099906+02:00 pg1 pgpool[5359]: health check retry sleep time: 10 second(s)
>> 2015-09-10T12:10:00.014258+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:10:03.014381+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:10:06.014502+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:10:09.014617+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:10:11.115906+02:00 pg1 pgpool[5359]: connect_inet_domain_socket: getsockopt() detected error: No route to host
>> 2015-09-10T12:10:11.115921+02:00 pg1 pgpool[5359]: make_persistent_db_connection: connection to pg2.lan(5432) failed
>> 2015-09-10T12:10:11.115925+02:00 pg1 pgpool[5359]: health check failed. 1 th host pg2.lan at port 5432 is down
>> 2015-09-10T12:10:11.115928+02:00 pg1 pgpool[5359]: set 1 th backend down status
>> 2015-09-10T12:10:11.115931+02:00 pg1 pgpool[5359]: wd_start_interlock: start interlocking
>> 2015-09-10T12:10:11.115934+02:00 pg1 pgpool[5359]: wd_assume_lock_holder: become a new lock holder
>> 2015-09-10T12:10:11.374369+02:00 pg1 pgpool[5529]: connection received: host=1.2.3.4 port=45077
>> 2015-09-10T12:10:11.616811+02:00 pg1 pgpool[5359]: starting degeneration. shutdown host pg2.lan(5432)
>> 2015-09-10T12:10:11.616828+02:00 pg1 pgpool[5359]: Restart all children
>> 2015-09-10T12:10:11.616831+02:00 pg1 pgpool[5359]: execute command: /apps/scripts/postgres/failover.sh 1 0 localhost /data/postgresql/9.3/main
>> 2015-09-10T12:10:11.630510+02:00 pg1 pgpool: + FALLING_NODE=1
>> 2015-09-10T12:10:11.630639+02:00 pg1 pgpool: + ACTUAL_PRIMARY_NODE=0
>> 2015-09-10T12:10:11.630721+02:00 pg1 pgpool: + NEW_PRIMARY=localhost
>> 2015-09-10T12:10:11.630864+02:00 pg1 pgpool: + PGDATA=/data/postgresql/9.3/main
>> 2015-09-10T12:10:11.630978+02:00 pg1 pgpool: + '[' 4 '!=' 4 ']'
>> 2015-09-10T12:10:11.631131+02:00 pg1 pgpool: + '[' 1 = 0 ']'
>> 2015-09-10T12:10:11.631209+02:00 pg1 pgpool: + exit 0
>> 2015-09-10T12:10:11.631454+02:00 pg1 pgpool[5359]: wd_end_interlock: end interlocking
>> 2015-09-10T12:10:12.014741+02:00 pg1 pgpool[5403]: check_pgpool_status_by_hb: pgpool 1 (pg2.lan:9999) is in down status
>> 2015-09-10T12:10:12.136452+02:00 pg1 pgpool[5359]: failover: set new primary node: 0
>> 2015-09-10T12:10:12.136470+02:00 pg1 pgpool[5359]: failover: set new master node: 0
>> 2015-09-10T12:10:12.136473+02:00 pg1 pgpool[5359]: failover done. shutdown host pg2.lan(5432)
>> 2015-09-10T12:10:12.136478+02:00 pg1 pgpool[5487]: worker process received restart request
>> 2015-09-10T12:10:12.136912+02:00 pg1 pgpool[5544]: do_child: failback event found. restart myself.
>> 2015-09-10T12:10:12.138083+02:00 pg1 pgpool[5546]: do_child: failback event found. restart myself.
>>
>>
>>
>>
>> The problem seems to come from the 'No route to host' errors.
>>
>>
>> My pgpool parameters are below:
>>
>> name : health_check_period
>> value: 10
>> desc : health check period
>>
>> name : health_check_timeout
>> value: 5
>> desc : health check timeout
>>
>> name : health_check_user
>> value: pgpool_check_replication
>> desc : health check user
>>
>> name : health_check_max_retries
>> value: 3
>> desc : health check max retries
>>
>> name : health_check_retry_delay
>> value: 10
>> desc : health check retry delay
>>
>>
>>
>> What am I doing wrong?
>> Thanks
>>
>> Thomas
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
