[pgpool-general: 3498] Re: pgpool 3.3.1 - unexplained failover without health-check retries

Tatsuo Ishii ishii at postgresql.org
Fri Mar 6 11:16:39 JST 2015


pgpool-II 3.3.1 is pretty old. Please upgrade to the latest version
(3.3.5 at this moment). If the problem persists after upgrading,
please let us know.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hello Pgpool support,
> I am running Pgpool 3.3.1 in front of our production database, in master-slave mode with health checking enabled.
> Today, and not for the first time, Pgpool "decided" to fail over as soon as it lost connectivity to the master node, without going through the retry logic that should apply when health checking is enabled.
> Here is the Pgpool log from the time of the failure:
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: C message
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: B message
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 1 backend pid: 26169 statement: B message
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: Execute: ROLLBACK
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 1 backend pid: 26169 statement: Execute: ROLLBACK
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: Parse: SELECT 1
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: B message
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: D message
> Mar  4 16:35:06 pgpool pgpool[22795]: DB node id: 0 backend pid: 44867 statement: Execute: SELECT 1
> Mar  4 16:35:09 pgpool pgpool[2306]: connect_inet_domain_socket: gethostbyname() failed: Unknown host host: pnmpg3.sj.peer39.com
> Mar  4 16:35:09 pgpool pgpool[2306]: connection to pnmpg3.sj.peer39.com(5432) failed
> Mar  4 16:35:09 pgpool pgpool[2306]: new_connection: create_cp() failed
> Mar  4 16:35:09 pgpool pgpool[2306]: degenerate_backend_set: 0 fail over request from pid 2306
> Mar  4 16:35:09 pgpool pgpool[32118]: starting degeneration. shutdown host pnmpg3.sj.peer39.com(5432)
> Mar  4 16:35:09 pgpool pgpool[32118]: Restart all children
> Mar  4 16:35:09 pgpool pgpool[32118]: execute command: /etc/pgpool-II/failover.sh 0 "pnmpg3.sj.peer39.com" 5432 /var/lib/pgsql/9.2/data 1 0 "pnmpg4.sj.peer39.com" 0
> Mar  4 16:35:19 pgpool pgpool[32118]: find_primary_node_repeatedly: waiting for finding a primary node
> Mar  4 16:35:35 pgpool pgpool[32118]: failover: set new primary node: -1
> Mar  4 16:35:35 pgpool pgpool[32118]: failover: set new master node: 1
> Mar  4 16:35:35 pgpool pgpool[32233]: worker process received restart request
> Mar  4 16:35:35 pgpool pgpool[32118]: failover done. shutdown host pnmpg3.sj.peer39.com(5432)
> 
> Is this normal behavior, or is it a bug? If it is normal, is there a way to change it?
> 
> Pgpool enabled features:
> 
> [ mode ] Master Slave mode
> 
> [ healthcheck ] every 40 seconds / retry upto 3 counts
> 
> Partial pgpool.conf:
> #------------------------------------------------------------------------------
> # POOLS
> #------------------------------------------------------------------------------
> # - Pool size -
> num_init_children = 50
> max_pool = 1
> # - Life time -
> child_life_time = 50
> child_max_connections = 0
> connection_life_time = 1800
> client_idle_limit = 0
> #------------------------------------------------------------------------------
> # MASTER/SLAVE MODE
> #------------------------------------------------------------------------------
> master_slave_mode = on
> master_slave_sub_mode = 'stream'
> # - Streaming -
> sr_check_period = 30
> sr_check_user = 'username'
> sr_check_password = 'password'
> delay_threshold = 0
> # - Special commands -
> follow_master_command = ''
> #------------------------------------------------------------------------------
> # HEALTH CHECK
> #------------------------------------------------------------------------------
> health_check_period = 40
> health_check_timeout = 10
> health_check_user = 'username'
> health_check_password = 'password'
> health_check_max_retries = 3
> health_check_retry_delay = 20
> #------------------------------------------------------------------------------
> # FAILOVER AND FAILBACK
> #------------------------------------------------------------------------------
> failover_command = '/etc/pgpool-II/failover.sh %d "%h" %p %D %m %M "%H" %P'
> failback_command = ''
> fail_over_on_backend_error = on
> search_primary_node_timeout = 10
> #------------------------------------------------------------------------------
> # ONLINE RECOVERY
> #------------------------------------------------------------------------------
> recovery_user = ''
> recovery_password = ''
> recovery_1st_stage_command = 'basebackup.sh'
> recovery_2nd_stage_command = ''
> recovery_timeout = 90
> client_idle_limit_in_recovery = 0
> 
> Thanks
> 
> Boaz Goldstein
> DBA, Deployment DBA Team
> boaz.goldstein at sizmek.com
> M +972.524.695731
> T +972.9.778.2910
> Israel
> 
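In the quoted log, the "fail over request" line names pid 2306, which is not the pid that then carries out the failover (32118), and the lines pid 2306 logs just before it are a gethostbyname() failure followed by a failed connection attempt. To check whether other occurrences look the same, something along these lines could pull each failover request out of a syslog file together with what the requesting pid logged beforehand. This is only a rough sketch: the function and type names are made up for illustration, and the regular expressions assume the exact line layout shown in the excerpt above.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Assumed layout (taken from the excerpt above), optionally "> "-quoted:
#   Mar  4 16:35:09 pgpool pgpool[2306]: new_connection: create_cp() failed
LINE_RE = re.compile(
    r"^(?:>\s*)?(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ pgpool\[(\d+)\]: (.*)$"
)
REQUEST_RE = re.compile(
    r"degenerate_backend_set: (\d+) fail over request from pid (\d+)"
)


class FailoverRequest(NamedTuple):
    timestamp: str
    node_id: int
    requester_pid: int
    prior_messages: List[str]  # what the requesting pid logged before the request


def find_failover_requests(lines: Iterable[str]) -> List[FailoverRequest]:
    """Hypothetical helper: list failover requests with the requester's earlier messages."""
    history: Dict[int, List[str]] = {}  # pid -> messages seen so far
    found: List[FailoverRequest] = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a pgpool syslog line in the assumed layout
        timestamp, pid, message = match.group(1), int(match.group(2)), match.group(3)
        request = REQUEST_RE.search(message)
        if request:
            requester = int(request.group(2))
            found.append(
                FailoverRequest(
                    timestamp,
                    int(request.group(1)),
                    requester,
                    list(history.get(requester, [])),
                )
            )
        history.setdefault(pid, []).append(message)
    return found


if __name__ == "__main__":
    import sys

    # Usage (path is up to you): python find_failover.py /var/log/messages
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for req in find_failover_requests(log):
            print(f"{req.timestamp}: node {req.node_id} requested by pid {req.requester_pid}")
            for msg in req.prior_messages[-5:]:
                print(f"    {msg}")
```

If every request turns out to be preceded by a name-resolution or connection error from the same pid, that would be worth including in a follow-up report along with the pgpool version in use.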

