View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000397 | Pgpool-II | Bug | public | 2018-05-11 18:41 | 2019-01-30 10:05 |
| Reporter | jjsantam | Assigned To | pengbo | ||
| Priority | normal | Severity | crash | Reproducibility | random |
| Status | assigned | Resolution | open | ||
| Platform | x86 | OS | CentOS Linux | OS Version | 7.3.1611 |
| Product Version | 3.7.3 | ||||
| Summary | 0000397: Watchdog lifecheck seldom failure | ||||
| Description | The watchdog life check is not stable: it works most of the time, but occasionally it fails, shutting down a pgpool node. The pgpool.conf: listen_addresses = '*' port = 9999 socket_dir = '/tmp' listen_backlog_multiplier = 2 serialize_accept = on pcp_listen_addresses = '*' pcp_port = 9898 pcp_socket_dir = '/tmp' backend_hostname0 = 'server1' backend_port0 = 5432 backend_weight0 = 1 backend_data_directory0 = '/home/postgres/data' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_hostname1 = 'server2' backend_port1 = 5432 backend_weight1 = 1 backend_data_directory1 = '/home/postgres/data' backend_flag1 = 'ALLOW_TO_FAILOVER' enable_pool_hba = on pool_passwd = 'pool_passwd' authentication_timeout = 60 ssl = off num_init_children = 100 max_pool = 2 child_life_time = 0 child_max_connections = 0 connection_life_time = 0 client_idle_limit = 0 log_destination = 'syslog' log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line. log_connections = off log_hostname = off log_statement = off log_per_node_statement = off log_standby_delay = 'none' syslog_facility = 'LOCAL1' syslog_ident = 'pgpool' pid_file_name = '/var/run/pgpool-II-10/pgpool.pid' logdir = '/var/log/pgpool-II' connection_cache = on reset_query_list = 'ABORT; DISCARD ALL' replication_mode = off replicate_select = off insert_lock = on lobj_lock_table = '' replication_stop_on_mismatch = off failover_if_affected_tuples_mismatch = off load_balance_mode = on ignore_leading_white_space = on white_function_list = '' black_function_list = 'nextval,setval,nextval,setval' database_redirect_preference_list = '' app_name_redirect_preference_list = '' allow_sql_comments = off master_slave_mode = on master_slave_sub_mode = 'stream' sr_check_period = 30 sr_check_user = 'replication' sr_check_password = 'password' sr_check_database = 'postgres' delay_threshold = 0 follow_master_command = '' health_check_period = 30 health_check_timeout = 60 health_check_user = 'replication' 
health_check_password = 'password' health_check_database = 'postgres' health_check_max_retries = 3 health_check_retry_delay = 20 connect_timeout = 10000 failover_command = '/home/postgres/data/failover.sh %d %P %H %R' failback_command = '' fail_over_on_backend_error = off search_primary_node_timeout = 0 recovery_user = 'nobody' recovery_password = '' recovery_1st_stage_command = '' recovery_2nd_stage_command = '' recovery_timeout = 90 client_idle_limit_in_recovery = 0 use_watchdog = on trusted_servers = '' ping_path = '/usr/bin' wd_hostname = 'server2' wd_port = 9000 wd_priority = 1 wd_authkey = '' wd_ipc_socket_dir = '/tmp' delegate_IP = '10.10.10.10' if_cmd_path = '/usr/sbin' if_up_cmd = 'ip addr add $_IP_$/26 dev eth0 label eth0:0' if_down_cmd = 'ip addr del $_IP_$/26 dev eth0' arping_path = '/usr/sbin' arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0' clear_memqcache_on_escalation = on wd_escalation_command = '' wd_de_escalation_command = '' failover_when_quorum_exists = off failover_require_consensus = off allow_multiple_failover_requests_from_node = off wd_monitoring_interfaces_list = '' # Comma separated list of interfaces names to monitor. 
wd_lifecheck_method = 'query' wd_interval = 15 wd_heartbeat_port = 9694 wd_heartbeat_keepalive = 5 wd_heartbeat_deadtime = 30 heartbeat_destination0 = 'host0_ip1' heartbeat_destination_port0 = 9694 heartbeat_device0 = '' wd_life_point = 8 wd_lifecheck_query = 'SELECT 1' wd_lifecheck_dbname = 'template1' wd_lifecheck_user = 'replication' wd_lifecheck_password = 'password' other_pgpool_hostname0 = 'server1' other_pgpool_port0 = 9999 other_wd_port0 = 9000 relcache_expire = 0 relcache_size = 256 check_temp_table = on check_unlogged_table = on memory_cache_enabled = off memqcache_method = 'shmem' memqcache_memcached_host = 'localhost' memqcache_memcached_port = 11211 memqcache_total_size = 67108864 memqcache_max_num_cache = 1000000 memqcache_expire = 0 memqcache_auto_cache_invalidation = on memqcache_maxcache = 409600 memqcache_cache_block_size = 1048576 memqcache_oiddir = '/var/log/pgpool/oiddir' white_memqcache_table_list = '' black_memqcache_table_list = '' | ||||
| Steps To Reproduce | I have not been able to reproduce it on demand. | ||||
| Additional Information | The intermittent errors look like this: May 11 09:26:59 server2 pgpool[9554]: [651-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query May 11 09:26:59 server2 pgpool[9554]: [652-1] 2018-05-11 09:26:59: pid 9554: DEBUG: watchdog life checking May 11 09:26:59 server2 pgpool[9554]: [651-2] 2018-05-11 09:26:59: pid 9554: DETAIL: checking pgpool 0 (server2:9999) May 11 09:26:59 server2 pgpool[9554]: [652-2] 2018-05-11 09:26:59: pid 9554: DETAIL: Connection to database failed: missing "=" after "h" in connection info string May 11 09:26:59 server2 pgpool[9554]: [652-3] May 11 09:26:59 server2 pgpool[9554]: [653-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query May 11 09:26:59 server2 pgpool[9554]: [653-2] 2018-05-11 09:26:59: pid 9554: DETAIL: NG; status: 0 life:3 May 11 09:26:59 server2 pgpool[9554]: [654-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query May 11 09:26:59 server2 pgpool[9554]: [654-2] 2018-05-11 09:26:59: pid 9554: DETAIL: checking pgpool 1 (server1:9999) May 11 09:26:59 server2 pgpool[9554]: [655-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query May 11 09:26:59 server2 pgpool[9554]: [655-2] 2018-05-11 09:26:59: pid 9554: DETAIL: WD_OK: status: 0 Or: May 11 09:43:35 server2 pgpool[9554]: [916-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query May 11 09:43:35 server2 pgpool[9554]: [916-2] 2018-05-11 09:43:35: pid 9554: DETAIL: checking pgpool 0 (server2:9999) May 11 09:43:35 server2 pgpool[9554]: [917-1] 2018-05-11 09:43:35: pid 9554: DEBUG: watchdog life checking May 11 09:43:35 server2 pgpool[9554]: [917-2] 2018-05-11 09:43:35: pid 9554: DETAIL: Connection to database failed: FATAL: role "root" does not exist May 11 09:43:35 server2 pgpool[9554]: [917-3] May 11 09:43:35 server2 pgpool[9554]: [918-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query May 11 09:43:35 server2 pgpool[9554]: [918-2] 2018-05-11 
09:43:35: pid 9554: DETAIL: NG; status: 0 life:3 May 11 09:43:35 server2 pgpool[9554]: [919-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query May 11 09:43:35 server2 pgpool[9554]: [919-2] 2018-05-11 09:43:35: pid 9554: DETAIL: checking pgpool 1 (server1:9999) May 11 09:43:35 server2 pgpool[9554]: [920-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query May 11 09:43:35 server2 pgpool[9554]: [920-2] 2018-05-11 09:43:35: pid 9554: DETAIL: WD_OK: status: 0 | ||||
| Tags | pgpool health check | ||||
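
Because the failure is intermittent, it may help to tally how often each lifecheck connection error appears in the syslog output. The following is a minimal Python sketch, not part of Pgpool-II: the names `lifecheck_failures` and `summarize`, and the regular expression, are assumptions derived only from the log format quoted in this report (with `log_line_prefix = '%t: pid %p: '`). The sample lines are copied from the excerpt above, one syslog entry per line.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this report:
#   ... pgpool[PID]: [SEQ-PART] YYYY-MM-DD HH:MM:SS: pid PID: DETAIL: Connection to database failed: REASON
FAILURE_RE = re.compile(
    r"\[(?P<seq>\d+)-(?P<part>\d+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (?P<pid>\d+): "
    r"DETAIL:\s+Connection to database failed: (?P<reason>.*)$"
)


def lifecheck_failures(lines):
    """Yield (timestamp, pid, reason) for each failed lifecheck connection."""
    for line in lines:
        match = FAILURE_RE.search(line.rstrip("\n"))
        if match:
            yield match.group("ts"), int(match.group("pid")), match.group("reason").strip()


def summarize(lines):
    """Count failures grouped by the reason text reported in the DETAIL line."""
    return Counter(reason for _, _, reason in lifecheck_failures(lines))


# Sample entries copied from the report, one per line.
SAMPLE = [
    'May 11 09:26:59 server2 pgpool[9554]: [652-2] 2018-05-11 09:26:59: pid 9554: '
    'DETAIL: Connection to database failed: missing "=" after "h" in connection info string',
    'May 11 09:26:59 server2 pgpool[9554]: [653-2] 2018-05-11 09:26:59: pid 9554: '
    'DETAIL: NG; status: 0 life:3',
    'May 11 09:43:35 server2 pgpool[9554]: [917-2] 2018-05-11 09:43:35: pid 9554: '
    'DETAIL: Connection to database failed: FATAL: role "root" does not exist',
    'May 11 09:43:35 server2 pgpool[9554]: [920-2] 2018-05-11 09:43:35: pid 9554: '
    'DETAIL: WD_OK: status: 0',
]

if __name__ == "__main__":
    for reason, count in summarize(SAMPLE).most_common():
        print(f"{count:4d}  {reason}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `summarize(open(path))`, where `path` is wherever syslog facility `LOCAL1` is written on the host (not stated in this report). Grouping by reason shows whether the malformed connection string and the `role "root"` error occur at similar rates.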