View Issue Details

ID: 0000397
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-01-30 10:05
Reporter: jjsantam
Assigned To: pengbo
Priority: normal
Severity: crash
Reproducibility: random
Status: assigned
Resolution: open
Platform: x86
OS: CentOS Linux
OS Version: 7.3.1611
Product Version: 3.7.3
Target Version:
Fixed in Version:
Summary: 0000397: Watchdog lifecheck seldom failure
Description: The watchdog life check is not stable. It works most of the time, but occasionally it fails, shutting down a pgpool.

The pgpool.conf:

listen_addresses = '*'
port = 9999
socket_dir = '/tmp'
listen_backlog_multiplier = 2
serialize_accept = on
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = 'server1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'server2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 100
max_pool = 2
child_life_time = 0
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
log_destination = 'syslog'
log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line.
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'none'
syslog_facility = 'LOCAL1'
syslog_ident = 'pgpool'
pid_file_name = '/var/run/pgpool-II-10/pgpool.pid'
logdir = '/var/log/pgpool-II'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = on
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval,nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 30
sr_check_user = 'replication'
sr_check_password = 'password'
sr_check_database = 'postgres'
delay_threshold = 0
follow_master_command = ''
health_check_period = 30
health_check_timeout = 60
health_check_user = 'replication'
health_check_password = 'password'
health_check_database = 'postgres'
health_check_max_retries = 3
health_check_retry_delay = 20
connect_timeout = 10000
failover_command = '/home/postgres/data/failover.sh %d %P %H %R'
failback_command = ''
fail_over_on_backend_error = off
search_primary_node_timeout = 0
recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = on
trusted_servers = ''
ping_path = '/usr/bin'
wd_hostname = 'server2'
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = '10.10.10.10'
if_cmd_path = '/usr/sbin'
if_up_cmd = 'ip addr add $_IP_$/26 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del $_IP_$/26 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''
failover_when_quorum_exists = off
failover_require_consensus = off
allow_multiple_failover_requests_from_node = off
wd_monitoring_interfaces_list = '' # Comma separated list of interfaces names to monitor.
wd_lifecheck_method = 'query'
wd_interval = 15
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 5
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
wd_life_point = 8
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'replication'
wd_lifecheck_password = 'password'
other_pgpool_hostname0 = 'server1'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
relcache_expire = 0
relcache_size = 256
check_temp_table = on
check_unlogged_table = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''
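
Since the report concerns the life check, it may help to pull just the life-check parameters out of the full configuration above. The following is a minimal sketch, not pgpool's own parser: the function names `parse_pgpool_conf` and `lifecheck_settings` are hypothetical, and the parsing is simplified (it assumes one `key = value` per line, single-quoted strings without embedded quotes, and `#` comments).

```python
import re

# Simplified matcher for "key = value" lines; an assumption, not pgpool's grammar.
SETTING_RE = re.compile(r"^(?P<key>[A-Za-z_][A-Za-z0-9_]*)\s*=\s*(?P<value>.*)$")


def parse_pgpool_conf(text):
    """Parse pgpool.conf-style text into a dict of string values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        match = SETTING_RE.match(line)
        if not match:
            continue
        value = match.group("value").strip()
        if value.startswith("'"):
            # Quoted value: keep everything up to the closing quote,
            # so a '#' inside the quotes is not treated as a comment.
            end = value.find("'", 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop any trailing comment.
            value = value.split("#", 1)[0].strip()
        settings[match.group("key")] = value
    return settings


def lifecheck_settings(settings):
    """Keep only wd_interval and the wd_life* parameters."""
    return {
        key: value
        for key, value in settings.items()
        if key == "wd_interval" or key.startswith("wd_life")
    }


sample = """
wd_lifecheck_method = 'query'
wd_interval = 15
wd_life_point = 8
log_line_prefix = '%t: pid %p: ' # printf-style string
port = 9999
"""
conf = parse_pgpool_conf(sample)
print(lifecheck_settings(conf))
```

Run against the whole file, this would list the `wd_lifecheck_*`, `wd_life_point`, and `wd_interval` lines together, which are the ones most relevant to the failures described in this report.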
Steps To Reproduce: I have not been able to reproduce it.
Additional Information: The intermittent errors look like this:

May 11 09:26:59 server2 pgpool[9554]: [651-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query
May 11 09:26:59 server2 pgpool[9554]: [652-1] 2018-05-11 09:26:59: pid 9554: DEBUG: watchdog life checking
May 11 09:26:59 server2 pgpool[9554]: [651-2] 2018-05-11 09:26:59: pid 9554: DETAIL: checking pgpool 0 (server2:9999)
May 11 09:26:59 server2 pgpool[9554]: [652-2] 2018-05-11 09:26:59: pid 9554: DETAIL: Connection to database failed: missing "=" after "h" in connection info string
May 11 09:26:59 server2 pgpool[9554]: [652-3]
May 11 09:26:59 server2 pgpool[9554]: [653-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query
May 11 09:26:59 server2 pgpool[9554]: [653-2] 2018-05-11 09:26:59: pid 9554: DETAIL: NG; status: 0 life:3
May 11 09:26:59 server2 pgpool[9554]: [654-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query
May 11 09:26:59 server2 pgpool[9554]: [654-2] 2018-05-11 09:26:59: pid 9554: DETAIL: checking pgpool 1 (server1:9999)
May 11 09:26:59 server2 pgpool[9554]: [655-1] 2018-05-11 09:26:59: pid 9554: DEBUG: checking pgpool status by query
May 11 09:26:59 server2 pgpool[9554]: [655-2] 2018-05-11 09:26:59: pid 9554: DETAIL: WD_OK: status: 0

Or

May 11 09:43:35 server2 pgpool[9554]: [916-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query
May 11 09:43:35 server2 pgpool[9554]: [916-2] 2018-05-11 09:43:35: pid 9554: DETAIL: checking pgpool 0 (server2:9999)
May 11 09:43:35 server2 pgpool[9554]: [917-1] 2018-05-11 09:43:35: pid 9554: DEBUG: watchdog life checking
May 11 09:43:35 server2 pgpool[9554]: [917-2] 2018-05-11 09:43:35: pid 9554: DETAIL: Connection to database failed: FATAL: role "root" does not exist
May 11 09:43:35 server2 pgpool[9554]: [917-3]
May 11 09:43:35 server2 pgpool[9554]: [918-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query
May 11 09:43:35 server2 pgpool[9554]: [918-2] 2018-05-11 09:43:35: pid 9554: DETAIL: NG; status: 0 life:3
May 11 09:43:35 server2 pgpool[9554]: [919-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query
May 11 09:43:35 server2 pgpool[9554]: [919-2] 2018-05-11 09:43:35: pid 9554: DETAIL: checking pgpool 1 (server1:9999)
May 11 09:43:35 server2 pgpool[9554]: [920-1] 2018-05-11 09:43:35: pid 9554: DEBUG: checking pgpool status by query
May 11 09:43:35 server2 pgpool[9554]: [920-2] 2018-05-11 09:43:35: pid 9554: DETAIL: WD_OK: status: 0
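
Both excerpts show a DETAIL line beginning with "Connection to database failed:", but with a different reason each time. Because the failures are intermittent, collecting every such line from syslog may show which reasons occur and how often. The following is a rough sketch under the assumption that all log lines follow the format shown above; the function name `find_lifecheck_failures` is hypothetical.

```python
import re

# Matches the DETAIL lines shown above, e.g.
# "... 2018-05-11 09:26:59: pid 9554: DETAIL: Connection to database failed: <reason>"
FAILURE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (?P<pid>\d+): "
    r"DETAIL:\s+Connection to database failed: (?P<reason>.*)$"
)


def find_lifecheck_failures(lines):
    """Return (timestamp, pid, reason) for each failed connection line."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line.rstrip("\n"))
        if match:
            failures.append(
                (match.group("ts"), int(match.group("pid")), match.group("reason"))
            )
    return failures


# Three lines copied from the excerpts above; the middle one is not a failure line.
sample = [
    'May 11 09:26:59 server2 pgpool[9554]: [652-2] 2018-05-11 09:26:59: pid 9554: '
    'DETAIL: Connection to database failed: missing "=" after "h" in connection info string',
    'May 11 09:26:59 server2 pgpool[9554]: [653-2] 2018-05-11 09:26:59: pid 9554: '
    'DETAIL: NG; status: 0 life:3',
    'May 11 09:43:35 server2 pgpool[9554]: [917-2] 2018-05-11 09:43:35: pid 9554: '
    'DETAIL: Connection to database failed: FATAL: role "root" does not exist',
]

for ts, pid, reason in find_lifecheck_failures(sample):
    print(ts, pid, reason)
```

To scan a real log, the same function could be given an open file object instead of the sample list, since it only iterates over lines.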
Tags: pgpool health check

Activities

There are no notes attached to this issue.

Issue History

Date Modified     Username  Field                              Change
2018-05-11 18:41  jjsantam  New Issue
2018-05-15 00:54  jjsantam  Tag Attached: pgpool health check
2019-01-30 10:05  pengbo    Assigned To                        => pengbo
2019-01-30 10:05  pengbo    Status                             new => assigned