0000697: auto_failback never triggers - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000697	Pgpool-II	Bug	public	2021-03-16 06:04	2021-04-08 13:56

Reporter	Garreat	Assigned To	pengbo
Priority	normal	Severity	minor	Reproducibility	always
Status	closed	Resolution	open
Platform	Linux/Docker
Product Version	4.2.2

Summary	0000697: auto_failback never triggers
Description	I'm using pgpool 4.2.2 in a containerized environment. auto_failback feature never works despite setting up sr_check, healthcheck, pg_stat_statements on Postgres nodes, log_hostname etc. I have a standby backend defined: backend_hostname1 = 'mbase_database-replica-1' backend_port1 = '5432' backend_flag1 = 'ALLOW_TO_FAILOVER' I bring the Postgres service down, pgpool detects that and sets backend status 'Down' -- ok. Then I bring the service back up. Pgpool healthcheck detects that, healthchecks are now skipped, but backend status stays 'Down'. In pg_stat_replication view, there are no problems with the replication: # select * from pg_stat_replication; pid \| usesysid \| usename \| application_name \| client_addr \| client_hostname \| client_port \| backend_start \| backend_xmin \| state \| sent_lsn \| wr ite_lsn \| flush_lsn \| replay_lsn \| write_lag \| flush_lag \| replay_lag \| sync_priority \| sync_state \| reply_time ------+----------+---------+------------------+-------------+-------------------------------------------- ----------------+-------------+-------------------------------+--------------+-----------+-----------+--- --------+-----------+------------+-----------+-----------+------------+---------------+------------+----- -------------------------- 1033 \| 10 \| mbase \| walreceiver \| 10.0.2.242 \| mbase_database-replica-1.1.rlbz732fxblqjk17 lcptok7kb.net2 \| 33178 \| 2021-03-15 21:29:31.750421+01 \| \| streaming \| 0/48F2238 \| 0/ 48F2238 \| 0/48F2238 \| 0/48F2238 \| \| \| \| 0 \| async \| 2021 -03-15 21:34:01.083468+01 Does this feature only work when using static IP addresses as backend? Docker service gets re-created with different IP, however I define it using a hostname... Maybe the client_hostname in pg_stat_replication view has to exactly match the defined backend name? Docker's reverse DNS returns (hostname).(replica-number).(random-garbage). Docker or not, I believe pgpool should try compare both hostname and FQDN. I'm trying to find a solution for a few days already to no success. Would be a perfectly usable Postgres Swarm stack with this feature...
Tags	No tags attached.

Garreat 2021-03-16 06:04 reporter	pgpool.conf (23,062 bytes) #------------------------------------------------------------------------------ # MAIN OPTIONS #------------------------------------------------------------------------------ listen_addresses = '' port = '5432' backend_clustering_mode = 'streaming_replication' backend_hostname0 = 'mbase_database-primary' backend_port0 = '5432' backend_flag0 = 'ALWAYS_PRIMARY\|DISALLOW_TO_FAILOVER' backend_hostname1 = 'mbase_database-replica-1' backend_port1 = '5432' backend_flag1 = 'ALLOW_TO_FAILOVER' num_init_children = 300 # Number of concurrent sessions allowed # (change requires restart) reserved_connections = 0 # Number of reserved connections. # Pgpool-II does not accept connections if over # num_init_chidlren - reserved_connections connection_life_time = 0 # Connection to backend closes after being idle for this many seconds # 0 means no close client_idle_limit = 0 # Client is disconnected after being idle for that many seconds # (even inside an explicit transactions!) # 0 means no disconnection #------------------------------------------------------------------------------ # CONNECTION POOLING #------------------------------------------------------------------------------ connection_cache = off # Activate connection pools # (change requires restart) max_pool = 1 # Number of connection pool caches per connection # (change requires restart) child_life_time = 10 # Pool exits after being idle for this many seconds child_max_connections = 0 # Pool exits after receiving that many connections # 0 means no exit #------------------------------------------------------------------------------ # STREAMING REPLICATION #------------------------------------------------------------------------------ sr_check_period = 10 # Streaming replication check period # Disabled (0) by default sr_check_user = 'mbase' # Streaming replication check user # This is necessary even if you disable # streaming replication delay check with # sr_check_period = 0 sr_check_password = '' # Password for streaming replication check user. # Leaving it empty will make Pgpool-II to first look for the # Password in pool_passwd file before using the empty password sr_check_database = 'mbase' # Database name for streaming replication check delay_threshold = 0 # Threshold before not dispatching query to standby node # Unit is in bytes # Disabled (0) by default #------------------------------------------------------------------------------ # LOAD BALANCING MODE #------------------------------------------------------------------------------ load_balance_mode = on # Activate load balancing mode ignore_leading_white_space = off # Ignore leading white spaces of each query #database_redirect_preference_list = '' # comma separated list of pairs of database and node id. # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2' # valid for streaming replicaton mode only. #app_name_redirect_preference_list = '' # comma separated list of pairs of app name and node id. # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby' # valid for streaming replicaton mode only. #allow_sql_comments = off # if on, ignore SQL comments when judging if load balance or # query cache is possible. # If off, SQL comments effectively prevent the judgment # (pre 3.4 behavior). disable_load_balance_on_write = 'trans_transaction' # Load balance behavior when write query is issued # in an explicit transaction. # Note that any query not in an explicit transaction # is not affected by the parameter. # 'transaction' (the default): if a write query is issued, # subsequent read queries will not be load balanced # until the transaction ends. # 'trans_transaction': if a write query is issued, # subsequent read queries in an explicit transaction # will not be load balanced until the session ends. # 'always': if a write query is issued, read queries will # not be load balanced until the session ends. #statement_level_load_balance = off # Enables statement level load balancing read_only_function_list = '' # Comma separated list of function names # that don't write to database # Regexp are accepted write_function_list = '' # Comma separated list of function names # that write to database # Regexp are accepted # If both read_only_function_list and write_function_list # is empty, function's volatile property is checked. # If it's volatile, the function is regarded as a # writing function. #------------------------------------------------------------------------------ # LOGGING #------------------------------------------------------------------------------ log_destination = 'stderr' # Where to log # Valid values are combinations of stderr, # and syslog. Default to stderr. log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line. log_connections = off # Log connections log_hostname = off # Hostname will be shown in ps status # and in logs if connections are logged log_statement = off # Log all statements log_per_node_statement = off # Log all statements # with node and backend informations log_client_messages = off # Log any client messages log_standby_delay = 'always' # Log standby delay # Valid values are combinations of always, # if_over_threshold, none log_error_verbosity = verbose # terse, default, or verbose messages #client_min_messages = notice # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # log # notice # warning # error log_min_messages = notice # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # info # notice # warning # error # log # fatal # panic #------------------------------------------------------------------------------ # - pgpool Communication Manager Connection Settings - #------------------------------------------------------------------------------ #pcp_listen_addresses = '' # Host name or IP address for pcp process to listen on: # '*' for all, '' for no TCP/IP connections # (change requires restart) pcp_port = 9898 # Port number for pcp # (change requires restart) pcp_socket_dir = '/tmp' # Unix domain socket path for pcp # The Debian package defaults to # /var/run/postgresql # (change requires restart) #------------------------------------------------------------------------------ # - Authentication - #------------------------------------------------------------------------------ enable_pool_hba = on # Use pool_hba.conf for client authentication pool_passwd = '/etc/pgpool-II/pool_passwd' # File name of pool_passwd for md5 authentication. # "" disables pool_passwd. # (change requires restart) #authentication_timeout = 60 # Delay in seconds to complete client authentication # 0 means no timeout. #allow_clear_text_frontend_auth = off # Allow Pgpool-II to use clear text password authentication # with clients, when pool_passwd does not # contain the user password #------------------------------------------------------------------------------ # IN MEMORY QUERY MEMORY CACHE #------------------------------------------------------------------------------ #memqcache_memcached_host = 'localhost' # Memcached host name or IP address. Mandatory if # memqcache_method = 'memcached'. # Defaults to localhost. # (change requires restart) #memqcache_memcached_port = 11211 # Memcached port number. Mondatory if memqcache_method = 'memcached'. # Defaults to 11211. # (change requires restart) #memqcache_total_size = 67108864 # Total memory size in bytes for storing memory cache. # Mandatory if memqcache_method = 'shmem'. # Defaults to 64MB. # (change requires restart) memory_cache_enabled = off # If on, use the memory cache functionality, off by default # (change requires restart) memqcache_method = 'shmem' # Cache storage method. either 'shmem'(shared memory) or # 'memcached'. 'shmem' by default # (change requires restart) memqcache_max_num_cache = 1000000 # Total number of cache entries. Mandatory # if memqcache_method = 'shmem'. # Each cache entry consumes 48 bytes on shared memory. # Defaults to 1,000,000(45.8MB). # (change requires restart) memqcache_expire = 0 # Memory cache entry life time specified in seconds. # 0 means infinite life time. 0 by default. # (change requires restart) memqcache_auto_cache_invalidation = on # If on, invalidation of query cache is triggered by corresponding # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered # by memqcache_expire. on by default. # (change requires restart) memqcache_maxcache = 409600 # Maximum SELECT result size in bytes. # Must be smaller than memqcache_cache_block_size. Defaults to 400KB. # (change requires restart) memqcache_cache_block_size = 1048576 # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'. # Defaults to 1MB. # (change requires restart) memqcache_oiddir = '/var/log/pgpool/oiddir' # Temporary work directory to record table oids # (change requires restart) #white_memqcache_table_list = '' # Comma separated list of table names to memcache # that don't write to database # Regexp are accepted #black_memqcache_table_list = '' # Comma separated list of table names not to memcache # that don't write to database # Regexp are accepted #------------------------------------------------------------------------------ # HEALTH CHECK GLOBAL PARAMETERS #------------------------------------------------------------------------------ health_check_period = 5 # Health check period # Disabled (0) by default health_check_timeout = 10 # Health check timeout # 0 means no timeout health_check_user = 'mbase' # Health check user health_check_password = '' # Password for health check user # Leaving it empty will make Pgpool-II to first look for the # Password in pool_passwd file before using the empty password health_check_database = 'mbase' # Database name for health check. If '', tries 'postgres' frist, then 'template1' health_check_max_retries = 259200 # Maximum number of times to retry a failed health check before giving up. health_check_retry_delay = 10 # Amount of time to wait (in seconds) between retries. connect_timeout = 5000 # Timeout value in milliseconds before giving up to connect to backend. # Default is 10000 ms (10 second). Flaky network user may want to increase # the value. 0 means no timeout. # Note that this value is not only used for health check, # but also for ordinary conection to backend. #------------------------------------------------------------------------------ # HEALTH CHECK PER NODE PARAMETERS (OPTIONAL) #------------------------------------------------------------------------------ #health_check_period0 = 0 #health_check_timeout0 = 20 #health_check_user0 = 'nobody' #health_check_password0 = '' #health_check_database0 = '' #health_check_max_retries0 = 0 #health_check_retry_delay0 = 1 #connect_timeout0 = 10000 #------------------------------------------------------------------------------ # FAILOVER AND FAILBACK #------------------------------------------------------------------------------ failover_command = '' # Executes this command at failover # Special values: # %d = failed node id # %h = failed node host name # %p = failed node port number # %D = failed node database cluster path # %m = new master node id # %H = new master node hostname # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %N = old primary node hostname # %S = old primary node port number # %% = '%' character failback_command = '' # Executes this command at failback. # Special values: # %d = failed node id # %h = failed node host name # %p = failed node port number # %D = failed node database cluster path # %m = new master node id # %H = new master node hostname # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %N = old primary node hostname # %S = old primary node port number # %% = '%' character failover_on_backend_error = on # Initiates failover when reading/writing to the # backend communication socket fails # If set to off, pgpool will report an # error and disconnect the session. detach_false_primary = off # Detach false primary if on. Only # valid in streaming replicaton # mode and with PostgreSQL 9.6 or # after. search_primary_node_timeout = 20 # Timeout in seconds to search for the # primary node when a failover occurs. # 0 means no timeout, keep searching # for a primary node forever. auto_failback = on # Dettached backend node reattach automatically # if replication_state is 'streaming'. auto_failback_interval = 5 # Min interval of executing auto_failback in # seconds. #------------------------------------------------------------------------------ # OTHERS #------------------------------------------------------------------------------ relcache_expire = 0 # Life time of relation cache in seconds. # 0 means no cache expiration(the default). # The relation cache is used for cache the # query result against PostgreSQL system # catalog to obtain various information # including table structures or if it's a # temporary table or not. The cache is # maintained in a pgpool child local memory # and being kept as long as it survives. # If someone modify the table by using # ALTER TABLE or some such, the relcache is # not consistent anymore. # For this purpose, cache_expiration # controls the life time of the cache. #relcache_size = 256 # Number of relation cache # entry. If you see frequently: # "pool_search_relcache: cache replacement happend" # in the pgpool log, you might want to increate this number. #check_temp_table = catalog # Temporary table check method. catalog, trace or none. # Default is catalog. #check_unlogged_table = on # If on, enable unlogged table check in SELECT statements. # This initiates queries against system catalog of primary/master # thus increases load of master. # If you are absolutely sure that your system never uses unlogged tables # and you want to save access to primary/master, you could turn this off. # Default is on. #enable_shared_relcache = on # If on, relation cache stored in memory cache, # the cache is shared among child process. # Default is on. # (change requires restart) pgpool.conf (23,062 bytes)

Garreat 2021-03-16 06:08 reporter ~0003768	Postgres 12.3

pengbo 2021-03-17 18:08 developer ~0003772 Last edited: 2021-03-17 18:09	> Does this feature only work when using static IP addresses as backend? Docker service gets re-created with different IP, however I define it using a hostname... > Maybe the client_hostname in pg_stat_replication view has to exactly match the defined backend name? Docker's reverse DNS returns (hostname).(replica-number).(random-garbage). Docker or not, I believe pgpool should try compare both hostname and FQDN. To use this auto_failback feature, you need to specify "backend_application_name1" which is specified in "primary_conninfo". Could you try again by specifying "backend_application_name"? https://www.pgpool.net/docs/latest/en/html/runtime-config-backend-settings.html#GUC-BACKEND-APPLICATION-NAME Sorry that document doesn't mention that. I will add it to documentation.

Garreat 2021-04-08 01:38 reporter ~0003797	I can confirm that now it works as intended. Thank you! Sorry for the delay -- I got covid.

pengbo 2021-04-08 13:55 developer ~0003798	OK. I'm going to close this issue. Take care!

Date Modified	Username	Field	Change
2021-03-16 06:04	Garreat	New Issue
2021-03-16 06:04	Garreat	File Added: pgpool.conf
2021-03-16 06:08	Garreat	Note Added: 0003768
2021-03-17 18:00	pengbo	Assigned To	=> pengbo
2021-03-17 18:00	pengbo	Status	new => assigned
2021-03-17 18:08	pengbo	Note Added: 0003772
2021-03-17 18:09	pengbo	Note Edited: 0003772
2021-03-17 18:09	pengbo	Status	assigned => feedback
2021-04-08 01:38	Garreat	Note Added: 0003797
2021-04-08 01:38	Garreat	Status	feedback => assigned
2021-04-08 13:55	pengbo	Note Added: 0003798
2021-04-08 13:56	pengbo	Status	assigned => closed