View Issue Details

IDProjectCategoryView StatusLast Update
0000697Pgpool-IIBugpublic2021-04-08 13:56
ReporterGarreat Assigned Topengbo  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionopen 
PlatformLinux/Docker 
Product Version4.2.2 
Summary0000697: auto_failback never triggers
DescriptionI'm using pgpool 4.2.2 in a containerized environment.
auto_failback feature never works despite setting up sr_check, healthcheck, pg_stat_statements on Postgres nodes, log_hostname etc.

I have a standby backend defined:

backend_hostname1 = 'mbase_database-replica-1'
backend_port1 = '5432'
backend_flag1 = 'ALLOW_TO_FAILOVER'

I bring the Postgres service down, pgpool detects that and sets backend status 'Down' -- ok.
Then I bring the service back up. Pgpool healthcheck detects that, healthchecks are now skipped, but backend status stays 'Down'.

In pg_stat_replication view, there are no problems with the replication:

# select * from pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname
                | client_port | backend_start | backend_xmin | state | sent_lsn | wr
ite_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |
     reply_time
------+----------+---------+------------------+-------------+--------------------------------------------
----------------+-------------+-------------------------------+--------------+-----------+-----------+---
--------+-----------+------------+-----------+-----------+------------+---------------+------------+-----
--------------------------
 1033 | 10 | mbase | walreceiver | 10.0.2.242 | mbase_database-replica-1.1.rlbz732fxblqjk17
lcptok7kb.net2 | 33178 | 2021-03-15 21:29:31.750421+01 | | streaming | 0/48F2238 | 0/
48F2238 | 0/48F2238 | 0/48F2238 | | | | 0 | async | 2021
-03-15 21:34:01.083468+01


Does this feature only work when using static IP addresses as backend? Docker service gets re-created with different IP, however I define it using a hostname...
Maybe the client_hostname in pg_stat_replication view has to exactly match the defined backend name? Docker's reverse DNS returns (hostname).(replica-number).(random-garbage). Docker or not, I believe pgpool should try compare both hostname and FQDN.

I'm trying to find a solution for a few days already to no success.
Would be a perfectly usable Postgres Swarm stack with this feature...
TagsNo tags attached.

Activities

Garreat

2021-03-16 06:04

reporter  

pgpool.conf (23,062 bytes)   
#------------------------------------------------------------------------------
# MAIN OPTIONS
#------------------------------------------------------------------------------
listen_addresses = '*'
port = '5432'

backend_clustering_mode = 'streaming_replication'

backend_hostname0 = 'mbase_database-primary'
backend_port0 = '5432'
backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'

backend_hostname1 = 'mbase_database-replica-1'
backend_port1 = '5432'
backend_flag1 = 'ALLOW_TO_FAILOVER'

num_init_children = 300
                                   # Number of concurrent sessions allowed
                                   # (change requires restart)
reserved_connections = 0
                                 # Number of reserved connections.
                                 # Pgpool-II does not accept connections if over
                                 # num_init_chidlren - reserved_connections
connection_life_time = 0
                                # Connection to backend closes after being idle for this many seconds
                                # 0 means no close
client_idle_limit = 0
                                # Client is disconnected after being idle for that many seconds
                                # (even inside an explicit transactions!)
                                # 0 means no disconnection

#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------
connection_cache = off
                                   # Activate connection pools
                                   # (change requires restart)
max_pool = 1
                                  # Number of connection pool caches per connection
                                  # (change requires restart)
child_life_time = 10
                                 # Pool exits after being idle for this many seconds
child_max_connections = 0
                                 # Pool exits after receiving that many connections
                                 # 0 means no exit

#------------------------------------------------------------------------------
# STREAMING REPLICATION
#------------------------------------------------------------------------------
sr_check_period = 10
                                  # Streaming replication check period
                                  # Disabled (0) by default
sr_check_user = 'mbase'
                                  # Streaming replication check user
                                  # This is necessary even if you disable
                                  # streaming replication delay check with
                                  # sr_check_period = 0
sr_check_password = ''
                                  # Password for streaming replication check user.
                                  # Leaving it empty will make Pgpool-II to first look for the
                                  # Password in pool_passwd file before using the empty password
sr_check_database = 'mbase'
                                  # Database name for streaming replication check
delay_threshold = 0
                                 # Threshold before not dispatching query to standby node
                                 # Unit is in bytes
                                 # Disabled (0) by default

#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------
load_balance_mode = on
                                   # Activate load balancing mode
ignore_leading_white_space = off
                                 # Ignore leading white spaces of each query
#database_redirect_preference_list = ''
                                 # comma separated list of pairs of database and node id.
                                 # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
                                 # valid for streaming replicaton mode only.
#app_name_redirect_preference_list = ''
                                 # comma separated list of pairs of app name and node id.
                                 # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
                                 # valid for streaming replicaton mode only.
#allow_sql_comments = off
                                 # if on, ignore SQL comments when judging if load balance or
                                 # query cache is possible.
                                 # If off, SQL comments effectively prevent the judgment
                                 # (pre 3.4 behavior).
disable_load_balance_on_write = 'trans_transaction'
                                 # Load balance behavior when write query is issued
                                 # in an explicit transaction.
                                 # Note that any query not in an explicit transaction
                                 # is not affected by the parameter.
                                 # 'transaction' (the default): if a write query is issued,
                                 # subsequent read queries will not be load balanced
                                 # until the transaction ends.
                                 # 'trans_transaction': if a write query is issued,
                                 # subsequent read queries in an explicit transaction
                                 # will not be load balanced until the session ends.
                                 # 'always': if a write query is issued, read queries will
                                 # not be load balanced until the session ends.
#statement_level_load_balance = off
                                 # Enables statement level load balancing
read_only_function_list = ''
                                     # Comma separated list of function names
                                     # that don't write to database
                                     # Regexp are accepted
write_function_list = ''
                                     # Comma separated list of function names
                                     # that write to database
                                     # Regexp are accepted
                                     # If both read_only_function_list and write_function_list
                                     # is empty, function's volatile property is checked.
                                     # If it's volatile, the function is regarded as a
                                     # writing function.

#------------------------------------------------------------------------------
# LOGGING
#------------------------------------------------------------------------------
log_destination = 'stderr'
                                # Where to log
                                # Valid values are combinations of stderr,
                                # and syslog. Default to stderr.
log_line_prefix = '%t: pid %p: '
                                # printf-style string to output at beginning of each log line.
log_connections = off
                                   # Log connections
log_hostname = off
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
log_statement = off
                                   # Log all statements
log_per_node_statement = off
                                   # Log all statements
                                   # with node and backend informations
log_client_messages = off
                                   # Log any client messages
log_standby_delay = 'always'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none
log_error_verbosity = verbose
                                  # terse, default, or verbose messages
#client_min_messages = notice
                                      # values in order of decreasing detail:
                                       #   debug5
                                       #   debug4
                                       #   debug3
                                       #   debug2
                                       #   debug1
                                       #   log
                                       #   notice
                                       #   warning
                                       #   error
log_min_messages = notice             # values in order of decreasing detail:
                                       #   debug5
                                       #   debug4
                                       #   debug3
                                       #   debug2
                                       #   debug1
                                       #   info
                                       #   notice
                                       #   warning
                                       #   error
                                       #   log
                                       #   fatal
                                       #   panic

#------------------------------------------------------------------------------
# - pgpool Communication Manager Connection Settings -
#------------------------------------------------------------------------------
#pcp_listen_addresses = '*'
                                   # Host name or IP address for pcp process to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/tmp'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)

#------------------------------------------------------------------------------
# - Authentication -
#------------------------------------------------------------------------------
enable_pool_hba = on
                                   # Use pool_hba.conf for client authentication
pool_passwd = '/etc/pgpool-II/pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
#authentication_timeout = 60
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

#allow_clear_text_frontend_auth = off
                                   # Allow Pgpool-II to use clear text password authentication
                                   # with clients, when pool_passwd does not
                                   # contain the user password

#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
#memqcache_memcached_host = 'localhost'
                                  # Memcached host name or IP address. Mandatory if
                                  # memqcache_method = 'memcached'.
                                  # Defaults to localhost.
                                  # (change requires restart)
#memqcache_memcached_port = 11211
                                  # Memcached port number. Mondatory if memqcache_method = 'memcached'.
                                  # Defaults to 11211.
                                  # (change requires restart)
#memqcache_total_size = 67108864
                                  # Total memory size in bytes for storing memory cache.
                                  # Mandatory if memqcache_method = 'shmem'.
                                  # Defaults to 64MB.
                                  # (change requires restart)
memory_cache_enabled = off
                               # If on, use the memory cache functionality, off by default
                               # (change requires restart)
memqcache_method = 'shmem'
                               # Cache storage method. either 'shmem'(shared memory) or
                               # 'memcached'. 'shmem' by default
                               # (change requires restart)

memqcache_max_num_cache = 1000000
                                  # Total number of cache entries. Mandatory
                                  # if memqcache_method = 'shmem'.
                                  # Each cache entry consumes 48 bytes on shared memory.
                                  # Defaults to 1,000,000(45.8MB).
                                  # (change requires restart)
memqcache_expire = 0
                                  # Memory cache entry life time specified in seconds.
                                  # 0 means infinite life time. 0 by default.
                                  # (change requires restart)
memqcache_auto_cache_invalidation = on
                                  # If on, invalidation of query cache is triggered by corresponding
                                  # DDL/DML/DCL(and memqcache_expire).  If off, it is only triggered
                                  # by memqcache_expire.  on by default.
                                  # (change requires restart)
memqcache_maxcache = 409600
                                  # Maximum SELECT result size in bytes.
                                  # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                  # (change requires restart)
memqcache_cache_block_size = 1048576
                                  # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
                                  # Defaults to 1MB.
                                  # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
                                  # Temporary work directory to record table oids
                                  # (change requires restart)
#white_memqcache_table_list = ''
                                  # Comma separated list of table names to memcache
                                  # that don't write to database
                                  # Regexp are accepted
#black_memqcache_table_list = ''
                                  # Comma separated list of table names not to memcache
                                  # that don't write to database
                                  # Regexp are accepted

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------
health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 10
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'mbase'
                                   # Health check user
health_check_password = ''
                                   # Password for health check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password
health_check_database = 'mbase'
                                   # Database name for health check. If '', tries 'postgres' frist, then 'template1'
health_check_max_retries = 259200
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 10
                                   # Amount of time to wait (in seconds) between retries.
connect_timeout = 5000
                                   # Timeout value in milliseconds before giving up to connect to backend.
                                   # Default is 10000 ms (10 second). Flaky network user may want to increase
                                   # the value. 0 means no timeout.
                                   # Note that this value is not only used for health check,
                                   # but also for ordinary conection to backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = ''
                                   # Executes this command at failover
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new master node id
                                   #   %H = new master node hostname
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character
failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new master node id
                                   #   %H = new master node hostname
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character

failover_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

detach_false_primary = off
                                   # Detach false primary if on. Only
                                   # valid in streaming replicaton
                                   # mode and with PostgreSQL 9.6 or
                                   # after.

search_primary_node_timeout = 20
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

auto_failback = on
                                   # Dettached backend node reattach automatically
                                   # if replication_state is 'streaming'.
auto_failback_interval = 5
                                   # Min interval of executing auto_failback in
                                   # seconds.

#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration(the default).
                                   # The relation cache is used for cache the
                                   # query result against PostgreSQL system
                                   # catalog to obtain various information
                                   # including table structures or if it's a
                                   # temporary table or not. The cache is
                                   # maintained in a pgpool child local memory
                                   # and being kept as long as it survives.
                                   # If someone modify the table by using
                                   # ALTER TABLE or some such, the relcache is
                                   # not consistent anymore.
                                   # For this purpose, cache_expiration
                                   # controls the life time of the cache.
#relcache_size = 256
                                   # Number of relation cache
                                   # entry. If you see frequently:
                                   # "pool_search_relcache: cache replacement happend"
                                   # in the pgpool log, you might want to increate this number.
#check_temp_table = catalog
                                   # Temporary table check method. catalog, trace or none.
                                   # Default is catalog.
#check_unlogged_table = on
                                   # If on, enable unlogged table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses unlogged tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.
#enable_shared_relcache = on
                                   # If on, relation cache stored in memory cache,
                                   # the cache is shared among child process.
                                   # Default is on.
                                   # (change requires restart)
pgpool.conf (23,062 bytes)   

Garreat

2021-03-16 06:08

reporter   ~0003768

Postgres 12.3

pengbo

2021-03-17 18:08

developer   ~0003772

Last edited: 2021-03-17 18:09

> Does this feature only work when using static IP addresses as backend? Docker service gets re-created with different IP, however I define it using a hostname...
> Maybe the client_hostname in pg_stat_replication view has to exactly match the defined backend name? Docker's reverse DNS returns (hostname).(replica-number).(random-garbage). Docker or not, I believe pgpool should try compare both hostname and FQDN.

To use this auto_failback feature, you need to specify "backend_application_name1" which is specified in "primary_conninfo".
Could you try again by specifying "backend_application_name"?

https://www.pgpool.net/docs/latest/en/html/runtime-config-backend-settings.html#GUC-BACKEND-APPLICATION-NAME

Sorry that document doesn't mention that.
I will add it to documentation.

Garreat

2021-04-08 01:38

reporter   ~0003797

I can confirm that now it works as intended. Thank you!

Sorry for the delay -- I got covid.

pengbo

2021-04-08 13:55

developer   ~0003798

OK. I'm going to close this issue.
Take care!

Issue History

Date Modified Username Field Change
2021-03-16 06:04 Garreat New Issue
2021-03-16 06:04 Garreat File Added: pgpool.conf
2021-03-16 06:08 Garreat Note Added: 0003768
2021-03-17 18:00 pengbo Assigned To => pengbo
2021-03-17 18:00 pengbo Status new => assigned
2021-03-17 18:08 pengbo Note Added: 0003772
2021-03-17 18:09 pengbo Note Edited: 0003772
2021-03-17 18:09 pengbo Status assigned => feedback
2021-04-08 01:38 Garreat Note Added: 0003797
2021-04-08 01:38 Garreat Status feedback => assigned
2021-04-08 13:55 pengbo Note Added: 0003798
2021-04-08 13:56 pengbo Status assigned => closed