[pgpool-general: 6264] Re: failiover fails sometimes with failover :falling node is alive and set new primary node:-1

Tue Nov 6 17:08:03 JST 2018

Hi,

On Tue, 30 Oct 2018 10:06:04 +0800
"dw_qiuchunxiao at sina.com" <dw_qiuchunxiao at sina.com> wrote:

> Hi,
>     I have increased the value of the parameter, but it doesn't solve the problem.
> 
> health_check_max_retries = 3
>                                    # Maximum number of times to retry a failed health check before giving up.
> health_check_retry_delay = 3
>                                    # Amount of time to wait (in seconds) between retries.
> connect_timeout = 10000
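
For reference, here is a rough estimate of how long the health check may take to give up on a node with these values. It is only a sketch under a simplifying assumption: each attempt may last up to health_check_timeout seconds, and health_check_retry_delay is waited between attempts. The function name is hypothetical, and health_check_timeout = 20 is taken from the pgpool.conf posted earlier in this thread.

```python
def worst_case_detection_seconds(timeout, max_retries, retry_delay):
    """Rough upper bound, in seconds, before a failed node is given up on.

    Assumed model (not taken from the Pgpool-II source): one initial attempt
    plus max_retries retries, each lasting at most `timeout` seconds, with
    `retry_delay` seconds of waiting between consecutive attempts.
    """
    attempts = max_retries + 1
    return attempts * timeout + max_retries * retry_delay


# Values from the originally posted pgpool.conf: no retries.
print(worst_case_detection_seconds(timeout=20, max_retries=0, retry_delay=1))  # 20
# Values after the change quoted above: 3 retries, 3 seconds apart.
print(worst_case_detection_seconds(timeout=20, max_retries=3, retry_delay=3))  # 89
```

Under this model, more retries tolerate short network glitches better, at the cost of a longer wait before failover starts.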

Could you upload the following files? There may be something wrong in the configuration files.

- pgpool.conf
- pool_hba.conf
- failover.sh 
- pool_passwd
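
Before sending the files, it may be worth checking that the edited values are really the ones present in the pgpool.conf you changed. The following is a minimal sketch, assuming the "name = value" format with "#" comments described at the top of pgpool.conf. The helper name and the sample text are made up for illustration. It lists every uncommented occurrence of a parameter together with its line number, so duplicate definitions stand out.

```python
import re

# Parameter names as they appear in the pgpool.conf quoted in this thread.
KEYS = (
    "health_check_period",
    "health_check_timeout",
    "health_check_max_retries",
    "health_check_retry_delay",
    "connect_timeout",
)

# name = value, where value is either a '...' string or a bare token;
# anything after the value (such as a "#" comment) is ignored.
LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*('[^']*'|[^#\s]+)")


def read_settings(text, keys=KEYS):
    """Return {name: [(line_number, value), ...]} for uncommented settings."""
    found = {}
    for number, line in enumerate(text.splitlines(), start=1):
        match = LINE.match(line)
        if match and match.group(1) in keys:
            value = match.group(2).strip("'")
            found.setdefault(match.group(1), []).append((number, value))
    return found


# Made-up sample text; replace it with the contents of the real file.
sample = """\
health_check_period = 5
health_check_max_retries = 0   # first definition
#health_check_max_retries0 = 0
health_check_max_retries = 3   # later definition
connect_timeout = 10000
"""

for name, occurrences in sorted(read_settings(sample).items()):
    print(name, occurrences)
```

To run it against the real file, pass the text of your pgpool.conf (its path depends on the installation) instead of `sample`.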

> dw_qiuchunxiao at sina.com
>  
> From: Bo Peng
> Sent: 2018-10-29 13:46
> To: dw_qiuchunxiao
> Cc: pgpool-general
> Subject: Re: Reply: Re: [pgpool-general: 6254] failiover fails sometimes with failover :falling node is alive and set new primary node:-1
>  
> Hi,
>  
> Sometimes the health check fails after failover because of a network problem.
> Try increasing the number of times a failed health check is retried.
>  
> For example:
>  
> health_check_max_retries = 5
>  
> On Mon, 29 Oct 2018 08:59:34 +0800
> "mandy" <dw_qiuchunxiao at sina.com> wrote:
>  
> > 
> > Hi, I am sorry to reply to your email so late. Of course I can share it with you.
> > # ----------------------------
> > # pgPool-II configuration file
> > # ----------------------------
> > #
> > # This file consists of lines of the form:
> > #
> > #   name = value
> > #
> > # Whitespace may be used.  Comments are introduced with "#" anywhere on a line.
> > # The complete list of parameter names and allowed values can be found in the
> > # pgPool-II documentation.
> > #
> > # This file is read on server startup and when the server receives a SIGHUP
> > # signal.  If you edit the file on a running system, you have to SIGHUP the
> > # server for the changes to take effect, or use "pgpool reload".  Some
> > # parameters, which are marked below, require a server shutdown and restart to
> > # take effect.
> > #
> > 
> > #------------------------------------------------------------------------------
> > # CONNECTIONS
> > #------------------------------------------------------------------------------
> > # - pgpool Connection Settings -
> > listen_addresses = '*'
> >                                    # Host name or IP address to listen on:
> >                                    # '*' for all, '' for no TCP/IP connections
> >                                    # (change requires restart)
> > port = 9999
> >                                    # Port number
> >                                    # (change requires restart)
> > socket_dir = '/tmp'
> >                                    # Unix domain socket path
> >                                    # The Debian package defaults to
> >                                    # /var/run/postgresql
> >                                    # (change requires restart)
> > listen_backlog_multiplier = 2
> >                                    # Set the backlog parameter of listen(2) to
> >                                    # num_init_children * listen_backlog_multiplier.
> >                                    # (change requires restart)
> > serialize_accept = off
> >                                    # whether to serialize accept() call to avoid thundering herd problem
> >                                    # (change requires restart)
> > # - pgpool Communication Manager Connection Settings -
> > pcp_listen_addresses = '*'
> >                                    # Host name or IP address for pcp process to listen on:
> >                                    # '*' for all, '' for no TCP/IP connections
> >                                    # (change requires restart)
> > pcp_port = 9898
> >                                    # Port number for pcp
> >                                    # (change requires restart)
> > pcp_socket_dir = '/tmp'
> >                                    # Unix domain socket path for pcp
> >                                    # The Debian package defaults to
> >                                    # /var/run/postgresql
> >                                    # (change requires restart)
> > # - Backend Connection Settings -
> > backend_hostname0 = 'pgsrv13'
> >                                    # Host name or IP address to connect to for backend 0
> > backend_port0 = 5432
> >                                    # Port number for backend 0
> > backend_weight0 = 1
> >                                    # Weight for backend 0 (only in load balancing mode)
> > backend_data_directory0 = '/pgdata'
> >                                    # Data directory for backend 0
> > backend_flag0 = 'ALLOW_TO_FAILOVER'
> >                                    # Controls various backend behavior
> >                                    # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
> >                                    # or ALWAYS_MASTER
> > backend_hostname1 = 'pgsrv14'
> > backend_port1 = 5432
> > backend_weight1 = 1
> > backend_data_directory1 = '/pgdata'
> > backend_flag1 = 'ALLOW_TO_FAILOVER'
> > # - Authentication -
> > enable_pool_hba = on
> >                                    # Use pool_hba.conf for client authentication
> > pool_passwd = 'pool_passwd'
> >                                    # File name of pool_passwd for md5 authentication.
> >                                    # "" disables pool_passwd.
> >                                    # (change requires restart)
> > authentication_timeout = 60
> >                                    # Delay in seconds to complete client authentication
> >                                    # 0 means no timeout.
> > allow_clear_text_frontend_auth = off
> >                                    # Allow Pgpool-II to use clear text password authentication
> >                                    # with clients, when pool_passwd does not
> >                                    # contain the user password
> > 
> > # - SSL Connections -
> > ssl = off
> >                                    # Enable SSL support
> >                                    # (change requires restart)
> > #ssl_key = './server.key'
> >                                    # Path to the SSL private key file
> >                                    # (change requires restart)
> > #ssl_cert = './server.cert'
> >                                    # Path to the SSL public certificate file
> >                                    # (change requires restart)
> > #ssl_ca_cert = ''
> >                                    # Path to a single PEM format file
> >                                    # containing CA root certificate(s)
> >                                    # (change requires restart)
> > #ssl_ca_cert_dir = ''
> >                                    # Directory containing CA root certificate(s)
> >                                    # (change requires restart)
> > 
> > #------------------------------------------------------------------------------
> > # POOLS
> > #------------------------------------------------------------------------------
> > # - Concurrent session and pool size -
> > num_init_children = 32
> >                                    # Number of concurrent sessions allowed
> >                                    # (change requires restart)
> > max_pool = 4
> >                                    # Number of connection pool caches per connection
> >                                    # (change requires restart)
> > # - Life time -
> > child_life_time = 300
> >                                    # Pool exits after being idle for this many seconds
> > child_max_connections = 0
> >                                    # Pool exits after receiving that many connections
> >                                    # 0 means no exit
> > connection_life_time = 0
> >                                    # Connection to backend closes after being idle for this many seconds
> >                                    # 0 means no close
> > client_idle_limit = 0
> >                                    # Client is disconnected after being idle for that many seconds
> >                                    # (even inside an explicit transactions!)
> >                                    # 0 means no disconnection
> > 
> > #------------------------------------------------------------------------------
> > # LOGS
> > #------------------------------------------------------------------------------
> > # - Where to log -
> > log_destination = 'stderr,syslog'
> >                                    # Where to log
> >                                    # Valid values are combinations of stderr,
> >                                    # and syslog. Default to stderr.
> > # - What to log -
> > log_line_prefix = '%t: pid %p: '   # printf-style string to output at beginning of each log line.
> > log_connections = on
> >                                    # Log connections
> > log_hostname = on
> >                                    # Hostname will be shown in ps status
> >                                    # and in logs if connections are logged
> > log_statement = on
> >                                    # Log all statements
> > log_per_node_statement = off
> >                                    # Log all statements
> >                                    # with node and backend informations
> > log_client_messages = off
> >                                    # Log any client messages
> > log_standby_delay = 'none'
> >                                    # Log standby delay
> >                                    # Valid values are combinations of always,
> >                                    # if_over_threshold, none
> > # - Syslog specific -
> > syslog_facility = 'LOCAL0'
> >                                    # Syslog local facility. Default to LOCAL0
> > syslog_ident = 'pgpool'
> >                                    # Syslog program identification string
> >                                    # Default to 'pgpool'
> > # - Debug -
> > #log_error_verbosity = default          # terse, default, or verbose messages
> > #client_min_messages = notice           # values in order of decreasing detail:
> >                                         #   debug5
> >                                         #   debug4
> >                                         #   debug3
> >                                         #   debug2
> >                                         #   debug1
> >                                         #   log
> >                                         #   notice
> >                                         #   warning
> >                                         #   error
> > #log_min_messages = warning             # values in order of decreasing detail:
> >                                         #   debug5
> >                                         #   debug4
> >                                         #   debug3
> >                                         #   debug2
> >                                         #   debug1
> >                                         #   info
> >                                         #   notice
> >                                         #   warning
> >                                         #   error
> >                                         #   log
> >                                         #   fatal
> >                                         #   panic
> > #------------------------------------------------------------------------------
> > # FILE LOCATIONS
> > #------------------------------------------------------------------------------
> > pid_file_name = '/usr/local/pgpool-4.0.0/pgpool.pid'
> >                                    # PID file name
> >                                    # Can be specified as relative to the"
> >                                    # location of pgpool.conf file or
> >                                    # as an absolute path
> >                                    # (change requires restart)
> > logdir = '/var/log/pgpool'
> >                                    # Directory of pgPool status file
> >                                    # (change requires restart)
> > 
> > #------------------------------------------------------------------------------
> > # CONNECTION POOLING
> > #------------------------------------------------------------------------------
> > connection_cache = on
> >                                    # Activate connection pools
> >                                    # (change requires restart)
> > 
> >                                    # Semicolon separated list of queries
> >                                    # to be issued at the end of a session
> >                                    # The default is for 8.3 and later
> > reset_query_list = 'ABORT; DISCARD ALL'
> >                                    # The following one is for 8.2 and before
> > #reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
> > 
> > #------------------------------------------------------------------------------
> > # REPLICATION MODE
> > #------------------------------------------------------------------------------
> > replication_mode = off
> >                                    # Activate replication mode
> >                                    # (change requires restart)
> > replicate_select = off
> >                                    # Replicate SELECT statements
> >                                    # when in replication mode
> >                                    # replicate_select is higher priority than
> >                                    # load_balance_mode.
> > insert_lock = on
> >                                    # Automatically locks a dummy row or a table
> >                                    # with INSERT statements to keep SERIAL data
> >                                    # consistency
> >                                    # Without SERIAL, no lock will be issued
> > lobj_lock_table = ''
> >                                    # When rewriting lo_creat command in
> >                                    # replication mode, specify table name to
> >                                    # lock
> > # - Degenerate handling -
> > replication_stop_on_mismatch = off
> >                                    # On disagreement with the packet kind
> >                                    # sent from backend, degenerate the node
> >                                    # which is most likely "minority"
> >                                    # If off, just force to exit this session
> > failover_if_affected_tuples_mismatch = off
> >                                    # On disagreement with the number of affected
> >                                    # tuples in UPDATE/DELETE queries, then
> >                                    # degenerate the node which is most likely
> >                                    # "minority".
> >                                    # If off, just abort the transaction to
> >                                    # keep the consistency
> > 
> > #------------------------------------------------------------------------------
> > # LOAD BALANCING MODE
> > #------------------------------------------------------------------------------
> > load_balance_mode = on
> >                                    # Activate load balancing mode
> >                                    # (change requires restart)
> > ignore_leading_white_space = on
> >                                    # Ignore leading white spaces of each query
> > white_function_list = ''
> >                                    # Comma separated list of function names
> >                                    # that don't write to database
> >                                    # Regexp are accepted
> > black_function_list = 'nextval,setval,nextval,setval'
> >                                    # Comma separated list of function names
> >                                    # that write to database
> >                                    # Regexp are accepted
> > black_query_pattern_list = ''
> >                                    # Semicolon separated list of query patterns
> >                                    # that should be sent to primary node
> >                                    # Regexp are accepted
> >                                    # valid for streaming replicaton mode only.
> > database_redirect_preference_list = ''
> >                                    # comma separated list of pairs of database and node id.
> >                                    # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
> >                                    # valid for streaming replicaton mode only.
> > app_name_redirect_preference_list = ''
> >                                    # comma separated list of pairs of app name and node id.
> >                                    # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
> >                                    # valid for streaming replicaton mode only.
> > allow_sql_comments = off
> >                                    # if on, ignore SQL comments when judging if load balance or
> >                                    # query cache is possible.
> >                                    # If off, SQL comments effectively prevent the judgment
> >                                    # (pre 3.4 behavior).
> > disable_load_balance_on_write = 'transaction'
> >                                    # Load balance behavior when write query is issued
> >                                    # in an explicit transaction.
> >                                    # Note that any query not in an explicit transaction
> >                                    # is not affected by the parameter.
> >                                    # 'transaction' (the default): if a write query is issued,
> >                                    # subsequent read queries will not be load balanced
> >                                    # until the transaction ends.
> >                                    # 'trans_transaction': if a write query is issued,
> >                                    # subsequent read queries in an explicit transaction
> >                                    # will not be load balanced until the session ends.
> >                                    # 'always': if a write query is issued, read queries will
> >                                    # not be load balanced until the session ends.
> > #------------------------------------------------------------------------------
> > # MASTER/SLAVE MODE
> > #------------------------------------------------------------------------------
> > master_slave_mode = on
> >                                    # Activate master/slave mode
> >                                    # (change requires restart)
> > master_slave_sub_mode = 'stream'
> >                                    # Master/slave sub mode
> >                                    # Valid values are combinations stream, slony
> >                                    # or logical. Default is stream.
> >                                    # (change requires restart)
> > # - Streaming -
> > sr_check_period = 5
> >                                    # Streaming replication check period
> >                                    # Disabled (0) by default
> > sr_check_user = 'postgres'
> >                                    # Streaming replication check user
> >                                    # This is necessary even if you disable
> >                                    # streaming replication delay check with
> >                                    # sr_check_period = 0
> > sr_check_password = 'postgres'
> >                                    # Password for streaming replication check user.
> >                                    # Leaving it empty will make Pgpool-II to first look for the
> >                                    # Password in pool_passwd file before using the empty password
> > sr_check_database = 'postgres'
> >                                    # Database name for streaming replication check
> > delay_threshold = 1000
> >                                    # Threshold before not dispatching query to standby node
> >                                    # Unit is in bytes
> >                                    # Disabled (0) by default
> > # - Special commands -
> > follow_master_command = ''
> >                                    # Executes this command after master failover
> >                                    # Special values:
> >                                    #   %d = node id
> >                                    #   %h = host name
> >                                    #   %p = port number
> >                                    #   %D = database cluster path
> >                                    #   %m = new master node id
> >                                    #   %H = hostname of the new master node
> >                                    #   %M = old master node id
> >                                    #   %P = old primary node id
> >                                    #   %r = new master port number
> >                                    #   %R = new master database cluster path
> >                                    #   %% = '%' character
> > #------------------------------------------------------------------------------
> > # HEALTH CHECK GLOBAL PARAMETERS
> > #------------------------------------------------------------------------------
> > health_check_period = 5
> >                                    # Health check period
> >                                    # Disabled (0) by default
> > health_check_timeout = 20
> >                                    # Health check timeout
> >                                    # 0 means no timeout
> > health_check_user = 'postgres'
> >                                    # Health check user
> > health_check_password = 'postgres'
> >                                    # Password for health check user
> >                                    # Leaving it empty will make Pgpool-II to first look for the
> >                                    # Password in pool_passwd file before using the empty password
> > health_check_database = 'postgres'
> >                                    # Database name for health check. If '', tries 'postgres' frist, then 'template1'
> > health_check_max_retries = 0
> >                                    # Maximum number of times to retry a failed health check before giving up.
> > health_check_retry_delay = 1
> >                                    # Amount of time to wait (in seconds) between retries.
> > connect_timeout = 10000
> >                                    # Timeout value in milliseconds before giving up to connect to backend.
> >                                    # Default is 10000 ms (10 second). Flaky network user may want to increase
> >                                    # the value. 0 means no timeout.
> >                                    # Note that this value is not only used for health check,
> >                                    # but also for ordinary conection to backend.
> > #------------------------------------------------------------------------------
> > # HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
> > #------------------------------------------------------------------------------
> > #health_check_period0 = 0
> > #health_check_timeout0 = 20
> > #health_check_user0 = 'nobody'
> > #health_check_password0 = ''
> > #health_check_database0 = ''
> > #health_check_max_retries0 = 0
> > #health_check_retry_delay0 = 1
> > #connect_timeout0 = 10000
> > #------------------------------------------------------------------------------
> > # FAILOVER AND FAILBACK
> > #------------------------------------------------------------------------------
> > failover_command = '/usr/local/pgpool-4.0.0/replscript/failover.sh %d %h %P %H'
> >                                    # Executes this command at failover
> >                                    # Special values:
> >                                    #   %d = node id
> >                                    #   %h = host name
> >                                    #   %p = port number
> >                                    #   %D = database cluster path
> >                                    #   %m = new master node id
> >                                    #   %H = hostname of the new master node
> >                                    #   %M = old master node id
> >                                    #   %P = old primary node id
> >                                    #   %r = new master port number
> >                                    #   %R = new master database cluster path
> >                                    #   %% = '%' character
> > failback_command = ''
> >                                    # Executes this command at failback.
> >                                    # Special values:
> >                                    #   %d = node id
> >                                    #   %h = host name
> >                                    #   %p = port number
> >                                    #   %D = database cluster path
> >                                    #   %m = new master node id
> >                                    #   %H = hostname of the new master node
> >                                    #   %M = old master node id
> >                                    #   %P = old primary node id
> >                                    #   %r = new master port number
> >                                    #   %R = new master database cluster path
> >                                    #   %% = '%' character
> > failover_on_backend_error = on
> >                                    # Initiates failover when reading/writing to the
> >                                    # backend communication socket fails
> >                                    # If set to off, pgpool will report an
> >                                    # error and disconnect the session.
> > detach_false_primary = off
> >                                    # Detach false primary if on. Only
> >                                    # valid in streaming replicaton
> >                                    # mode and with PostgreSQL 9.6 or
> >                                    # after.
> > search_primary_node_timeout = 300
> >                                    # Timeout in seconds to search for the
> >                                    # primary node when a failover occurs.
> >                                    # 0 means no timeout, keep searching
> >                                    # for a primary node forever.
> > #------------------------------------------------------------------------------
> > # ONLINE RECOVERY
> > #------------------------------------------------------------------------------
> > recovery_user = 'postgres'
> >                                    # Online recovery user
> > recovery_password = 'postgres'
> >                                    # Online recovery password
> >                                    # Leaving it empty will make Pgpool-II to first look for the
> >                                    # Password in pool_passwd file before using the empty password
> > recovery_1st_stage_command = 'recovery_1st_stage.sh'
> >                                    # Executes a command in first stage
> > recovery_2nd_stage_command = ''
> >                                    # Executes a command in second stage
> > recovery_timeout = 90
> >                                    # Timeout in seconds to wait for the
> >                                    # recovering node's postmaster to start up
> >                                    # 0 means no wait
> > client_idle_limit_in_recovery = 0
> >                                    # Client is disconnected after being idle
> >                                    # for that many seconds in the second stage
> >                                    # of online recovery
> >                                    # 0 means no disconnection
> >                                    # -1 means immediate disconnection
> > 
> > #------------------------------------------------------------------------------
> > # WATCHDOG
> > #------------------------------------------------------------------------------
> > # - Enabling -
> > use_watchdog = off
> >                                    # Activates watchdog
> >                                    # (change requires restart)
> > # -Connection to up stream servers -
> > trusted_servers = ''
> >                                    # trusted server list which are used
> >                                    # to confirm network connection
> >                                    # (hostA,hostB,hostC,...)
> >                                    # (change requires restart)
> > ping_path = '/bin'
> >                                    # ping command path
> >                                    # (change requires restart)
> > # - Watchdog communication Settings -
> > wd_hostname = ''
> >                                    # Host name or IP address of this watchdog
> >                                    # (change requires restart)
> > wd_port = 9000
> >                                    # port number for watchdog service
> >                                    # (change requires restart)
> > wd_priority = 1
> >                                    # priority of this watchdog in leader election
> >                                    # (change requires restart)
> > wd_authkey = ''
> >                                    # Authentication key for watchdog communication
> >                                    # (change requires restart)
> > wd_ipc_socket_dir = '/tmp'
> >                                    # Unix domain socket path for watchdog IPC socket
> >                                    # The Debian package defaults to
> >                                    # /var/run/postgresql
> >                                    # (change requires restart)
> > 
> > # - Virtual IP control Setting -
> > delegate_IP = ''
> >                                    # delegate IP address
> >                                    # If this is empty, virtual IP never bring up.
> >                                    # (change requires restart)
> > if_cmd_path = '/sbin'
> >                                    # path to the directory where if_up/down_cmd exists
> >                                    # (change requires restart)
> > if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
> >                                    # startup delegate IP command
> >                                    # (change requires restart)
> > if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
> >                                    # shutdown delegate IP command
> >                                    # (change requires restart)
> > arping_path = '/usr/sbin'
> >                                    # arping command path
> >                                    # (change requires restart)
> > arping_cmd = 'arping -U $_IP_$ -w 1'
> >                                    # arping command
> >                                    # (change requires restart)
> > # - Behaivor on escalation Setting -
> > clear_memqcache_on_escalation = on
> >                                    # Clear all the query cache on shared memory
> >                                    # when standby pgpool escalate to active pgpool
> >                                    # (= virtual IP holder).
> >                                    # This should be off if client connects to pgpool
> >                                    # not using virtual IP.
> >                                    # (change requires restart)
> > wd_escalation_command = ''
> >                                    # Executes this command at escalation on new active pgpool.
> >                                    # (change requires restart)
> > wd_de_escalation_command = ''
> >                                    # Executes this command when master pgpool resigns from being master.
> >                                    # (change requires restart)
> > # - Watchdog consensus settings for failover -
> > failover_when_quorum_exists = on
> >                                    # Only perform backend node failover
> >                                    # when the watchdog cluster holds the quorum
> >                                    # (change requires restart)
> > failover_require_consensus = on
> >                                    # Perform failover when majority of Pgpool-II nodes
> >                                    # aggrees on the backend node status change
> >                                    # (change requires restart)
> > allow_multiple_failover_requests_from_node = off
> >                                    # A Pgpool-II node can cast multiple votes
> >                                    # for building the consensus on failover
> >                                    # (change requires restart)
> > # - Lifecheck Setting -
> > # -- common --
> > wd_monitoring_interfaces_list = ''
> >                                    # Comma separated list of interfaces names to monitor.
> >                                    # if any interface from the list is active the watchdog will
> >                                    # consider the network is fine
> >                                    # 'any' to enable monitoring on all interfaces except loopback
> >                                    # '' to disable monitoring
> >                                    # (change requires restart)
> > 
> > wd_lifecheck_method = 'heartbeat'
> >                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
> >                                    # (change requires restart)
> > wd_interval = 10
> >                                    # lifecheck interval (sec) > 0
> >                                    # (change requires restart)
> > # -- heartbeat mode --
> > wd_heartbeat_port = 9694
> >                                    # Port number for receiving heartbeat signal
> >                                    # (change requires restart)
> > wd_heartbeat_keepalive = 2
> >                                    # Interval time of sending heartbeat signal (sec)
> >                                    # (change requires restart)
> > wd_heartbeat_deadtime = 30
> >                                    # Deadtime interval for heartbeat signal (sec)
> >                                    # (change requires restart)
> > heartbeat_destination0 = 'host0_ip1'
> >                                    # Host name or IP address of destination 0
> >                                    # for sending heartbeat signal.
> >                                    # (change requires restart)
> > heartbeat_destination_port0 = 9694
> >                                    # Port number of destination 0 for sending
> >                                    # heartbeat signal. Usually this is the
> >                                    # same as wd_heartbeat_port.
> >                                    # (change requires restart)
> > heartbeat_device0 = ''
> >                                    # Name of NIC device (such like 'eth0')
> >                                    # used for sending/receiving heartbeat
> >                                    # signal to/from destination 0.
> >                                    # This works only when this is not empty
> >                                    # and pgpool has root privilege.
> >                                    # (change requires restart)
> > #heartbeat_destination1 = 'host0_ip2'
> > #heartbeat_destination_port1 = 9694
> > #heartbeat_device1 = ''
> > # -- query mode --
> > wd_life_point = 3
> >                                    # lifecheck retry times
> >                                    # (change requires restart)
> > wd_lifecheck_query = 'SELECT 1'
> >                                    # lifecheck query to pgpool from watchdog
> >                                    # (change requires restart)
> > wd_lifecheck_dbname = 'template1'
> >                                    # Database name connected for lifecheck
> >                                    # (change requires restart)
> > wd_lifecheck_user = 'nobody'
> >                                    # watchdog user monitoring pgpools in lifecheck
> >                                    # (change requires restart)
> > wd_lifecheck_password = ''
> >                                    # Password for watchdog user in lifecheck
> >                                    # Leaving it empty will make Pgpool-II to first look for the
> >                                    # Password in pool_passwd file before using the empty password
> >                                    # (change requires restart)
> > # - Other pgpool Connection Settings -
> > #other_pgpool_hostname0 = 'host0'
> >                                    # Host name or IP address to connect to for other pgpool 0
> >                                    # (change requires restart)
> > #other_pgpool_port0 = 5432
> >                                    # Port number for other pgpool 0
> >                                    # (change requires restart)
> > #other_wd_port0 = 9000
> >                                    # Port number for other watchdog 0
> >                                    # (change requires restart)
> > #other_pgpool_hostname1 = 'host1'
> > #other_pgpool_port1 = 5432
> > #other_wd_port1 = 9000
> > 
> > #------------------------------------------------------------------------------
> > # OTHERS
> > #------------------------------------------------------------------------------
> > relcache_expire = 0
> >                                    # Life time of relation cache in seconds.
> >                                    # 0 means no cache expiration(the default).
> >                                    # The relation cache is used for cache the
> >                                    # query result against PostgreSQL system
> >                                    # catalog to obtain various information
> >                                    # including table structures or if it's a
> >                                    # temporary table or not. The cache is
> >                                    # maintained in a pgpool child local memory
> >                                    # and being kept as long as it survives.
> >                                    # If someone modify the table by using
> >                                    # ALTER TABLE or some such, the relcache is
> >                                    # not consistent anymore.
> >                                    # For this purpose, cache_expiration
> >                                    # controls the life time of the cache.
> > relcache_size = 256
> >                                    # Number of relation cache
> >                                    # entry. If you see frequently:
> >                                    # "pool_search_relcache: cache replacement happend"
> >                                    # in the pgpool log, you might want to increate this number.
> > check_temp_table = on
> >                                    # If on, enable temporary table check in SELECT statements.
> >                                    # This initiates queries against system catalog of primary/master
> >                                    # thus increases load of master.
> >                                    # If you are absolutely sure that your system never uses temporary tables
> >                                    # and you want to save access to primary/master, you could turn this off.
> >                                    # Default is on.
> > check_unlogged_table = on
> >                                    # If on, enable unlogged table check in SELECT statements.
> >                                    # This initiates queries against system catalog of primary/master
> >                                    # thus increases load of master.
> >                                    # If you are absolutely sure that your system never uses unlogged tables
> >                                    # and you want to save access to primary/master, you could turn this off.
> >                                    # Default is on.
> > #------------------------------------------------------------------------------
> > # IN MEMORY QUERY MEMORY CACHE
> > #------------------------------------------------------------------------------
> > memory_cache_enabled = off
> >                                    # If on, use the memory cache functionality, off by default
> > memqcache_method = 'shmem'
> >                                    # Cache storage method. either 'shmem'(shared memory) or
> >                                    # 'memcached'. 'shmem' by default
> >                                    # (change requires restart)
> > memqcache_memcached_host = 'localhost'
> >                                    # Memcached host name or IP address. Mandatory if
> >                                    # memqcache_method = 'memcached'.
> >                                    # Defaults to localhost.
> >                                    # (change requires restart)
> > memqcache_memcached_port = 11211
> >                                    # Memcached port number. Mondatory if memqcache_method = 'memcached'.
> >                                    # Defaults to 11211.
> >                                    # (change requires restart)
> > memqcache_total_size = 67108864
> >                                    # Total memory size in bytes for storing memory cache.
> >                                    # Mandatory if memqcache_method = 'shmem'.
> >                                    # Defaults to 64MB.
> >                                    # (change requires restart)
> > memqcache_max_num_cache = 1000000
> >                                    # Total number of cache entries. Mandatory
> >                                    # if memqcache_method = 'shmem'.
> >                                    # Each cache entry consumes 48 bytes on shared memory.
> >                                    # Defaults to 1,000,000(45.8MB).
> >                                    # (change requires restart)
> > memqcache_expire = 0
> >                                    # Memory cache entry life time specified in seconds.
> >                                    # 0 means infinite life time. 0 by default.
> >                                    # (change requires restart)
> > memqcache_auto_cache_invalidation = on
> >                                    # If on, invalidation of query cache is triggered by corresponding
> >                                    # DDL/DML/DCL(and memqcache_expire).  If off, it is only triggered
> >                                    # by memqcache_expire.  on by default.
> >                                    # (change requires restart)
> > memqcache_maxcache = 409600
> >                                    # Maximum SELECT result size in bytes.
> >                                    # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
> >                                    # (change requires restart)
> > memqcache_cache_block_size = 1048576
> >                                    # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
> >                                    # Defaults to 1MB.
> >                                    # (change requires restart)
> > memqcache_oiddir = '/var/log/pgpool/oiddir'
> >                                    # Temporary work directory to record table oids
> >                                    # (change requires restart)
> > white_memqcache_table_list = ''
> >                                    # Comma separated list of table names to memcache
> >                                    # that don't write to database
> >                                    # Regexp are accepted
> > black_memqcache_table_list = ''
> >                                    # Comma separated list of table names not to memcache
> >                                    # that don't write to database
> >                                    # Regexp are accepted
> > Thank you !
> > 
> > 
> > --------------------------------
> > 
> > 
> > ----- Original Message -----
> > From: Bo Peng <pengbo at sraoss.co.jp>
> > To: dw_qiuchunxiao at sina.com
> > Cc: "pgpool-general" <pgpool-general at pgpool.net>
> > Subject: Re: [pgpool-general: 6254] failiover fails sometimes with failover :falling node is alive and set new primary node:-1
> > Date: 2018-10-27 22:27
> > 
> > 
> > Hi
> > It seems that after the first failover, pgpool could not 
> > make connection to the new primary node.
> > Could you share the pgpool.conf?
> > On Fri, 26 Oct 2018 18:19:17 +0800
> > "mandy" <dw_qiuchunxiao at sina.com> wrote:
> > > Hi,
> > > Sometimes I fail to activate the standby node as the new primary node when the old primary node goes down, using the failover feature of pgpool-II.
> > > I have 2 servers running CentOS 7: one hosts pgpool-II 4.0.0 and PostgreSQL 11.0, the other hosts PostgreSQL 11.0. I installed pgpool-II and PostgreSQL from source.
> > > As shown below, I have 2 nodes using streaming replication: pgsrv14 is the primary node and pgsrv13 is the standby node. pgpool-II is installed on the pgsrv13 server.
> > >
> > > [postgres@pgsrv13 replscript]$ psql -p 9999
> > > psql (11.0)
> > > Type "help" for help.
> > >
> > > postgres=# show pool_nodes;
> > >  node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
> > > ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
> > >  0       | pgsrv13  | 5432 | up     | 0.500000  | standby | 0          | true              | 0                 | 2018-10-26 07:42:46
> > >  1       | pgsrv14  | 5432 | up     | 0.500000  | primary | 0          | false             | 0                 | 2018-10-26 07:41:15
> > > (2 rows)
> > >
> > > When I use the command "pg_ctl stop" to stop the database on pgsrv14, pgpool should activate pgsrv13 (the standby node) as the new primary node by executing the failover.sh script. However, it failed.
> > > I have specified 4 special characters in failover_command in pgpool.conf:
> > >
> > > failover_command = '/usr/local/pgpool-4.0.0/replscript/failover.sh %d %h %P %H'
> > >                                    # Executes this command at failover
> > >                                    # Special values:
> > >                                    #   %d = node id
> > >                                    #   %h = host name
> > >                                    #   %H = hostname of the new master node
> > >                                    #   %P = old primary node id
> > >
> > > and the log of failover.sh said:
> > > .......
> > > failover.sh FALLING_NODE: 1; FALLING_HOST: pgsrv14; OLDPRIMARY_NODE: 1; NEW_PRIMARY: pgsrv13; at Fri Oct 26 07:43:06 EDT 2018
> > > ssh -f -n -T postgres@pgsrv13 /usr/local/pgpool-4.0.0/replscript/promote.sh -d pgsrv14
> > > failover done!
> > >
> > > failover.sh FALLING_NODE: 0; FALLING_HOST: pgsrv13; OLDPRIMARY_NODE: 0; NEW_PRIMARY: ; at Fri Oct 26 07:43:11 EDT 2018
> > > ssh -f -n -T postgres@ /usr/local/pgpool-4.0.0/replscript/promote.sh -d pgsrv13
> > > failover done!
> > > .......
> > >
> > > The first entry in the failover.sh log is correct: it identifies the failing node and the old primary node as node 1 (pgsrv14), and the new primary node as pgsrv13. The result of "select pg_is_in_recovery()" on pgsrv13 is 'f', which shows that pgsrv13 is now the primary node and is alive.
> > > However, because I cannot run the recovery command to restore pgsrv14 as a normal node within such a short period (5 seconds), pgpool executes a second failover command, as the second entry shows. This time pgpool treats pgsrv13 as the failing node, although pgsrv13 is actually up and is the primary node.
> > > When the second entry was logged, pgpool.log said that pgsrv13 was shut down by administrative command (it was actually alive) and that all DB nodes were in down status, and pgpool set the new primary node to -1. Maybe this is why NEW_PRIMARY is empty in the second entry.
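
The second failover.sh entry above ends up running "ssh ... postgres@" with no host, because %H was empty. As a rough sketch only (this does not explain why pgpool degenerated node 0), a failover script could check its arguments before promoting anything. The function name decide_action is hypothetical; the argument order assumes the failover_command quoted above (%d %h %P %H), and the promote.sh path and its -d flag are copied from the quoted log, not verified.

```shell
#!/bin/sh
# Hypothetical guard for a failover script. Argument order follows
# failover_command = '.../failover.sh %d %h %P %H' from the report above.

# Prints "promote" only when the failed node was the primary and pgpool
# supplied a non-empty new-primary host; prints "skip" otherwise.
decide_action() {
    failed_node_id=$1      # %d
    old_primary_id=$3      # %P
    new_primary_host=$4    # %H

    if [ -z "$new_primary_host" ]; then
        echo skip          # no candidate left: there is no host to ssh to
    elif [ "$failed_node_id" != "$old_primary_id" ]; then
        echo skip          # a standby went down: no promotion needed
    else
        echo promote
    fi
}

# Example wiring (assumption: promote.sh behaves as in the quoted log):
# if [ "$(decide_action "$@")" = promote ]; then
#     ssh -T "postgres@$4" /usr/local/pgpool-4.0.0/replscript/promote.sh -d "$2"
# fi
```

With the arguments from the second log entry (0 pgsrv13 0 ""), this prints "skip" instead of building a malformed ssh command. It only hides the symptom in the script; the question of why node 0 was reported down remains.
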
> > > The pgpool.log said:
> > > Oct 26 07:43:11 pgsrv13 pgpool[3827]: [438-1] 2018-10-26 07:43:11: pid 3827: LOG:  reading and processing packets
> > > Oct 26 07:43:11 pgsrv13 pgpool[3827]: [438-2] 2018-10-26 07:43:11: pid 3827: DETAIL:  postmaster on DB node 0 was shutdown by administrative command
> > > Oct 26 07:43:11 pgsrv13 pgpool[3827]: [439-1] 2018-10-26 07:43:11: pid 3827: LOG:  received degenerate backend request for node_id: 0 from pid [3827]
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [467-1] 2018-10-26 07:43:11: pid 3036: LOG:  Pgpool-II parent process has received failover request
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [468-1] 2018-10-26 07:43:11: pid 3036: LOG:  starting degeneration. shutdown host pgsrv13(5432)
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [469-1] 2018-10-26 07:43:11: pid 3036: WARNING:  All the DB nodes are in down status and skip writing status file.
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [470-1] 2018-10-26 07:43:11: pid 3036: LOG:  failover: no valid backend node found
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [471-1] 2018-10-26 07:43:11: pid 3036: LOG:  Restart all children
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [472-1] 2018-10-26 07:43:11: pid 3036: LOG:  execute command: /usr/local/pgpool-4.0.0/replscript/failover.sh 0 pgsrv13 0 ""
> > > Oct 26 07:43:11 pgsrv13 pgpool[3036]: [473-1] 2018-10-26 07:43:11: pid 3036: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> > > Oct 26 07:48:11 pgsrv13 pgpool[3036]: [474-1] 2018-10-26 07:48:11: pid 3036: LOG:  failover: set new primary node: -1
> > > Oct 26 07:48:11 pgsrv13 pgpool[4029]: [475-1] 2018-10-26 07:48:11: pid 4029: LOG:  failback event detected
> > > Oct 26 07:48:11 pgsrv13 pgpool[4029]: [475-2] 2018-10-26 07:48:11: pid 4029: DETAIL:  restarting myself
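
When the log is long, it may help to reduce it to the lines that show which node was degenerated and which primary pgpool ended up with. A minimal sketch follows: the function name failover_events is made up for illustration, and the patterns are simply the message texts that appear in the log quoted above, so other pgpool versions may word them differently.

```shell
#!/bin/sh
# Filters a pgpool log read from stdin down to the failover-related messages
# seen in the quoted log. Pattern list is an assumption based on that log only.
failover_events() {
    grep -E 'was shutdown by administrative command|degenerate backend request|starting degeneration|execute command|set new primary node'
}

# Example (the log file path is an assumption):
# failover_events < /var/log/pgpool/pgpool.log
```

Run against the log above, this would keep the "node 0 was shutdown by administrative command" detail, the degenerate request, the failover.sh invocation, and the final "set new primary node: -1" line.
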
> > > I don't know what's wrong! Any help is welcome, and I am glad to provide more information if it helps solve the problem.
> > > Thank you!
> > -- 
> > Bo Peng <pengbo at sraoss.co.jp>
> > SRA OSS, Inc. Japan
>  
>  
> -- 
> Bo Peng <pengbo at sraoss.co.jp>
> SRA OSS, Inc. Japan
>  
>  
>  

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan