0000742: Long time failover - Pgpool-II Bug Tracker

ID	0000742	Project	Pgpool-II	Category	General
View Status	public	Date Submitted	2022-01-18 21:21	Last Update	2022-02-22 15:17

Reporter	Ken	Assigned To	pengbo
Priority	normal	Severity	major	Reproducibility	always
Status	closed	Resolution	open
Product Version	4.2.7
Target Version	4.2.8	Fixed in Version	4.2.8

Summary	0000742: Long time failover
Description	Hello, I use Pgpool-II version 4.2.7 and have noticed that failover takes a long time. The whole failover, from detecting the crash to the database being available again through the VIP address, takes approximately 2 minutes and 20 seconds, which seems long. Detecting the new primary node alone takes 1 minute and 31 seconds. Why does it take so long? Can I reduce the failover time? The failover log and the configuration are attached. Best regards
Tags	No tags attached.
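
The timings quoted in the description can be checked against the attached log. The following is a minimal Python sketch, not part of the original report: it takes a few milestone lines copied from log_failover.txt and computes the time spent between them. The milestone labels and the helper names (`parse_timestamp`, `first_occurrences`, `phase_durations`) are illustrative assumptions.

```python
from datetime import datetime

# Milestone lines copied from the attached log_failover.txt.
LOG_EXCERPT = """\
2022-01-18 12:48:17: pid 71412: LOG: failed to connect to PostgreSQL server on "192.168.5.109:5432", timed out
2022-01-18 12:48:50: pid 71412: LOG: health check failed on node 0 (timeout:0)
2022-01-18 12:48:51: pid 71331: LOG: execute command: /etc/pgpool-II/failover.sh 0 192.168.5.109 5432 /opt/pgsql/data/db 1 192.168.5.72 0 0 5432 /opt/pgsql/data/db 192.168.5.109 5432
2022-01-18 12:48:52: pid 71331: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2022-01-18 12:50:23: pid 71331: LOG: find_primary_node: primary node is 1
2022-01-18 12:50:23: pid 71331: LOG: failover done. shutdown host 192.168.5.109(5432)
"""

# (label, substring to look for); the labels are hypothetical names for this sketch.
MILESTONES = [
    ("first failed health check", "failed to connect to PostgreSQL server"),
    ("health check gives up", "health check failed on node"),
    ("failover.sh starts", "execute command:"),
    ("primary search starts", "find_primary_node_repeatedly"),
    ("new primary found", "find_primary_node: primary node is"),
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a log line (log_line_prefix '%t: pid %p: ')."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def first_occurrences(log_text, milestones):
    """Return {label: timestamp} for the first line matching each milestone."""
    found = {}
    for line in log_text.splitlines():
        for label, needle in milestones:
            if label not in found and needle in line:
                found[label] = parse_timestamp(line)
    return found


def phase_durations(log_text, milestones=MILESTONES):
    """Return (from_label, to_label, seconds) for consecutive milestones present in the log."""
    times = first_occurrences(log_text, milestones)
    labels = [label for label, _ in milestones if label in times]
    return [
        (a, b, int((times[b] - times[a]).total_seconds()))
        for a, b in zip(labels, labels[1:])
    ]


if __name__ == "__main__":
    for start, end, seconds in phase_durations(LOG_EXCERPT):
        print(f"{start} -> {end}: {seconds} s")
```

On this excerpt the longest phase by far is the 91 seconds between `find_primary_node_repeatedly` starting and `find_primary_node` reporting node 1, which matches the 1 minute 31 seconds mentioned in the description. The 11-second spacing between health check retries in the log is consistent with `connect_timeout = 10000` (10 s) plus `health_check_retry_delay = 1` from the attached configuration.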

Ken 2022-01-18 21:21 reporter
log_failover.txt (12,277 bytes)
2022-01-18 12:47:42: pid 71350: LOG: new IPC connection received
2022-01-18 12:47:52: pid 71350: LOG: new IPC connection received
2022-01-18 12:48:02: pid 71350: LOG: new IPC connection received
2022-01-18 12:48:12: pid 71350: LOG: new IPC connection received
2022-01-18 12:48:17: pid 71412: LOG: failed to connect to PostgreSQL server on "192.168.5.109:5432", timed out
2022-01-18 12:48:17: pid 71412: ERROR: failed to make persistent db connection
2022-01-18 12:48:17: pid 71412: DETAIL: connection to host:"192.168.5.109:5432" failed
2022-01-18 12:48:17: pid 71412: LOG: health check retrying on DB node: 0 (round:1)
2022-01-18 12:48:22: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:48:22: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:48:23: pid 71350: LOG: remote node "192.168.5.109:5433 Linux p1.novalocal" is not replying to our beacons
2022-01-18 12:48:23: pid 71350: DETAIL: missed beacon reply count:2
2022-01-18 12:48:28: pid 71412: LOG: failed to connect to PostgreSQL server on "192.168.5.109:5432", timed out
2022-01-18 12:48:28: pid 71412: ERROR: failed to make persistent db connection
2022-01-18 12:48:28: pid 71412: DETAIL: connection to host:"192.168.5.109:5432" failed
2022-01-18 12:48:28: pid 71412: LOG: health check retrying on DB node: 0 (round:2)
2022-01-18 12:48:32: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:48:32: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:48:33: pid 71350: LOG: remote node "192.168.5.109:5433 Linux p1.novalocal" is not replying to our beacons
2022-01-18 12:48:33: pid 71350: DETAIL: missed beacon reply count:3
2022-01-18 12:48:39: pid 71412: LOG: failed to connect to PostgreSQL server on "192.168.5.109:5432", timed out
2022-01-18 12:48:39: pid 71412: ERROR: failed to make persistent db connection
2022-01-18 12:48:39: pid 71412: DETAIL: connection to host:"192.168.5.109:5432" failed
2022-01-18 12:48:39: pid 71412: LOG: health check retrying on DB node: 0 (round:3)
2022-01-18 12:48:41: pid 71372: LOG: informing the node status change to watchdog
2022-01-18 12:48:41: pid 71372: DETAIL: node id :0 status = "NODE DEAD" message:"No heartbeat signal from node"
2022-01-18 12:48:41: pid 71350: LOG: new IPC connection received
2022-01-18 12:48:41: pid 71350: LOG: received node status change ipc message
2022-01-18 12:48:41: pid 71350: DETAIL: No heartbeat signal from node
2022-01-18 12:48:41: pid 71350: LOG: remote node "192.168.5.109:5433 Linux p1.novalocal" is lost
2022-01-18 12:48:41: pid 71350: LOG: removing watchdog node "192.168.5.109:5433 Linux p1.novalocal" from the standby list
2022-01-18 12:48:42: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:48:42: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:48:44: pid 71404: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:48:44: pid 71404: DETAIL: timed out. retrying...
2022-01-18 12:48:50: pid 71412: LOG: failed to connect to PostgreSQL server on "192.168.5.109:5432", timed out
2022-01-18 12:48:50: pid 71412: ERROR: failed to make persistent db connection
2022-01-18 12:48:50: pid 71412: DETAIL: connection to host:"192.168.5.109:5432" failed
2022-01-18 12:48:50: pid 71412: LOG: health check failed on node 0 (timeout:0)
2022-01-18 12:48:50: pid 71412: LOG: received degenerate backend request for node_id: 0 from pid [71412]
2022-01-18 12:48:50: pid 71350: LOG: new IPC connection received
2022-01-18 12:48:50: pid 71350: LOG: watchdog received the failover command from local pgpool-II on IPC interface
2022-01-18 12:48:50: pid 71350: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2022-01-18 12:48:50: pid 71350: LOG: failover requires the majority vote, waiting for consensus
2022-01-18 12:48:50: pid 71350: DETAIL: failover request noted
2022-01-18 12:48:50: pid 71350: LOG: failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "192.168.5.72:5433 Linux p2.novalocal" is queued, waiting for the confirmation from other nodes
2022-01-18 12:48:50: pid 71412: LOG: degenerate backend request for node_id: 0 from pid [71412], will be handled by watchdog, which is building consensus for request
2022-01-18 12:48:51: pid 71350: LOG: watchdog received the failover command from remote pgpool-II node "192.168.5.243:5433 Linux w1.novalocal"
2022-01-18 12:48:51: pid 71350: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from 192.168.5.243:5433 Linux w1.novalocal
2022-01-18 12:48:51: pid 71350: LOG: we have got the consensus to perform the failover
2022-01-18 12:48:51: pid 71350: DETAIL: 2 node(s) voted in the favor
2022-01-18 12:48:51: pid 71350: LOG: received degenerate backend request for node_id: 0 from pid [71350]
2022-01-18 12:48:51: pid 71350: LOG: signal_user1_to_parent_with_reason(0)
2022-01-18 12:48:51: pid 71331: LOG: Pgpool-II parent process received SIGUSR1
2022-01-18 12:48:51: pid 71331: LOG: Pgpool-II parent process has received failover request
2022-01-18 12:48:51: pid 71350: LOG: new IPC connection received
2022-01-18 12:48:51: pid 71350: LOG: received the failover indication from Pgpool-II on IPC interface
2022-01-18 12:48:51: pid 71350: LOG: watchdog is informed of failover start by the main process
2022-01-18 12:48:51: pid 71331: LOG: starting degeneration. shutdown host 192.168.5.109(5432)
2022-01-18 12:48:51: pid 71331: LOG: Restart all children
2022-01-18 12:48:51: pid 71331: LOG: execute command: /etc/pgpool-II/failover.sh 0 192.168.5.109 5432 /opt/pgsql/data/db 1 192.168.5.72 0 0 5432 /opt/pgsql/data/db 192.168.5.109 5432
+ FAILED_NODE_ID=0
+ FAILED_NODE_HOST=192.168.5.109
+ FAILED_NODE_PORT=5432
+ FAILED_NODE_PGDATA=/opt/pgsql/data/db
+ NEW_MAIN_NODE_ID=1
+ NEW_MAIN_NODE_HOST=192.168.5.72
+ OLD_MAIN_NODE_ID=0
+ OLD_PRIMARY_NODE_ID=0
+ NEW_MAIN_NODE_PORT=5432
+ NEW_MAIN_NODE_PGDATA=/opt/pgsql/data/db
+ OLD_PRIMARY_NODE_HOST=192.168.5.109
+ OLD_PRIMARY_NODE_PORT=5432
+ PGHOME=/usr/pgsql-11
+ echo failover.sh: start: failed_node_id=0 old_primary_node_id=0 failed_host=192.168.5.109 new_main_host=192.168.5.72
failover.sh: start: failed_node_id=0 old_primary_node_id=0 failed_host=192.168.5.109 new_main_host=192.168.5.72
+ '[' 1 -lt 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@192.168.5.72 -i /opt/pgsql/.ssh/id_rsa_pgpool ls /tmp
Warning: Permanently added '192.168.5.72' (ECDSA) to the list of known hosts.
Authorized uses only. All activity may be monitored and reported
+ '[' 0 -ne 0 ']'
+ '[' 0 -ne 0 ']'
+ echo failover.sh: Primary node is down, promote standby node 192.168.5.72.
failover.sh: Primary node is down, promote standby node 192.168.5.72.
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@192.168.5.72 -i /opt/pgsql/.ssh/id_rsa_pgpool /usr/pgsql-11/bin/pg_ctl -D /opt/pgsql/data/db -w promote
Warning: Permanently added '192.168.5.72' (ECDSA) to the list of known hosts.
Authorized uses only. All activity may be monitored and reported
waiting for server to promote.... done
server promoted
+ '[' 0 -ne 0 ']'
+ echo failover.sh: end: new_main_node_id=1 is promoted to a primary
failover.sh: end: new_main_node_id=1 is promoted to a primary
+ exit 0
2022-01-18 12:48:52: pid 71331: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2022-01-18 12:48:52: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:48:52: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:49:02: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:49:02: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:49:12: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:49:12: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:49:22: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:49:22: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:49:32: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:49:32: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:49:39: pid 71350: LOG: watchdog received the failover command from remote pgpool-II node "192.168.5.243:5433 Linux w1.novalocal"
2022-01-18 12:49:39: pid 71350: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from 192.168.5.243:5433 Linux w1.novalocal
2022-01-18 12:49:39: pid 71350: LOG: failover requires the majority vote, waiting for consensus
2022-01-18 12:49:39: pid 71350: DETAIL: failover request noted
2022-01-18 12:49:39: pid 71350: LOG: failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "192.168.5.243:5433 Linux w1.novalocal" is queued, waiting for the confirmation from other nodes
2022-01-18 12:49:39: pid 71350: LOG: signal_user1_to_parent_with_reason(4)
2022-01-18 12:49:42: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:49:42: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:49:52: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:49:52: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:50:02: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:50:02: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:50:12: pid 71411: LOG: trying connecting to PostgreSQL server on "192.168.5.109:5432" by INET socket
2022-01-18 12:50:12: pid 71411: DETAIL: timed out. retrying...
2022-01-18 12:50:22: pid 71411: LOG: failed to connect to PostgreSQL server on "192.168.5.109:5432", getsockopt() failed
2022-01-18 12:50:22: pid 71411: DETAIL: Operation already in progress
2022-01-18 12:50:22: pid 71411: ERROR: failed to make persistent db connection
2022-01-18 12:50:22: pid 71411: DETAIL: connection to host:"192.168.5.109:5432" failed
2022-01-18 12:50:23: pid 71331: LOG: find_primary_node: primary node is 1
2022-01-18 12:50:23: pid 71331: LOG: failover: set new primary node: 1
2022-01-18 12:50:23: pid 71331: LOG: failover: set new main node: 1
2022-01-18 12:50:23: pid 71411: ERROR: Failed to check replication time lag
2022-01-18 12:50:23: pid 71411: DETAIL: No persistent db connection for the node 0
2022-01-18 12:50:23: pid 71411: HINT: check sr_check_user and sr_check_password
2022-01-18 12:50:23: pid 71411: CONTEXT: while checking replication time lag
2022-01-18 12:50:23: pid 71411: LOG: worker process received restart request
2022-01-18 12:50:23: pid 71350: LOG: new IPC connection received
2022-01-18 12:50:23: pid 71350: LOG: received the failover indication from Pgpool-II on IPC interface
2022-01-18 12:50:23: pid 71350: LOG: watchdog is informed of failover end by the main process
failover done. shutdown host 192.168.5.109(5432)
2022-01-18 12:50:23: pid 71331: LOG: failover done. shutdown host 192.168.5.109(5432)
2022-01-18 12:50:24: pid 71410: LOG: restart request received in pcp child process
2022-01-18 12:50:24: pid 71331: LOG: PCP child 71410 exits with status 0 in failover()
2022-01-18 12:50:24: pid 71331: LOG: fork a new PCP child pid 72123 in failover()
2022-01-18 12:50:24: pid 71331: LOG: Pgpool-II parent process received SIGUSR1
2022-01-18 12:50:24: pid 71331: LOG: Pgpool-II parent process received inform quarantine nodes signal from watchdog
2022-01-18 12:50:24: pid 71331: LOG: child process with pid: 71374 exits with status 256
2022-01-18 12:50:24: pid 71331: LOG: child process with pid: 71376 exits with status 256
2022-01-18 12:50:24: pid 71331: LOG: child process with pid: 71378 exits with status 256
2022-01-18 12:50:24: pid 71331: LOG: child process with pid: 71380 exits with status 256

pgpool.conf (47,048 bytes)
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# Whitespace may be used.  Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
# #------------------------------------------------------------------------------ # BACKEND CLUSTERING MODE # Choose one of: 'streaming_replication', 'native_replication', # 'logical_replication', 'slony', 'raw' or 'snapshot_isolation' # (change requires restart) #------------------------------------------------------------------------------ backend_clustering_mode = 'streaming_replication' #------------------------------------------------------------------------------ # CONNECTIONS #------------------------------------------------------------------------------ # - pgpool Connection Settings - listen_addresses = '' # Host name or IP address to listen on: # '' for all, '' for no TCP/IP connections # (change requires restart) port = 5433 # Port number # (change requires restart) socket_dir = '/var/run/postgresql' # Unix domain socket path # The Debian package defaults to # /var/run/postgresql # (change requires restart) reserved_connections = 0 # Number of reserved connections. # Pgpool-II does not accept connections if over # num_init_chidlren - reserved_connections. # - pgpool Communication Manager Connection Settings - pcp_listen_addresses = '' # Host name or IP address for pcp process to listen on: # '' for all, '' for no TCP/IP connections # (change requires restart) pcp_port = 9898 # Port number for pcp # (change requires restart) pcp_socket_dir = '/var/run/postgresql' # Unix domain socket path for pcp # The Debian package defaults to # /var/run/postgresql # (change requires restart) listen_backlog_multiplier = 2 # Set the backlog parameter of listen(2) to # num_init_children * listen_backlog_multiplier. 
# (change requires restart) serialize_accept = off # whether to serialize accept() call to avoid thundering herd problem # (change requires restart) # - Backend Connection Settings - backend_hostname0 = '192.168.5.109' # Host name or IP address to connect to for backend 0 backend_port0 = 5432 # Port number for backend 0 backend_weight0 = 1 # Weight for backend 0 (only in load balancing mode) backend_data_directory0 = '/opt/pgsql/data/db' # Data directory for backend 0 backend_flag0 = 'ALLOW_TO_FAILOVER' # Controls various backend behavior # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER # or ALWAYS_PRIMARY backend_application_name0 = 'server0' # walsender's application_name, used for "show pool_nodes" command backend_hostname1 = '192.168.5.72' backend_port1 = 5432 backend_weight1 = 1 backend_data_directory1 = '/opt/pgsql/data/db' backend_flag1 = 'ALLOW_TO_FAILOVER' backend_application_name1 = 'server1' # - Authentication - enable_pool_hba = on # Use pool_hba.conf for client authentication pool_passwd = 'pool_passwd' # File name of pool_passwd for md5 authentication. # "" disables pool_passwd. # (change requires restart) authentication_timeout = 1min # Delay in seconds to complete client authentication # 0 means no timeout. 
allow_clear_text_frontend_auth = off # Allow Pgpool-II to use clear text password authentication # with clients, when pool_passwd does not # contain the user password # - SSL Connections - ssl = off # Enable SSL support # (change requires restart) #ssl_key = 'server.key' # SSL private key file # (change requires restart) #ssl_cert = 'server.crt' # SSL public certificate file # (change requires restart) #ssl_ca_cert = '' # Single PEM format file containing # CA root certificate(s) # (change requires restart) #ssl_ca_cert_dir = '' # Directory containing CA root certificate(s) # (change requires restart) #ssl_crl_file = '' # SSL certificate revocation list file # (change requires restart) ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # Allowed SSL ciphers # (change requires restart) ssl_prefer_server_ciphers = off # Use server's SSL cipher preferences, # rather than the client's # (change requires restart) ssl_ecdh_curve = 'prime256v1' # Name of the curve to use in ECDH key exchange ssl_dh_params_file = '' # Name of the file containing Diffie-Hellman parameters used # for so-called ephemeral DH family of SSL cipher. 
#ssl_passphrase_command='' # Sets an external command to be invoked when a passphrase # for decrypting an SSL file needs to be obtained # (change requires restart) #------------------------------------------------------------------------------ # POOLS #------------------------------------------------------------------------------ # - Concurrent session and pool size - num_init_children = 32 # Number of concurrent sessions allowed # (change requires restart) max_pool = 4 # Number of connection pool caches per connection # (change requires restart) # - Life time - child_life_time = 5min # Pool exits after being idle for this many seconds child_max_connections = 0 # Pool exits after receiving that many connections # 0 means no exit connection_life_time = 0 # Connection to backend closes after being idle for this many seconds # 0 means no close client_idle_limit = 0 # Client is disconnected after being idle for that many seconds # (even inside an explicit transactions!) # 0 means no disconnection #------------------------------------------------------------------------------ # LOGS #------------------------------------------------------------------------------ # - Where to log - log_destination = 'stderr' # Where to log # Valid values are combinations of stderr, # and syslog. Default to stderr. # - What to log - log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line. log_connections = off # Log connections log_disconnections = off # Log disconnections log_hostname = off # Hostname will be shown in ps status # and in logs if connections are logged log_statement = off # Log all statements log_per_node_statement = off # Log all statements # with node and backend informations log_client_messages = off # Log any client messages log_standby_delay = 'if_over_threshold' # Log standby delay # Valid values are combinations of always, # if_over_threshold, none # - Syslog specific - syslog_facility = 'LOCAL0' # Syslog local facility. 
Default to LOCAL0 syslog_ident = 'pgpool' # Syslog program identification string # Default to 'pgpool' # - Debug - #log_error_verbosity = default # terse, default, or verbose messages #client_min_messages = notice # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # log # notice # warning # error #log_min_messages = warning # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # info # notice # warning # error # log # fatal # panic # This is used when logging to stderr: logging_collector = on # Enable capturing of stderr # into log files. # (change requires restart) # -- Only used if logging_collector is on --- log_directory = '/var/log/pgpool' # directory where log files are written, # can be absolute log_filename = 'pgpool-%a.log' # log file name pattern, # can include strftime() escapes #log_file_mode = 0600 # creation mode for log files, # begin with 0 to use octal notation log_truncate_on_rotation = on # If on, an existing log file with the # same name as the new log file will be # truncated rather than appended to. # But such truncation only occurs on # time-driven rotation, not on restarts # or size-driven rotation. Default is # off, meaning append to existing files # in all cases. log_rotation_age = 1d # Automatic rotation of logfiles will # happen after that (minutes)time. # 0 disables time based rotation. log_rotation_size = 0 # Automatic rotation of logfiles will # happen after that much (KB) log output. # 0 disables size based rotation. 
#------------------------------------------------------------------------------ # FILE LOCATIONS #------------------------------------------------------------------------------ pid_file_name = '/var/run/pgpool/pgpool.pid' # PID file name # Can be specified as relative to the" # location of pgpool.conf file or # as an absolute path # (change requires restart) logdir = '/tmp' # Directory of pgPool status file # (change requires restart) #------------------------------------------------------------------------------ # CONNECTION POOLING #------------------------------------------------------------------------------ connection_cache = on # Activate connection pools # (change requires restart) # Semicolon separated list of queries # to be issued at the end of a session # The default is for 8.3 and later reset_query_list = 'ABORT; DISCARD ALL' # The following one is for 8.2 and before #reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT' #------------------------------------------------------------------------------ # REPLICATION MODE #------------------------------------------------------------------------------ replicate_select = off # Replicate SELECT statements # when in replication mode # replicate_select is higher priority than # load_balance_mode. insert_lock = off # Automatically locks a dummy row or a table # with INSERT statements to keep SERIAL data # consistency # Without SERIAL, no lock will be issued lobj_lock_table = '' # When rewriting lo_creat command in # replication mode, specify table name to # lock # - Degenerate handling - replication_stop_on_mismatch = off # On disagreement with the packet kind # sent from backend, degenerate the node # which is most likely "minority" # If off, just force to exit this session failover_if_affected_tuples_mismatch = off # On disagreement with the number of affected # tuples in UPDATE/DELETE queries, then # degenerate the node which is most likely # "minority". 
# If off, just abort the transaction to # keep the consistency #------------------------------------------------------------------------------ # LOAD BALANCING MODE #------------------------------------------------------------------------------ load_balance_mode = on # Activate load balancing mode # (change requires restart) ignore_leading_white_space = on # Ignore leading white spaces of each query read_only_function_list = '' # Comma separated list of function names # that don't write to database # Regexp are accepted write_function_list = '' # Comma separated list of function names # that write to database # Regexp are accepted # If both read_only_function_list and write_function_list # is empty, function's volatile property is checked. # If it's volatile, the function is regarded as a # writing function. primary_routing_query_pattern_list = '' # Semicolon separated list of query patterns # that should be sent to primary node # Regexp are accepted # valid for streaming replicaton mode only. database_redirect_preference_list = '' # comma separated list of pairs of database and node id. # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2' # valid for streaming replicaton mode only. app_name_redirect_preference_list = '' # comma separated list of pairs of app name and node id. # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby' # valid for streaming replicaton mode only. allow_sql_comments = off # if on, ignore SQL comments when judging if load balance or # query cache is possible. # If off, SQL comments effectively prevent the judgment # (pre 3.4 behavior). disable_load_balance_on_write = 'transaction' # Load balance behavior when write query is issued # in an explicit transaction. # # Valid values: # # 'transaction' (default): # if a write query is issued, subsequent # read queries will not be load balanced # until the transaction ends. 
# # 'trans_transaction': # if a write query is issued, subsequent # read queries in an explicit transaction # will not be load balanced until the session ends. # # 'dml_adaptive': # Queries on the tables that have already been # modified within the current explicit transaction will # not be load balanced until the end of the transaction. # # 'always': # if a write query is issued, read queries will # not be load balanced until the session ends. # # Note that any query not in an explicit transaction # is not affected by the parameter. dml_adaptive_object_relationship_list= '' # comma separated list of object pairs # [object]:[dependent-object], to disable load balancing # of dependent objects within the explicit transaction # after WRITE statement is issued on (depending-on) object. # # example: 'tb_t1:tb_t2,insert_tb_f_func():tb_f,tb_v:my_view' # Note: function name in this list must also be present in # the write_function_list # only valid for disable_load_balance_on_write = 'dml_adaptive'. 
statement_level_load_balance = off
                                   # Enables statement level load balancing

#------------------------------------------------------------------------------
# NATIVE REPLICATION MODE
#------------------------------------------------------------------------------

# - Streaming -

sr_check_period = 10
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'pgpool'
                                   # Streaming replication check user
                                   # This is neccessary even if you disable streaming
                                   # replication delay check by sr_check_period = 0
sr_check_password = ''
                                   # Password for streaming replication check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password
sr_check_database = 'postgres'
                                   # Database name for streaming replication check
delay_threshold = 10000000
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default

# - Special commands -

follow_primary_command = ''
                                   # Executes this command after main node failover
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new main node id
                                   #   %H = new main node hostname
                                   #   %M = old main node id
                                   #   %P = old primary node id
                                   #   %r = new main port number
                                   #   %R = new main database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 30
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'pgpool'
                                   # Health check user
health_check_password = ''
                                   # Password for health check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password
health_check_database = ''
                                   # Database name for health check. If '', tries 'postgres' frist,
health_check_max_retries = 3
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
connect_timeout = 10000
                                   # Timeout value in milliseconds before giving up to connect to backend.
                                   # Default is 10000 ms (10 second). Flaky network user may want to increase
                                   # the value. 0 means no timeout.
                                   # Note that this value is not only used for health check,
                                   # but also for ordinary conection to backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
                                   # Executes this command at failover
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new main node id
                                   #   %H = new main node hostname
                                   #   %M = old main node id
                                   #   %P = old primary node id
                                   #   %r = new main port number
                                   #   %R = new main database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character
failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   #   %d = failed node id
                                   #   %h = failed node host name
                                   #   %p = failed node port number
                                   #   %D = failed node database cluster path
                                   #   %m = new main node id
                                   #   %H = new main node hostname
                                   #   %M = old main node id
                                   #   %P = old primary node id
                                   #   %r = new main port number
                                   #   %R = new main database cluster path
                                   #   %N = old primary node hostname
                                   #   %S = old primary node port number
                                   #   %% = '%' character

failover_on_backend_error = off
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

detach_false_primary = off
                                   # Detach false primary if on. Only
                                   # valid in streaming replicaton
                                   # mode and with PostgreSQL 9.6 or
                                   # after.

search_primary_node_timeout = 5min
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'pgpool_backup'
                                   # Online recovery user
recovery_password = '"pgpoolbck123dfa"'
                                   # Online recovery password
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password
recovery_1st_stage_command = 'recovery_1st_stage'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection

auto_failback = off
                                   # Dettached backend node reattach automatically
                                   # if replication_state is 'streaming'.
auto_failback_interval = 1min # Min interval of executing auto_failback in # seconds. #------------------------------------------------------------------------------ # WATCHDOG #------------------------------------------------------------------------------ # - Enabling - use_watchdog = on # Activates watchdog # (change requires restart) # -Connection to up stream servers - trusted_servers = '' # trusted server list which are used # to confirm network connection # (hostA,hostB,hostC,...) # (change requires restart) ping_path = '/bin' # ping command path # (change requires restart) # - Watchdog communication Settings - hostname0 = '192.168.5.109' # Host name or IP address of pgpool node # for watchdog connection # (change requires restart) wd_port0 = 9000 # Port number for watchdog service # (change requires restart) pgpool_port0 = 5433 # Port number for pgpool # (change requires restart) hostname1 = '192.168.5.72' wd_port1 = 9000 pgpool_port1 = 5433 hostname2 = '192.168.5.243' wd_port2 = 9000 pgpool_port2 = 5433 wd_priority = 1 # priority of this watchdog in leader election # (change requires restart) wd_authkey = '' # Authentication key for watchdog communication # (change requires restart) wd_ipc_socket_dir = '/var/run/postgresql' # Unix domain socket path for watchdog IPC socket # The Debian package defaults to # /var/run/postgresql # (change requires restart) # - Virtual IP control Setting - delegate_IP = '192.168.5.200' # delegate IP address # If this is empty, virtual IP never bring up. # (change requires restart) if_cmd_path = '/sbin' # path to the directory where if_up/down_cmd exists # If if_up/down_cmd starts with "/", if_cmd_path will be ignored. 
    # (change requires restart)
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev ens3 label ens3:0'
    # startup delegate IP command
    # (change requires restart)
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens3'
    # shutdown delegate IP command
    # (change requires restart)
arping_path = '/usr/sbin'
    # arping command path
    # If arping_cmd starts with "/", if_cmd_path will be ignored.
    # (change requires restart)
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I ens3'
    # arping command
    # (change requires restart)

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
    # Clear all the query cache on shared memory
    # when standby pgpool escalate to active pgpool
    # (= virtual IP holder).
    # This should be off if client connects to pgpool
    # not using virtual IP.
    # (change requires restart)
wd_escalation_command = ''
    # Executes this command at escalation on new active pgpool.
    # (change requires restart)
wd_de_escalation_command = ''
    # Executes this command when leader pgpool resigns from being leader.
    # (change requires restart)

# - Watchdog consensus settings for failover -

failover_when_quorum_exists = on
    # Only perform backend node failover
    # when the watchdog cluster holds the quorum
    # (change requires restart)

failover_require_consensus = on
    # Perform failover when majority of Pgpool-II nodes
    # agrees on the backend node status change
    # (change requires restart)

allow_multiple_failover_requests_from_node = off
    # A Pgpool-II node can cast multiple votes
    # for building the consensus on failover
    # (change requires restart)

enable_consensus_with_half_votes = off
    # apply majority rule for consensus and quorum computation
    # at 50% of votes in a cluster with even number of nodes.
    # when enabled the existence of quorum and consensus
    # on failover is resolved after receiving half of the
    # total votes in the cluster, otherwise both these
    # decisions require at least one more vote than
    # half of the total votes.
    # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_monitoring_interfaces_list = ''
    # Comma separated list of interfaces names to monitor.
    # if any interface from the list is active the watchdog will
    # consider the network is fine
    # 'any' to enable monitoring on all interfaces except loopback
    # '' to disable monitoring
    # (change requires restart)

wd_lifecheck_method = 'heartbeat'
    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
    # (change requires restart)
wd_interval = 10
    # lifecheck interval (sec) > 0
    # (change requires restart)

# -- heartbeat mode --

heartbeat_hostname0 = '192.168.5.109'
    # Host name or IP address used
    # for sending heartbeat signal.
    # (change requires restart)
heartbeat_port0 = 9694
    # Port number used for receiving/sending heartbeat signal
    # Usually this is the same as heartbeat_portX.
    # (change requires restart)
heartbeat_device0 = ''
    # Name of NIC device (such like 'eth0')
    # used for sending/receiving heartbeat
    # signal to/from destination 0.
    # This works only when this is not empty
    # and pgpool has root privilege.
    # (change requires restart)

heartbeat_hostname1 = '192.168.5.72'
heartbeat_port1 = 9694
heartbeat_device1 = ''

heartbeat_hostname2 = '192.168.5.243'
heartbeat_port2 = 9694
heartbeat_device2 = ''

wd_heartbeat_keepalive = 2
    # Interval time of sending heartbeat signal (sec)
    # (change requires restart)
wd_heartbeat_deadtime = 30
    # Deadtime interval for heartbeat signal (sec)
    # (change requires restart)

# -- query mode --

wd_life_point = 3
    # lifecheck retry times
    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
    # lifecheck query to pgpool from watchdog
    # (change requires restart)
wd_lifecheck_dbname = 'template1'
    # Database name connected for lifecheck
    # (change requires restart)
wd_lifecheck_user = 'nobody'
    # watchdog user monitoring pgpools in lifecheck
    # (change requires restart)
wd_lifecheck_password = ''
    # Password for watchdog user in lifecheck
    # Leaving it empty will make Pgpool-II to first look for the
    # Password in pool_passwd file before using the empty password
    # (change requires restart)

#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------

relcache_expire = 0
    # Life time of relation cache in seconds.
    # 0 means no cache expiration(the default).
    # The relation cache is used for cache the
    # query result against PostgreSQL system
    # catalog to obtain various information
    # including table structures or if it's a
    # temporary table or not. The cache is
    # maintained in a pgpool child local memory
    # and being kept as long as it survives.
    # If someone modify the table by using
    # ALTER TABLE or some such, the relcache is
    # not consistent anymore.
    # For this purpose, cache_expiration
    # controls the life time of the cache.
relcache_size = 256
    # Number of relation cache
    # entry. If you see frequently:
    # "pool_search_relcache: cache replacement happend"
    # in the pgpool log, you might want to increase this number.
check_temp_table = catalog
    # Temporary table check method. catalog, trace or none.
    # Default is catalog.

check_unlogged_table = on
    # If on, enable unlogged table check in SELECT statements.
    # This initiates queries against system catalog of primary/main
    # thus increases load of primary.
    # If you are absolutely sure that your system never uses unlogged tables
    # and you want to save access to primary/main, you could turn this off.
    # Default is on.
enable_shared_relcache = on
    # If on, relation cache stored in memory cache,
    # the cache is shared among child process.
    # Default is on.
    # (change requires restart)

relcache_query_target = primary
    # Target node to send relcache queries. Default is primary node.
    # If load_balance_node is specified, queries will be sent to load balance node.

#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------

memory_cache_enabled = off
    # If on, use the memory cache functionality, off by default
    # (change requires restart)
memqcache_method = 'shmem'
    # Cache storage method. either 'shmem'(shared memory) or
    # 'memcached'. 'shmem' by default
    # (change requires restart)
memqcache_memcached_host = 'localhost'
    # Memcached host name or IP address. Mandatory if
    # memqcache_method = 'memcached'.
    # Defaults to localhost.
    # (change requires restart)
memqcache_memcached_port = 11211
    # Memcached port number. Mandatory if memqcache_method = 'memcached'.
    # Defaults to 11211.
    # (change requires restart)
memqcache_total_size = 64MB
    # Total memory size in bytes for storing memory cache.
    # Mandatory if memqcache_method = 'shmem'.
    # Defaults to 64MB.
    # (change requires restart)
memqcache_max_num_cache = 1000000
    # Total number of cache entries. Mandatory
    # if memqcache_method = 'shmem'.
    # Each cache entry consumes 48 bytes on shared memory.
    # Defaults to 1,000,000(45.8MB).
    # (change requires restart)
memqcache_expire = 0
    # Memory cache entry life time specified in seconds.
    # 0 means infinite life time. 0 by default.
    # (change requires restart)
memqcache_auto_cache_invalidation = on
    # If on, invalidation of query cache is triggered by corresponding
    # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered
    # by memqcache_expire. on by default.
    # (change requires restart)
memqcache_maxcache = 400kB
    # Maximum SELECT result size in bytes.
    # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
    # (change requires restart)
memqcache_cache_block_size = 1MB
    # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
    # Defaults to 1MB.
    # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
    # Temporary work directory to record table oids
    # (change requires restart)
cache_safe_memqcache_table_list = ''
    # Comma separated list of table names to memcache
    # that don't write to database
    # Regexp are accepted
cache_unsafe_memqcache_table_list = ''
    # Comma separated list of table names not to memcache
    # that don't write to database
    # Regexp are accepted

pgpool.conf (47,048 bytes)

pengbo 2022-01-26 13:18 developer ~0003990	I could reproduce this issue. It might be a bug. I will look into this issue and give you some feedback.

pengbo 2022-02-10 11:55 developer ~0003993	When a connection to a backend times out, the streaming replication check process keeps retrying, and these retries are what make the failover take so long. I have fixed this in the following commit: https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=41323954a6a33cc5e24a1faa6571235da55aabf2 The fix will be included in the next minor release.
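
To check how much time the retries cost before and after upgrading, the gap between two log events can be measured directly from the pgpool log. The following is only a minimal sketch, not part of Pgpool-II: the helper names `parse_timestamp` and `seconds_between` are hypothetical, and it assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the attached log_failover.txt. The sample lines are copied from that attachment.

```python
from datetime import datetime

# Sample lines copied from the attached log_failover.txt.
LOG_LINES = [
    '2022-01-18 12:48:17: pid 71412: LOG: failed to connect to PostgreSQL '
    'server on "192.168.5.109:5432", timed out',
    '2022-01-18 12:48:17: pid 71412: LOG: health check retrying on DB node: 0 (round:1)',
    '2022-01-18 12:48:28: pid 71412: LOG: health check retrying on DB node: 0 (round:2)',
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line.

    Assumption: the timestamp occupies the first 19 characters of the line.
    """
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def seconds_between(lines, start_marker, end_marker):
    """Return the seconds from the first line containing start_marker to the
    first later line containing end_marker, or None if either is missing."""
    start = None
    for line in lines:
        if start is None:
            if start_marker in line:
                start = parse_timestamp(line)
        elif end_marker in line:
            return (parse_timestamp(line) - start).total_seconds()
    return None


# Time between the first and second health check retry rounds.
print(seconds_between(LOG_LINES, "(round:1)", "(round:2)"))  # prints 11.0
```

To measure the whole failover, read the lines from the real log file and pass marker strings that appear in your own log for the first connection failure and for the point where the new primary is found.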

Date Modified	Username	Field	Change
2022-01-18 21:21	Ken	New Issue
2022-01-18 21:21	Ken	File Added: log_failover.txt
2022-01-18 21:21	Ken	File Added: pgpool.conf
2022-01-19 14:18	pengbo	Assigned To	=> pengbo
2022-01-19 14:18	pengbo	Status	new => assigned
2022-01-19 14:18	pengbo	Description Updated
2022-01-26 13:18	pengbo	Note Added: 0003990
2022-02-10 11:55	pengbo	Note Added: 0003993
2022-02-10 11:55	pengbo	Status	assigned => feedback
2022-02-10 11:55	pengbo	Target Version	=> 4.2.8
2022-02-22 15:17	pengbo	Status	feedback => closed
2022-02-22 15:17	pengbo	Fixed in Version	=> 4.2.8