0000213: ERROR: node recovery failed, waiting connection closed in the other pgpools timeout - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000213	Pgpool-II	General	public	2016-07-01 20:15	2017-03-26 16:29

Reporter	shanmu	Assigned To	t-ishii
Priority	immediate	Severity	block	Reproducibility	always
Status	closed	Resolution	open
Platform	pgpool 3.5.3	OS	FreeBSD	OS Version	10.1

Summary	0000213: ERROR: node recovery failed, waiting connection closed in the other pgpools timeout
Description	Hi, I have 2 node pgpool and trying online recovery.. Stage 1 passes During Stage 2 I get following error (when I use cp_recovery_node) Even if i stop the applications then also I get this error. Can you please help me to close this ...our golive is blocked because of this.. many thanks in advance, pcp_recovery_node -U admin -p 9898 -n 1 -v -d Password: DEBUG: recv: tos="m", len=8 DEBUG: recv: tos="r", len=21 DEBUG: send: tos="D", len=6 DEBUG: recv: tos="E", len=114 ERROR: node recovery failed, waiting connection closed in the other pgpools timeout
Additional Information	===== log ===== Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6143: LOCATION: pcp_child.c:275 Jul 1 16:28:28 otp3 pgpool[6400]: [533-1] 2016-07-01 16:28:28: pid 6400: LOG: starting recovering node 1 Jul 1 16:28:28 otp3 pgpool[6400]: [533-2] 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:68 Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOG: starting recovering node 1 Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:68 Jul 1 16:28:28 otp3 pgpool[6400]: [534-1] 2016-07-01 16:28:28: pid 6400: LOG: node recovery, CHECKPOINT in the 1st stage done Jul 1 16:28:28 otp3 pgpool[6400]: [534-2] 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:97 Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOG: node recovery, CHECKPOINT in the 1st stage done Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:97 Jul 1 16:28:28 otp3 pgpool[6400]: [535-1] 2016-07-01 16:28:28: pid 6400: LOG: executing recovery Jul 1 16:28:28 otp3 pgpool[6400]: [535-2] 2016-07-01 16:28:28: pid 6400: DETAIL: starting recovery command: "SELECT pgpool_recovery('copy-base-backup', '10.200.1.54', '/data1', '5432')" Jul 1 16:28:28 otp3 pgpool[6400]: [535-3] 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:255 Jul 1 16:28:28 otp3 pgpool[6400]: [536-1] 2016-07-01 16:28:28: pid 6400: LOG: executing recovery Jul 1 16:28:28 otp3 pgpool[6400]: [536-2] 2016-07-01 16:28:28: pid 6400: DETAIL: disabling statement_timeout Jul 1 16:28:28 otp3 pgpool[6400]: [536-3] 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:259 Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOG: executing recovery Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: DETAIL: starting recovery command: "SELECT pgpool_recovery('copy-base-backup', '10.200.1.54', '/data1', '5432')" Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:255 Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOG: executing recovery Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: DETAIL: disabling statement_timeout Jul 1 16:28:28 otp3 pgpool: 2016-07-01 16:28:28: pid 6400: LOCATION: recovery.c:259 Jul 1 16:28:35 otp3 pgpool[6400]: [537-1] 2016-07-01 16:28:35: pid 6400: LOG: node recovery, 1st stage is done Jul 1 16:28:35 otp3 pgpool[6400]: [537-2] 2016-07-01 16:28:35: pid 6400: LOCATION: recovery.c:103 Jul 1 16:28:35 otp3 pgpool[6400]: [538-1] 2016-07-01 16:28:35: pid 6400: LOG: node recovery, starting 2nd stage Jul 1 16:28:35 otp3 pgpool[6400]: [538-2] 2016-07-01 16:28:35: pid 6400: LOCATION: recovery.c:108 Jul 1 16:28:35 otp3 pgpool: 2016-07-01 16:28:35: pid 6400: LOG: node recovery, 1st stage is done Jul 1 16:28:35 otp3 pgpool: 2016-07-01 16:28:35: pid 6400: LOCATION: recovery.c:103 Jul 1 16:28:35 otp3 pgpool: 2016-07-01 16:28:35: pid 6400: LOG: node recovery, starting 2nd stage Jul 1 16:28:35 otp3 pgpool: 2016-07-01 16:28:35: pid 6400: LOCATION: recovery.c:108 Jul 1 16:30:10 otp3 pgpool[6400]: [539-1] 2016-07-01 16:30:10: pid 6400: LOG: wait_connection_closed: existing connections did not close in 90 sec. Jul 1 16:30:10 otp3 pgpool[6400]: [539-2] 2016-07-01 16:30:10: pid 6400: LOCATION: recovery.c:446 Jul 1 16:30:10 otp3 pgpool[6400]: [540-1] 2016-07-01 16:30:10: pid 6400: ERROR: node recovery failed, waiting connection closed in the other pgpools timeout Jul 1 16:30:10 otp3 pgpool[6400]: [540-2] 2016-07-01 16:30:10: pid 6400: LOCATION: recovery.c:122 Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: LOG: wait_connection_closed: existing connections did not close in 90 sec. Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: LOCATION: recovery.c:446 Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: ERROR: node recovery failed, waiting connection closed in the other pgpools timeout Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: LOCATION: recovery.c:122 Jul 1 16:30:10 otp3 pgpool[6400]: [541-1] 2016-07-01 16:30:10: pid 6400: FATAL: authentication failed for new PCP connection Jul 1 16:30:10 otp3 pgpool[6400]: [541-2] 2016-07-01 16:30:10: pid 6400: DETAIL: connection not authorized Jul 1 16:30:10 otp3 pgpool[6400]: [541-3] 2016-07-01 16:30:10: pid 6400: LOCATION: pcp_worker.c:206 Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: FATAL: authentication failed for new PCP connection Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: DETAIL: connection not authorized Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6400: LOCATION: pcp_worker.c:206 Jul 1 16:30:10 otp3 pgpool[6143]: [534-1] 2016-07-01 16:30:10: pid 6143: LOG: PCP process with pid: 6400 exit with SUCCESS. Jul 1 16:30:10 otp3 pgpool[6143]: [534-2] 2016-07-01 16:30:10: pid 6143: LOCATION: pcp_child.c:327 Jul 1 16:30:10 otp3 pgpool[6143]: [535-1] 2016-07-01 16:30:10: pid 6143: LOG: PCP process with pid: 6400 exits with status 256 Jul 1 16:30:10 otp3 pgpool[6143]: [535-2] 2016-07-01 16:30:10: pid 6143: LOCATION: pcp_child.c:341 Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6143: LOG: PCP process with pid: 6400 exit with SUCCESS. Jul 1 16:30:10 otp3 pgpool: 2016-07-01 16:30:10: pid 6143: LOCATION: pcp_child.c:327 ============= === pgpool config ==== # ---------------------------- # pgPool-II configuration file # ---------------------------- # # This file consists of lines of the form: # # name = value # # Whitespace may be used. Comments are introduced with "#" anywhere on a line. # The complete list of parameter names and allowed values can be found in the # pgPool-II documentation. # # This file is read on server startup and when the server receives a SIGHUP :q root@otp3:~ # more /usr/local/etc/pgpool.conf # ---------------------------- # pgPool-II configuration file # ---------------------------- # # This file consists of lines of the form: # # name = value # # Whitespace may be used. Comments are introduced with "#" anywhere on a line. # The complete list of parameter names and allowed values can be found in the # pgPool-II documentation. # # This file is read on server startup and when the server receives a SIGHUP # signal. If you edit the file on a running system, you have to SIGHUP the # server for the changes to take effect, or use "pgpool reload". Some # parameters, which are marked below, require a server shutdown and restart to # take effect. # #------------------------------------------------------------------------------ # CONNECTIONS #------------------------------------------------------------------------------ # - pgpool Connection Settings - listen_addresses = '0.0.0.0' #listen_addresses = 'localhost' # Host name or IP address to listen on: # '' for all, '' for no TCP/IP connections # (change requires restart) port = 5432 # Port number # (change requires restart) socket_dir = '/tmp' # Unix domain socket path # The Debian package defaults to # /var/run/postgresql # (change requires restart) listen_backlog_multiplier = 2 # Set the backlog parameter of listen(2) to # num_init_children listen_backlog_multiplier. # (change requires restart) serialize_accept = on # whether to serialize accept() call to avoid thundering herd problem # (change requires restart) # - pgpool Communication Manager Connection Settings - pcp_listen_addresses = '' # Host name or IP address for pcp process to listen on: # '' for all, '' for no TCP/IP connections # (change requires restart) pcp_port = 9898 # Port number for pcp # (change requires restart) pcp_socket_dir = '/tmp' # Unix domain socket path for pcp # The Debian package defaults to # /var/run/postgresql # (change requires restart) # - Backend Connection Settings - #backend_hostname0 = 'localhost' # Host name or IP address to connect to for backend 0 #backend_port0 = 5432 # Port number for backend 0 #backend_weight0 = 1 # Weight for backend 0 (only in load balancing mode) #backend_data_directory0 = '/var/lib/pgsql/data' # Data directory for backend 0 #backend_flag0 = 'ALLOW_TO_FAILOVER' # Controls various backend behavior # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER backend_hostname0 = '10.200.1.53' backend_port0 = 5432 backend_weight0 = 1 backend_data_directory0 = '/data1' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_hostname1 = '10.200.1.54' backend_port1 = 5432 backend_weight1 = 1 backend_data_directory1 = '/data1' backend_flag1 = 'ALLOW_TO_FAILOVER' # - Authentication - enable_pool_hba = on # Use pool_hba.conf for client authentication pool_passwd = 'pool_passwd' # File name of pool_passwd for md5 authentication. # "" disables pool_passwd. # (change requires restart) authentication_timeout = 60 # Delay in seconds to complete client authentication # 0 means no timeout. # - SSL Connections - ssl = off # Enable SSL support # (change requires restart) #ssl_key = './server.key' # Path to the SSL private key file # (change requires restart) #ssl_cert = './server.cert' # Path to the SSL public certificate file # (change requires restart) #ssl_ca_cert = '' # Path to a single PEM format file # containing CA root certificate(s) # (change requires restart) #ssl_ca_cert_dir = '' # Directory containing CA root certificate(s) # (change requires restart) #------------------------------------------------------------------------------ # POOLS #------------------------------------------------------------------------------ # - Pool size - num_init_children = 100 # Number of pools # (change requires restart) max_pool = 4 # Number of connections per pool # (change requires restart) #max_connections = 128 # - Life time - child_life_time = 300 # Pool exits after being idle for this many seconds child_max_connections = 0 #child_max_connections = 128 # Pool exits after receiving that many connections # 0 means no exit connection_life_time = 300 # Connection to backend closes after being idle for this many seconds # 0 means no close client_idle_limit = 3600 # Client is disconnected after being idle for that many seconds # (even inside an explicit transactions!) # 0 means no disconnection #------------------------------------------------------------------------------ # LOGS #------------------------------------------------------------------------------ # - Where to log - log_destination = 'syslog' #log_destination = 'stderr' # Where to log # Valid values are combinations of stderr, # and syslog. Default to stderr. # - What to log - log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line. log_connections = off # Log connections log_hostname = off # Hostname will be shown in ps status # and in logs if connections are logged log_statement = off # Log all statements log_per_node_statement = off # Log all statements # with node and backend informations log_standby_delay = 'none' # Log standby delay # Valid values are combinations of always, # if_over_threshold, none # - Syslog specific - syslog_facility = 'LOCAL0' # Syslog local facility. Default to LOCAL0 syslog_ident = 'pgpool' # Syslog program identification string # Default to 'pgpool' # - Debug - debug_level = 0 # Debug message verbosity level # 0 means no message, 1 or more mean verbose log_error_verbosity = verbose # terse, default, or verbose messages client_min_messages = warning # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # log # notice # warning # error log_min_messages = warning # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # info # notice # warning # error # log # fatal # panic #------------------------------------------------------------------------------ # FILE LOCATIONS #------------------------------------------------------------------------------ pid_file_name = '/var/run/pgpool/pgpool.pid' # PID file name # (change requires restart) logdir = '/var/log/pgpool' # Directory of pgPool status file # (change requires restart) #------------------------------------------------------------------------------ # CONNECTION POOLING #------------------------------------------------------------------------------ connection_cache = on # Activate connection pools # (change requires restart) # Semicolon separated list of queries # to be issued at the end of a session # The default is for 8.3 and later reset_query_list = 'ABORT; DISCARD ALL' # The following one is for 8.2 and before #reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT' #------------------------------------------------------------------------------ # REPLICATION MODE #------------------------------------------------------------------------------ replication_mode = on # Activate replication mode # (change requires restart) replicate_select = off # Replicate SELECT statements # when in replication mode # replicate_select is higher priority than # load_balance_mode. insert_lock = on # Automatically locks a dummy row or a table # with INSERT statements to keep SERIAL data # consistency # Without SERIAL, no lock will be issued lobj_lock_table = '' # When rewriting lo_creat command in # replication mode, specify table name to # lock # - Degenerate handling - replication_stop_on_mismatch = off # On disagreement with the packet kind # sent from backend, degenerate the node # which is most likely "minority" # If off, just force to exit this session failover_if_affected_tuples_mismatch = off # On disagreement with the number of affected # tuples in UPDATE/DELETE queries, then # degenerate the node which is most likely # "minority". # If off, just abort the transaction to # keep the consistency #------------------------------------------------------------------------------ # LOAD BALANCING MODE #------------------------------------------------------------------------------ load_balance_mode = on # Activate load balancing mode # (change requires restart) ignore_leading_white_space = on # Ignore leading white spaces of each query white_function_list = '' # Comma separated list of function names # that don't write to database # Regexp are accepted black_function_list = 'nextval,setval,nextval,setval' # Comma separated list of function names # that write to database # Regexp are accepted database_redirect_preference_list = '' # comma separated list of pairs of database and node id. # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2' # valid for streaming replicaton mode only. app_name_redirect_preference_list = '' # comma separated list of pairs of app name and node id. # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby' # valid for streaming replicaton mode only. allow_sql_comments = off # if on, ignore SQL comments when judging if load balance or # query cache is possible. # If off, SQL comments effectively prevent the judgment # (pre 3.4 behavior). #------------------------------------------------------- # REGARDING DEAD LOCK #-------------------------------------------------------- # set this to true if you want to avoid deadlock situation when # replication enabled. # there will be noticable performance degration, however. # a work around is set this to false and insert /STRICT/ comment # at the beginning of the SQL command. replication_strict = on # when replication_strict is set to false, there will be a chance for # deadlocks. set this to non 0 (in milli seconds) to detect this # situation and resolve the deadlock aborting current session. replication_timeout = 5000 #------------------------------------------------------------------------------ # MASTER/SLAVE MODE #------------------------------------------------------------------------------ master_slave_mode = off # Activate master/slave mode # (change requires restart) master_slave_sub_mode = 'slony' # Master/slave sub mode # Valid values are combinations slony or # stream. Default is slony. # (change requires restart) # - Streaming - sr_check_period = 0 # Streaming replication check period # Disabled (0) by default sr_check_user = 'nobody' # Streaming replication check user # This is necessary even if you disable # streaming replication delay check with # sr_check_period = 0 sr_check_password = '' # Password for streaming replication check user sr_check_database = 'postgres' # Database name for streaming replication check delay_threshold = 0 # Threshold before not dispatching query to standby node # Unit is in bytes # Disabled (0) by default # - Special commands - follow_master_command = '' # Executes this command after master failover # Special values: # %d = node id # %h = host name # %p = port number # %D = database cluster path # %m = new master node id # %H = hostname of the new master node # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %% = '%' character #------------------------------------------------------------------------------ # HEALTH CHECK #------------------------------------------------------------------------------ health_check_period = 10 # Health check period # Disabled (0) by default health_check_timeout = 20 # Health check timeout # 0 means no timeout health_check_user = 'admin' # Health check user health_check_password = 'yyyyyyy' # Password for health check user health_check_database = 'template1' # Database name for health check. If '', tries 'postgres' frist, then 'template1' health_check_max_retries = 2 # Maximum number of times to retry a failed health check before giving up. health_check_retry_delay = 5 # Amount of time to wait (in seconds) between retries. connect_timeout = 10000 # Timeout value in milliseconds before giving up to connect to backend. # Default is 10000 ms (10 second). Flaky network user may want to increase # the value. 0 means no timeout. # Note that this value is not only used for health check, # but also for ordinary conection to backend. #------------------------------------------------------------------------------ # FAILOVER AND FAILBACK #------------------------------------------------------------------------------ failover_command = '' # Executes this command at failover # Special values: # %d = node id # %h = host name # %p = port number # %D = database cluster path # %m = new master node id # %H = hostname of the new master node # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %% = '%' character failback_command = '' # Executes this command at failback. # Special values: # %d = node id # %h = host name # %p = port number # %D = database cluster path # %m = new master node id # %H = hostname of the new master node # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %% = '%' character fail_over_on_backend_error = on # Initiates failover when reading/writing to the # backend communication socket fails # If set to off, pgpool will report an # error and disconnect the session. search_primary_node_timeout = 10 # Timeout in seconds to search for the # primary node when a failover occurs. # 0 means no timeout, keep searching # for a primary node forever. #------------------------------------------------------------------------------ # ONLINE RECOVERY #------------------------------------------------------------------------------ recovery_user = 'pgsql' #recovery_user = 'nobody' # Online recovery user recovery_password = 'xxxx' # Online recovery password recovery_1st_stage_command = 'copy-base-backup' # Executes a command in first stage recovery_2nd_stage_command = 'pgpool_recovery_pitr' # Executes a command in second stage recovery_timeout = 90 # Timeout in seconds to wait for the # recovering node's postmaster to start up # 0 means no wait client_idle_limit_in_recovery = -1 #client_idle_limit_in_recovery = 0 # Client is disconnected after being idle # for that many seconds in the second stage # of online recovery # 0 means no disconnection # -1 means immediate disconnection #------------------------------------------------------------------------------ # WATCHDOG #------------------------------------------------------------------------------ # - Enabling - use_watchdog = off # Activates watchdog # (change requires restart) # -Connection to up stream servers - trusted_servers = '' # trusted server list which are used # to confirm network connection # (hostA,hostB,hostC,...) # (change requires restart) ping_path = '/bin' # ping command path # (change requires restart) # - Watchdog communication Settings - wd_hostname = '' # Host name or IP address of this watchdog # (change requires restart) wd_port = 9000 # port number for watchdog service # (change requires restart) wd_priority = 1 # priority of this watchdog in leader election # (change requires restart) wd_authkey = '' # Authentication key for watchdog communication # (change requires restart) wd_ipc_socket_dir = '/tmp' # Unix domain socket path for watchdog IPC socket # The Debian package defaults to # /var/run/postgresql # (change requires restart) # - Virtual IP control Setting - delegate_IP = '' # delegate IP address # If this is empty, virtual IP never bring up. # (change requires restart) if_cmd_path = '/sbin' # path to the directory where if_up/down_cmd exists # (change requires restart) if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0' # startup delegate IP command # (change requires restart) if_down_cmd = 'ip addr del $_IP_$/24 dev eth0' # shutdown delegate IP command # (change requires restart) arping_path = '/usr/sbin' # arping command path # (change requires restart) arping_cmd = 'arping -U $_IP_$ -w 1' # arping command # (change requires restart) # - Behaivor on escalation Setting - clear_memqcache_on_escalation = on # Clear all the query cache on shared memory # when standby pgpool escalate to active pgpool # (= virtual IP holder). # This should be off if client connects to pgpool # not using virtual IP. # (change requires restart) wd_escalation_command = '' # Executes this command at escalation on new active pgpool. # (change requires restart) wd_de_escalation_command = '' # Executes this command when master pgpool resigns from being master. # (change requires restart) # - Lifecheck Setting - # -- common -- wd_monitoring_interfaces_list = '' # Comma separated list of interfaces names to monitor. # if any interface from the list is active the watchdog will # consider the network is fine # 'any' to enable monitoring on all interfaces except loopback # '' to disable monitoring wd_lifecheck_method = 'heartbeat' # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external') # (change requires restart) wd_interval = 10 # lifecheck interval (sec) > 0 # (change requires restart) # -- heartbeat mode -- wd_heartbeat_port = 9694 # Port number for receiving heartbeat signal # (change requires restart) wd_heartbeat_keepalive = 2 # Interval time of sending heartbeat signal (sec) # (change requires restart) wd_heartbeat_deadtime = 30 # Deadtime interval for heartbeat signal (sec) # (change requires restart) heartbeat_destination0 = 'host0_ip1' # Host name or IP address of destination 0 # for sending heartbeat signal. # (change requires restart) heartbeat_destination_port0 = 9694 # Port number of destination 0 for sending # heartbeat signal. Usually this is the # same as wd_heartbeat_port. # (change requires restart) heartbeat_device0 = '' # Name of NIC device (such like 'eth0') # used for sending/receiving heartbeat # signal to/from destination 0. # This works only when this is not empty # and pgpool has root privilege. # (change requires restart) #heartbeat_destination1 = 'host0_ip2' #heartbeat_destination_port1 = 9694 #heartbeat_device1 = '' # -- query mode -- wd_life_point = 3 # lifecheck retry times # (change requires restart) wd_lifecheck_query = 'SELECT 1' # lifecheck query to pgpool from watchdog # (change requires restart) wd_lifecheck_dbname = 'template1' # Database name connected for lifecheck # (change requires restart) wd_lifecheck_user = 'nobody' # watchdog user monitoring pgpools in lifecheck # (change requires restart) wd_lifecheck_password = '' # Password for watchdog user in lifecheck # (change requires restart) # - Other pgpool Connection Settings - #other_pgpool_hostname0 = 'host0' # Host name or IP address to connect to for other pgpool 0 # (change requires restart) #other_pgpool_port0 = 5432 # Port number for othet pgpool 0 # (change requires restart) #other_wd_port0 = 9000 # Port number for othet watchdog 0 # (change requires restart) #other_pgpool_hostname1 = 'host1' #other_pgpool_port1 = 5432 #other_wd_port1 = 9000 #------------------------------------------------------------------------------ # OTHERS #------------------------------------------------------------------------------ relcache_expire = 0 # Life time of relation cache in seconds. # 0 means no cache expiration(the default). # The relation cache is used for cache the # query result against PostgreSQL system # catalog to obtain various information # including table structures or if it's a # temporary table or not. The cache is # maintained in a pgpool child local memory # and being kept as long as it survives. # If someone modify the table by using # ALTER TABLE or some such, the relcache is # not consistent anymore. # For this purpose, cache_expiration # controls the life time of the cache. relcache_size = 256 # Number of relation cache # entry. If you see frequently: # "pool_search_relcache: cache replacement happend" # in the pgpool log, you might want to increate this number. check_temp_table = on # If on, enable temporary table check in SELECT statements. # This initiates queries against system catalog of primary/master # thus increases load of master. # If you are absolutely sure that your system never uses temporary tables # and you want to save access to primary/master, you could turn this off. # Default is on. check_unlogged_table = on # If on, enable unlogged table check in SELECT statements. # This initiates queries against system catalog of primary/master # thus increases load of master. # If you are absolutely sure that your system never uses unlogged tables # and you want to save access to primary/master, you could turn this off. # Default is on. #------------------------------------------------------------------------------ # IN MEMORY QUERY MEMORY CACHE #------------------------------------------------------------------------------ memory_cache_enabled = off # If on, use the memory cache functionality, off by default memqcache_method = 'shmem' # Cache storage method. either 'shmem'(shared memory) or # 'memcached'. 'shmem' by default # (change requires restart) memqcache_memcached_host = 'localhost' # Memcached host name or IP address. Mandatory if # memqcache_method = 'memcached'. # Defaults to localhost. # (change requires restart) memqcache_memcached_port = 11211 # Memcached port number. Mondatory if memqcache_method = 'memcached'. # Defaults to 11211. # (change requires restart) memqcache_total_size = 67108864 # Total memory size in bytes for storing memory cache. # Mandatory if memqcache_method = 'shmem'. # Defaults to 64MB. # (change requires restart) memqcache_max_num_cache = 1000000 # Total number of cache entries. Mandatory # if memqcache_method = 'shmem'. # Each cache entry consumes 48 bytes on shared memory. # Defaults to 1,000,000(45.8MB). # (change requires restart) memqcache_expire = 0 # Memory cache entry life time specified in seconds. # 0 means infinite life time. 0 by default. # (change requires restart) memqcache_auto_cache_invalidation = on # If on, invalidation of query cache is triggered by corresponding # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered # by memqcache_expire. on by default. # (change requires restart) memqcache_maxcache = 409600 # Maximum SELECT result size in bytes. # Must be smaller than memqcache_cache_block_size. Defaults to 400KB. # (change requires restart) memqcache_cache_block_size = 1048576 # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'. # Defaults to 1MB. # (change requires restart) memqcache_oiddir = '/var/log/pgpool/oiddir' # Temporary work directory to record table oids # (change requires restart) white_memqcache_table_list = '' # Comma separated list of table names to memcache # that don't write to database # Regexp are accepted black_memqcache_table_list = '' # Comma separated list of table names not to memcache # that don't write to database # Regexp are accepted =====
Tags	No tags attached.

t-ishii 2016-07-12 13:30 developer ~0000885	Please provide the contents of 'copy-base-backup' and 'pgpool_recovery_pitr', plus PostgreSQL log of '10.200.1.54'.

shanmu 2016-07-12 13:49 reporter	postgresql.log.zip (243,788 bytes)

shanmu 2016-07-12 13:56 reporter ~0000886	Hi please find the contents of 'copy-base-backup' and 'pgpool_recovery_pitr' - also postgres log of 10.200.1.54 is attached. Please note the problem occurred on 1st July 2016. (the log file contains other dates data also) ==== copy-base-backup ==== #! /bin/sh MASTER_HOST=$(hostname -f) DATA=$1 RECOVERY_TARGET=$2 RECOVERY_DATA=$3 USER=pgsql su -m ${USER} psql -c "select pg_start_backup ('pgpool-recovery')" postgres echo "restore_command = 'scp $MASTER_HOST:/data1/pg_xlog/%f $DATA/%p'" > /data1/recovery.conf tar -C /$DATA -zc --absolute-paths --preserve-permissions --exclude='/data1/postmaster.pid' -f /data2/pgsql.tar.gz $DATA psql -c 'select pg_stop_backup()' postgres scp /data2/pgsql.tar.gz $RECOVERY_TARGET:$RECOVERY_DATA ==== ==== ==== ==== ==== ==== ==== ==== pgpool_recovery_pitr ==== #! /bin/sh # Online recovery 2nd stage script # USER=pgsql datadir=$1 # master dabatase cluster DEST=$2 # hostname of the DB node to be recovered DESTDIR=$3 # database cluster of the DB node to be recovered port=$4 # PostgreSQL port number archdir=/data3/archive_log # archive log directory #archdir=/data/archive_log # archive log directory # Force to flush current value of sequences to xlog su -m ${USER} rm $archdir/* # Force to flush current value of sequences to xlog psql -p $port -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1\| while read i do if [ "$i" != "" ];then psql -p $port -c "SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = 'S'" $i fi done psql -p $port -c "SELECT pgpool_switch_xlog('$archdir')" template1 ==== ==== ==== ==== ==== ==== ====

t-ishii 2016-07-12 14:01 developer ~0000887	Your PostgreSQL log suggests that PostgreSQL is working in streaming replication mode. But pgpool.conf suggests that pgpool-II is working in replication mode, not streaming replication mode. You need to fix it first.

shanmu 2016-07-12 14:12 reporter ~0000889	I suppose I have enabled only local archive - NOT STEAMING REPLICATION. The local archive files are used for Stage-2 . - AM I Wrong still ?? Please find the postgresql.conf portion ======= #------------------------------------------------------------------------------ # WRITE AHEAD LOG #------------------------------------------------------------------------------ # - Settings - wal_level = archive # minimal, archive, or hot_standby #wal_level = minimal # minimal, archive, or hot_standby # (change requires restart) #fsync = on # turns forced synchronization on or off #synchronous_commit = on # synchronization level; # off, local, remote_write, or on #wal_sync_method = fsync # the default is the first option # supported by the operating system: # open_datasync # fdatasync (default on Linux) # fsync # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes #wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers # (change requires restart) #wal_writer_delay = 200ms # 1-10000 milliseconds #commit_delay = 0 # range 0-100000, in microseconds #commit_siblings = 5 # range 1-1000 # - Checkpoints - #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each #checkpoint_timeout = 5min # range 30s-1h #checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0 #checkpoint_warning = 30s # 0 disables # - Archiving - archive_mode = on # allows archiving to be done #archive_mode = off # allows archiving to be done # (change requires restart) archive_command = 'cp -i /data1/%p /data3/archive_log/%f < /dev/null' #archive_command = '' # command to use to archive a logfile segment # placeholders: %p = path of file to archive # %f = file name only # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' #archive_timeout = 0 # force a logfile segment switch after this # number of seconds; 0 disables =======

t-ishii 2017-02-23 16:34 developer ~0001357	Sorry for delay. Not reproduced here.

t-ishii 2017-03-26 16:28 developer ~0001400	No response from reporter. Close this issue.

Date Modified	Username	Field	Change
2016-07-01 20:15	shanmu	New Issue
2016-07-12 13:30	t-ishii	Note Added: 0000885
2016-07-12 13:30	t-ishii	Assigned To	=> t-ishii
2016-07-12 13:30	t-ishii	Status	new => feedback
2016-07-12 13:30	t-ishii	Description Updated
2016-07-12 13:30	t-ishii	Additional Information Updated
2016-07-12 13:49	shanmu	File Added: postgresql.log.zip
2016-07-12 13:56	shanmu	Note Added: 0000886
2016-07-12 13:56	shanmu	Status	feedback => assigned
2016-07-12 14:01	t-ishii	Note Added: 0000887
2016-07-12 14:02	t-ishii	Status	assigned => feedback
2016-07-12 14:12	shanmu	Note Added: 0000889
2016-07-12 14:12	shanmu	Status	feedback => assigned
2017-02-23 16:34	t-ishii	Note Added: 0001357
2017-02-23 16:34	t-ishii	Status	assigned => feedback
2017-03-26 16:28	t-ishii	Note Added: 0001400
2017-03-26 16:29	t-ishii	Status	feedback => closed