View Issue Details

ID: 0000475
Project: Pgpool-II
Category: [All Projects] General
View Status: public
Last Update: 2019-03-21 19:08
Reporter: ankur
Assigned To: t-ishii
Priority: high
Severity: major
Reproducibility: always
Status: resolved
Resolution: open
Platform: linux
OS: rhel
OS Version: 7.6
Product Version: 4.0.3
Target Version:
Fixed in Version:
Summary: 0000475: pcp_recovery_node command hangs while recovering standby and standby not being shown up in show pool_nodes;
Description: This is one step of building a cluster. I am running pcp_recovery_node on the master to build the standby from scratch with the command:

pcp_recovery_node -h 193.185.83.119 -p 9898 -U postgres -n 1

Here, 193.185.83.119 is the VIP (the delegate_IP in pgpool.conf).

It successfully builds and starts the standby on node-b (call the nodes node-a and node-b), but the command above never returns and simply hangs in the shell like this:

[postgres@rollc-filesrvr1 data]$ pcp_recovery_node -h 193.185.83.119 -p 9898 -U postgres -n 1
Password:

I have to press Ctrl+C to get out of this session.
Later, when I try to create a test database on node-a (master), I get the following error:

postgres=# create database test;
ERROR: source database "template1" is being accessed by other users
DETAIL: There is 1 other session using the database.
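
The DETAIL line says that one other session is connected to template1. One way to identify it is to query the standard pg_stat_activity view directly on the backend. The sketch below only builds the psql command line; `psql_argv` is a hypothetical helper, and it assumes psql is on the PATH and that the backend on port 5438 can be reached without going through Pgpool-II. The log excerpt further down shows the pcp worker connecting to node-a with database:template1, so that worker is one plausible candidate, but this is a guess and not a confirmed diagnosis.

```python
import shlex
from typing import List

# Query against the standard pg_stat_activity view (PostgreSQL 10 column names).
TEMPLATE1_SESSIONS_SQL = (
    "SELECT pid, usename, application_name, client_addr, state "
    "FROM pg_stat_activity "
    "WHERE datname = 'template1' AND pid <> pg_backend_pid()"
)


def psql_argv(host: str, port: int, user: str = "postgres") -> List[str]:
    """Build a psql command line that lists sessions connected to template1.

    It connects to the 'postgres' database so that the check itself does not
    open yet another session on template1.
    """
    return [
        "psql", "-X", "-A", "-t",
        "-h", host, "-p", str(port), "-U", user, "-d", "postgres",
        "-c", TEMPLATE1_SESSIONS_SQL,
    ]


if __name__ == "__main__":
    # "node-a-ip" is the placeholder used in this report, not a real host name.
    print(" ".join(shlex.quote(arg) for arg in psql_argv("node-a-ip", 5438)))
```

The printed command can be pasted into a shell on node-a; the pid it reports is the session that blocks CREATE DATABASE.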

I confirm that pgpool.service is running on node-a when I run this command, and I have tried both starting and stopping pgpool.service on node-b (standby) before issuing the pcp command. The result is the same.

I also searched online and tweaked the following settings in pgpool.conf. I am not sure whether the problem is related to these parameters:

wd_lifecheck_dbname in pgpool.conf

Initially, the related settings were as follows (and I was already getting the same result):

wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''

Later, I found different settings at https://www.pgpool.net/mantisbt/view.php?id=242 and https://www.pgpool.net/mantisbt/view.php?id=394, plus a suggestion at https://www.mail-archive.com/pgpool-general@pgfoundry.org/msg01639.html, and tried different combinations such as the following:

    wd_lifecheck_dbname = 'template1'
    wd_lifecheck_user = 'postgres'
    wd_lifecheck_password = ''

or

    wd_lifecheck_dbname = 'postgres'
    wd_lifecheck_user = 'postgres'
    wd_lifecheck_password = ''

But none of them changed the situation: the command still hung in the shell, and I still could not create a test database on the master. I feel I have reached a dead end.

I still do not fully understand the purpose and meaning of the three parameters above in Pgpool-II, and I suspect they are the ones I am not configuring correctly, although there could be other causes as well.

For reference, here are the environment details again:
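
As far as the Pgpool-II documentation describes them, wd_lifecheck_dbname, wd_lifecheck_user and wd_lifecheck_password are the database, user and password the watchdog uses when it checks another Pgpool-II node by running a query (wd_lifecheck_method = 'query'); that reading is an assumption here, since the value of wd_lifecheck_method is not shown in this report. To see which values are actually in effect, the following minimal sketch parses `name = value` lines in the format the pgpool.conf header describes. `parse_pgpool_conf` and `lifecheck_settings` are hypothetical helpers, and the sketch assumes that a later assignment overrides an earlier one.

```python
import re
from typing import Dict

# Matches: name = 'quoted value'   or   name = bare_value
# A trailing "# comment" after the value is ignored.
_SETTING = re.compile(
    r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(?:'([^']*)'|([^#\s]+))"
)


def parse_pgpool_conf(text: str) -> Dict[str, str]:
    """Return the active (uncommented) settings found in pgpool.conf text."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if match:  # blank lines and comment-only lines do not match
            name, quoted, bare = match.groups()
            settings[name] = quoted if quoted is not None else bare
    return settings


def lifecheck_settings(text: str) -> Dict[str, str]:
    """Keep only the wd_lifecheck_* parameters."""
    return {
        name: value
        for name, value in parse_pgpool_conf(text).items()
        if name.startswith("wd_lifecheck_")
    }


SAMPLE = """
use_watchdog = on
wd_lifecheck_dbname = 'template1'   # first combination from this report
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
#wd_lifecheck_user = 'postgres'
"""

if __name__ == "__main__":
    print(lifecheck_settings(SAMPLE))
```

Running it against the real file (for example by passing the contents of pgpool.conf instead of SAMPLE) shows whether an edit was picked up or is still commented out.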

 - node-a and node-b environment: RHEL 7.6
 - PostgreSQL version: 10.7
 - Pgpool-II version: 4.0.3
 - replication slot + WAL archive

Here are the logs from pgpool.service on node-a:

    Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 16642: LOG: forked new pcp worker, pid=8534 socket=7
    Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: LOG: starting recovering node 1
    Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: LOG: executing recovery
    Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: DETAIL: starting recovery command: "SELECT pgpool_recovery('recovery_1st_stage', 'node-a-ip', '/data/test/data', '5438', 1)"
    Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: LOG: executing recovery
    Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: DETAIL: disabling statement_timeout
    Mar 18 21:10:18 node-a pgpool[16583]: 2019-03-18 21:10:18: pid 8534: LOG: node recovery, 1st stage is done
    Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: LOG: checking if postmaster is started
    Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: DETAIL: trying to connect to postmaster on hostname:node-b-ip database:postgres user:postgres (retry 0 times)
    ...
    ...2 more times

    Mar 18 21:11:49 node-a pgpool[16583]: 2019-03-18 21:11:49: pid 8534: LOG: checking if postmaster is started
    Mar 18 21:11:49 node-a pgpool[16583]: 2019-03-18 21:11:49: pid 8534: DETAIL: trying to connect to postmaster on hostname:node-a-ip database:template1 user:postgres (retry 0 times)
    ....
    ...it keeps retrying until I press Ctrl+C in the pcp command window. I have seen the retry count go up to 30 or more.
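
The log shows two different connection targets: first node-b with database postgres, then node-a with database template1. To make that pattern easier to see in a long journal, the following sketch counts the "trying to connect to postmaster" attempts per host and database. The regular expression is derived only from the two DETAIL lines quoted above, and `count_attempts` is a hypothetical helper, so other Pgpool-II versions may need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern taken from the DETAIL lines in this report.
_ATTEMPT = re.compile(
    r"trying to connect to postmaster on "
    r"hostname:(\S+) database:(\S+) user:(\S+) \(retry (\d+) times\)"
)


def count_attempts(lines: Iterable[str]) -> Counter:
    """Count connection attempts per (hostname, database) pair."""
    attempts: Counter = Counter()
    for line in lines:
        match = _ATTEMPT.search(line)
        if match:
            host, database, _user, _retry = match.groups()
            attempts[(host, database)] += 1
    return attempts


SAMPLE_LOG = [
    "Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: "
    "DETAIL: trying to connect to postmaster on hostname:node-b-ip "
    "database:postgres user:postgres (retry 0 times)",
    "Mar 18 21:11:49 node-a pgpool[16583]: 2019-03-18 21:11:49: pid 8534: "
    "DETAIL: trying to connect to postmaster on hostname:node-a-ip "
    "database:template1 user:postgres (retry 0 times)",
]

if __name__ == "__main__":
    for (host, database), count in count_attempts(SAMPLE_LOG).items():
        print(f"{host} {database}: {count} attempt(s)")
```

Fed with the full journal output, a large count for the node-a/template1 pair would match the endless retry described above.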

postgres=> show pool_nodes;
 node_id | hostname  | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | node-a-ip | 5438 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-03-18 22:59:19
 1       | node-b-ip | 5438 | down   | 0.500000  | standby | 0          | false             | 0                 | 2019-03-18 22:59:19
(2 rows)
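
For scripted checks after a recovery attempt, the psql-style output of show pool_nodes can be turned into records and filtered by status. This is a minimal sketch that assumes the pipe-separated layout shown above; `parse_pool_nodes` and `down_nodes` are hypothetical helpers, not part of Pgpool-II.

```python
from typing import Dict, List


def parse_pool_nodes(output: str) -> List[Dict[str, str]]:
    """Parse psql-style 'show pool_nodes' output into one dict per node."""
    # Only the header and the data rows contain '|'; the prompt line, the
    # '---+---' separator and the '(2 rows)' footer are skipped.
    rows = [line for line in output.splitlines() if "|" in line]
    if not rows:
        return []
    header = [cell.strip() for cell in rows[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in row.split("|"))))
        for row in rows[1:]
    ]


def down_nodes(output: str) -> List[str]:
    """Return the node_id of every node whose status is 'down'."""
    return [
        node["node_id"]
        for node in parse_pool_nodes(output)
        if node["status"] == "down"
    ]


SAMPLE_OUTPUT = """\
postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0 | node-a-ip | 5438 | up | 0.500000 | primary | 0 | true | 0 | 2019-03-18 22:59:19
 1 | node-b-ip | 5438 | down | 0.500000 | standby | 0 | false | 0 | 2019-03-18 22:59:19
(2 rows)
"""

if __name__ == "__main__":
    print("down nodes:", down_nodes(SAMPLE_OUTPUT))
```

With the output from this report, node 1 (node-b) is the only node reported as down, even though its PostgreSQL instance was built and started.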
Steps To Reproduce: Try issuing pcp_recovery_node on the master before the standby has been built for the first time.
Additional Information:
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# Whitespace may be used. Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal. If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload". Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#


#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------

# - pgpool Connection Settings -

listen_addresses = '*'
                                   # Host name or IP address to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
port = 5432
                                   # Port number
                                   # (change requires restart)
socket_dir = '/tmp'
                                   # Unix domain socket path
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)
listen_backlog_multiplier = 2
                                   # Set the backlog parameter of listen(2) to
                                   # num_init_children * listen_backlog_multiplier.
                                   # (change requires restart)
serialize_accept = off
                                   # whether to serialize accept() call to avoid thundering herd problem
                                   # (change requires restart)

# - pgpool Communication Manager Connection Settings -

pcp_listen_addresses = '*'
                                   # Host name or IP address for pcp process to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/tmp'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)

# - Backend Connection Settings -

backend_hostname0 = '193.185.83.114'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5438
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)

backend_data_directory0 = '/data/test/data'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
                                   # or ALWAYS_MASTER
backend_hostname1 = '193.185.83.115'
backend_port1 = 5438
backend_weight1 = 1
backend_data_directory1 = '/data/test/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -

enable_pool_hba = on
                                   # Use pool_hba.conf for client authentication
pool_passwd = 'pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
authentication_timeout = 60
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

allow_clear_text_frontend_auth = off
                                   # Allow Pgpool-II to use clear text password authentication
                                   # with clients, when pool_passwd does not
                                   # contain the user password


# - SSL Connections -

ssl = off
                                   # Enable SSL support
                                   # (change requires restart)
#ssl_key = './server.key'
                                   # Path to the SSL private key file
                                   # (change requires restart)
#ssl_cert = './server.cert'
                                   # Path to the SSL public certificate file
                                   # (change requires restart)
#ssl_ca_cert = ''
                                   # Path to a single PEM format file
                                   # containing CA root certificate(s)
                                   # (change requires restart)
#ssl_ca_cert_dir = ''
                                   # Directory containing CA root certificate(s)
                                   # (change requires restart)


#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------

# - Concurrent session and pool size -

num_init_children = 32
                                   # Number of concurrent sessions allowed
                                   # (change requires restart)
max_pool = 4
                                   # Number of connection pool caches per connection
                                   # (change requires restart)

# - Life time -

child_life_time = 300
                                   # Pool exits after being idle for this many seconds
child_max_connections = 0
                                   # Pool exits after receiving that many connections
                                   # 0 means no exit
connection_life_time = 0
                                   # Connection to backend closes after being idle for this many seconds
                                   # 0 means no close
client_idle_limit = 0
                                   # Client is disconnected after being idle for that many seconds
                                   # (even inside an explicit transactions!)
                                   # 0 means no disconnection


#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'stderr'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.

# - What to log -

log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line.

log_connections = on
                                   # Log connections
log_hostname = on
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
log_statement = on
                                   # Log all statements
log_per_node_statement = on
                                   # Log all statements
                                   # with node and backend information
log_client_messages = on
                                   # Log any client messages
log_standby_delay = 'none'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none

# - Syslog specific -

syslog_facility = 'LOCAL0'
                                   # Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'

# - Debug -

log_error_verbosity = default # terse, default, or verbose messages

#client_min_messages = debug2 # values in order of decreasing detail:
                                        # debug5
                                        # debug4
                                        # debug3
                                        # debug2
                                        # debug1
                                        # log
                                        # notice
                                        # warning
                                        # error

#log_min_messages = debug2 # values in order of decreasing detail:
                                        # debug5
                                        # debug4
                                        # debug3
                                        # debug2
                                        # debug1
                                        # info
                                        # notice
                                        # warning
                                        # error
                                        # log
                                        # fatal
                                        # panic

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
#Default value
#pid_file_name = '/var/run/pgpool/pgpool.pid'

pid_file_name = '/etc/pgpool-II/pgpool.pid'

                                   # PID file name
                                   # Can be specified as relative to the
                                   # location of pgpool.conf file or
                                   # as an absolute path
                                   # (change requires restart)
logdir = '/data/postgres/'
                                   # Directory of pgPool status file
                                   # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------

connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)

                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
                                   # The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'


#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

replication_mode = off
                                   # Activate replication mode
                                   # (change requires restart)
replicate_select = off
                                   # Replicate SELECT statements
                                   # when in replication mode
                                   # replicate_select is higher priority than
                                   # load_balance_mode.

insert_lock = on
                                   # Automatically locks a dummy row or a table
                                   # with INSERT statements to keep SERIAL data
                                   # consistency
                                   # Without SERIAL, no lock will be issued
lobj_lock_table = ''
                                   # When rewriting lo_creat command in
                                   # replication mode, specify table name to
                                   # lock

# - Degenerate handling -

replication_stop_on_mismatch = off
                                   # On disagreement with the packet kind
                                   # sent from backend, degenerate the node
                                   # which is most likely "minority"
                                   # If off, just force to exit this session

failover_if_affected_tuples_mismatch = off
                                   # On disagreement with the number of affected
                                   # tuples in UPDATE/DELETE queries, then
                                   # degenerate the node which is most likely
                                   # "minority".
                                   # If off, just abort the transaction to
                                   # keep the consistency


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------

load_balance_mode = on
                                   # Activate load balancing mode
                                   # (change requires restart)
ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
white_function_list = ''
                                   # Comma separated list of function names
                                   # that don't write to database
                                   # Regexp are accepted
black_function_list = 'nextval,setval,nextval,setval'
                                   # Comma separated list of function names
                                   # that write to database
                                   # Regexp are accepted

black_query_pattern_list = ''
                                   # Semicolon separated list of query patterns
                                   # that should be sent to primary node
                                   # Regexp are accepted
                                   # valid for streaming replication mode only.

database_redirect_preference_list = ''
                                   # comma separated list of pairs of database and node id.
                                   # example: 'postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
                                   # valid for streaming replication mode only.
app_name_redirect_preference_list = ''
                                   # comma separated list of pairs of app name and node id.
                                   # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
                                   # valid for streaming replication mode only.
allow_sql_comments = off
                                   # if on, ignore SQL comments when judging if load balance or
                                   # query cache is possible.
                                   # If off, SQL comments effectively prevent the judgment
                                   # (pre 3.4 behavior).

disable_load_balance_on_write = 'transaction' # Load balance behavior when write query is issued
                                                # in an explicit transaction.
                                                # Note that any query not in an explicit transaction
                                                # is not affected by the parameter.
                                                # 'transaction' (the default): if a write query is issued,
                                                # subsequent read queries will not be load balanced
                                                # until the transaction ends.
                                                # 'trans_transaction': if a write query is issued,
                                                # subsequent read queries in an explicit transaction
                                                # will not be load balanced until the session ends.
                                                # 'always': if a write query is issued, read queries will
                                                # not be load balanced until the session ends.

#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------

master_slave_mode = on
                                   # Activate master/slave mode
                                   # (change requires restart)
master_slave_sub_mode = 'stream'
                                   # Master/slave sub mode
                                   # Valid values are combinations stream, slony
                                   # or logical. Default is stream.
                                   # (change requires restart)

# - Streaming -

sr_check_period = 5
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'postgres'
                                   # Streaming replication check user
                                   # This is necessary even if you disable
                                   # streaming replication delay check with
                                   # sr_check_period = 0

sr_check_password = 'File&2018'
                                   # Password for streaming replication check user.
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

sr_check_database = 'postgres'
                                   # Database name for streaming replication check
delay_threshold = 0
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default

# - Special commands -

follow_master_command = ''
                                   # Executes this command after master failover
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
health_check_password = 'File&2018'
                                   # Password for health check user
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

health_check_database = ''
                                   # Database name for health check. If '', tries 'postgres' first, then 'template1'

health_check_max_retries = 10
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
connect_timeout = 10000
                                   # Timeout value in milliseconds before giving up to connect to backend.
                                   # Default is 10000 ms (10 seconds). Flaky network users may want to increase
                                   # the value. 0 means no timeout.
                                   # Note that this value is not only used for health check,
                                   # but also for ordinary connection to backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/data/test/data/failover.sh %d %P %H reppassword /data/test/data/im_the_master'
                                   # Executes this command at failover
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character
failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   # %d = node id
                                   # %h = host name
                                   # %p = port number
                                   # %D = database cluster path
                                   # %m = new master node id
                                   # %H = hostname of the new master node
                                   # %M = old master node id
                                   # %P = old primary node id
                                   # %r = new master port number
                                   # %R = new master database cluster path
                                   # %% = '%' character

failover_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

detach_false_primary = off
                                   # Detach false primary if on. Only
                                   # valid in streaming replication
                                   # mode and with PostgreSQL 9.6 or
                                   # after.

search_primary_node_timeout = 300
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'postgres'
                                   # Online recovery user
recovery_password = 'File&2018'
                                   # Online recovery password
                                   # Leaving it empty will make Pgpool-II to first look for the
                                   # Password in pool_passwd file before using the empty password

recovery_1st_stage_command = 'recovery_1st_stage'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection


#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on
                                    # Activates watchdog
                                    # (change requires restart)

# -Connection to up stream servers -

trusted_servers = ''
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

wd_hostname = '193.185.83.114'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
wd_priority = 2
                                                                        # priority of this watchdog in leader election
                                                                        # (change requires restart)

wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

wd_ipc_socket_dir = '/tmp'
                                                                        # Unix domain socket path for watchdog IPC socket
                                                                        # The Debian package defaults to
                                                                        # /var/run/postgresql
                                                                        # (change requires restart)


# - Virtual IP control Setting -

delegate_IP = '193.185.83.119'
                                    # delegate IP address
                                    # If this is empty, the virtual IP is never brought up.
                                    # (change requires restart)
if_cmd_path = '/usr/sbin/ip_w'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ip_w addr add $_IP_$/32 dev ens192 label ens192:0'
                                    # startup delegate IP command
                                    # (change requires restart)
if_down_cmd = 'ip_w addr del $_IP_$/32 dev ens192'
                                    # shutdown delegate IP command
                                    # (change requires restart)
arping_path = '/usr/sbin/arping_w'
                                    # arping command path
                                    # (change requires restart)
arping_cmd = 'arping_w -U $_IP_$ -w l'
                                    # arping command
                                    # (change requires restart)

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
                                    # Clear all the query cache on shared memory
                                    # when standby pgpool escalates to active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if client connects to pgpool
                                    # not using virtual IP.
                                    # (change requires restart)
wd_escalation_command = ''
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)
wd_de_escalation_command = ''
                                                                        # Executes this command when master pgpool resigns from being master.
                                                                        # (change requires restart)

# - Watchdog consensus settings for failover -

failover_when_quorum_exists = on
                                                                        # Only perform backend node failover
                                                                        # when the watchdog cluster holds the quorum
                                                                        # (change requires restart)

failover_require_consensus = on
                                                                        # Perform failover when majority of Pgpool-II nodes
                                                                        # agrees on the backend node status change
                                                                        # (change requires restart)

allow_multiple_failover_requests_from_node = off
                                                                        # A Pgpool-II node can cast multiple votes
                                                                        # for building the consensus on failover
                                                                        # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_monitoring_interfaces_list = '' # Comma separated list of interfaces names to monitor.
                                                                        # if any interface from the list is active the watchdog will
                                                                        # consider the network is fine
                                                                        # 'any' to enable monitoring on all interfaces except loopback
                                                                        # '' to disable monitoring
                                                                        # (change requires restart)


wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
                                    # (change requires restart)
wd_interval = 3
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)

heartbeat_destination0 = '193.185.83.115'

                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = ''
                                    # Name of NIC device (such as 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

#heartbeat_destination1 = 'host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''

# -- query mode --

wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
wd_lifecheck_dbname = 'postgres'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
wd_lifecheck_user = 'postgres'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
wd_lifecheck_password = 'postgres'
                                    # Password for watchdog user in lifecheck
                                                                        # Leaving it empty will make Pgpool-II to first look for the
                                                                        # Password in pool_passwd file before using the empty password
                                    # (change requires restart)

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = '193.185.83.115'

                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 5432
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000


#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration(the default).
                                   # The relation cache is used to cache the
                                   # query result against PostgreSQL system
                                   # catalog to obtain various information
                                   # including table structures or if it's a
                                   # temporary table or not. The cache is
                                   # maintained in a pgpool child local memory
                                   # and being kept as long as it survives.
                                   # If someone modifies the table by using
                                   # ALTER TABLE or some such, the relcache is
                                   # not consistent anymore.
                                   # For this purpose, relcache_expire
                                   # controls the life time of the cache.

relcache_size = 256
                                   # Number of relation cache
                                   # entry. If you see frequently:
                                   # "pool_search_relcache: cache replacement happend"
                                   # in the pgpool log, you might want to increase this number.

check_temp_table = on
                                   # If on, enable temporary table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses temporary tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

check_unlogged_table = on
                                   # If on, enable unlogged table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses unlogged tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
memory_cache_enabled = off
                                                                   # If on, use the memory cache functionality, off by default
                                   # (change requires restart)
memqcache_method = 'shmem'
                                                                   # Cache storage method. either 'shmem'(shared memory) or
                                                                   # 'memcached'. 'shmem' by default
                                   # (change requires restart)
memqcache_memcached_host = 'localhost'
                                                                   # Memcached host name or IP address. Mandatory if
                                                                   # memqcache_method = 'memcached'.
                                                                   # Defaults to localhost.
                                   # (change requires restart)
memqcache_memcached_port = 11211
                                                                   # Memcached port number. Mandatory if memqcache_method = 'memcached'.
                                                                   # Defaults to 11211.
                                   # (change requires restart)
memqcache_total_size = 67108864
                                                                   # Total memory size in bytes for storing memory cache.
                                                                   # Mandatory if memqcache_method = 'shmem'.
                                                                   # Defaults to 64MB.
                                   # (change requires restart)
memqcache_max_num_cache = 1000000
                                                                   # Total number of cache entries. Mandatory
                                                                   # if memqcache_method = 'shmem'.
                                                                   # Each cache entry consumes 48 bytes on shared memory.
                                                                   # Defaults to 1,000,000(45.8MB).
                                   # (change requires restart)
memqcache_expire = 0
                                                                   # Memory cache entry life time specified in seconds.
                                                                   # 0 means infinite life time. 0 by default.
                                   # (change requires restart)
memqcache_auto_cache_invalidation = on
                                                                   # If on, invalidation of query cache is triggered by corresponding
                                                                   # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered
                                                                   # by memqcache_expire. on by default.
                                   # (change requires restart)
memqcache_maxcache = 409600
                                                                   # Maximum SELECT result size in bytes.
                                                                   # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                   # (change requires restart)
memqcache_cache_block_size = 1048576
                                                                   # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
                                                                   # Defaults to 1MB.
                                   # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
                                                                   # Temporary work directory to record table oids
                                   # (change requires restart)
white_memqcache_table_list = ''
                                   # Comma separated list of table names to memcache
                                   # that don't write to database
                                   # Regexp are accepted
black_memqcache_table_list = ''
                                   # Comma separated list of table names not to memcache
                                   # that don't write to database
                                   # Regexp are accepted
Tags: master slave, pcp commands, streaming replication

Activities

ankur

2019-03-19 18:57

reporter   ~0002435

I have solved at least part of the problem, but I still cannot create the database on node-a, and the pcp problem is not solved either:

[postgres@rollc-filesrvr1 pgpool-II]$ pcp_attach_node -n 1
Password:
pcp_attach_node -- Command Successful


[postgres@rollc-filesrvr1 pgpool-II]$ psql -h 193.185.83.119 -p 5432 -U pgpool postgres
psql (9.2.24, server 10.7)
WARNING: psql version 9.2, server version 10.0.
         Some psql features might not work.
Type "help" for help.

postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_status_change
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0 | 193.185.83.114 | 5438 | up | 0.500000 | primary | 0 | false | 0 | 2019-03-18 22:59:19
 1 | 193.185.83.115 | 5438 | up | 0.500000 | standby | 0 | true | 0 | 2019-03-19 11:38:38
(2 rows)

postgres=> create database test;
ERROR: permission denied to create database
postgres=>

ankur

2019-03-20 21:53

reporter   ~0002439

Another problem started after issuing "pcp_attach_node -n 1" on node1.
The following command now hangs:

pcp_watchdog_info -h 193.185.83.119 -p 9898 -U postgres

I confirmed it by reproducing it.

Steps to reproduce:
Node1:
postgres running
pgpool running
Node2:
postgres was stopped
pgpool was stopped

I issued pcp_recovery_node on node1; as usual pcp hangs, but postgres starts on node2.
Then I started pgpool on node2 as well.
I executed "pcp_attach_node -n 1" on node1.
Then I issued the watchdog command above on node1 or node2. It hangs.

ankur

2019-03-20 21:57

reporter   ~0002440

I also confirmed that after keeping postgres and pgpool running overnight on node1 and node2, when I logged into the machines today, about 14 hours later, create database worked on node1. When I tried to reproduce the scenario by running pcp again with pgpool on/off on node2, pcp still hung on node1. I have no clue why it worked in the morning and not after that.

ankur

2019-03-20 22:04

reporter   ~0002441

I used "pcp_detach_node -n 1" on node1 to try to remove the standby node, and this command also hung. I confirm that recovery_timeout = 10 is now set in pgpool.conf. When I checked show pool_nodes; it still showed the standby as up.

ankur

2019-03-20 22:25

reporter   ~0002442

I saw that at least the following process was still running on node1, probably from an earlier restart attempt of mine:

postgres 23836 23804 0 14:27 ? 00:00:00 /bin/bash /data/test/data/pgpool_remote_start 193.185.83.115 /data/test/data

It had not completed yet. That was probably preventing the pcp commands from completing, but I am not sure.

t-ishii

2019-03-21 00:33

developer   ~0002443

Probably there is something wrong in pgpool_remote_start, which should have been installed in the master PostgreSQL database cluster (specified by backend_data_directory0). Please share pgpool_remote_start.

Also, I need the log of the master PostgreSQL node from when the following happened:

    Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: LOG: checking if postmaster is started
    Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: DETAIL: trying to connect to postmaster on hostname:node-b-ip database:postgres user:postgres (retry 0 times)

The function pgpool_remote_start() is executed by Pgpool-II against the master PostgreSQL node.

ankur

2019-03-21 04:46

reporter   ~0002444

Thanks t-ishii.
Here is pgpool_remote_start, attached to this note.
Also, I see the following line every time in ps -ef | grep pgpool on node-a:

postgres 20868 20851 0 17:27 ? 00:00:00 /bin/bash /data/test/data/pgpool_remote_start 193.185.83.115 /data/test/data

My understanding was (and still is, though I am not fully convinced) that pgpool_remote_start starts the remote PostgreSQL, since the master (node-a) is already running with master status and I am merely building the standby with the pcp command. Please correct me if I am wrong.

ankur

2019-03-21 04:46

reporter  

pgpool_remote_start.sh (636 bytes)

ankur

2019-03-21 04:53

reporter   ~0002445

Here, 193.185.83.115 is node-b and 193.185.83.114 is node-a.
Attached are the logs of node-a from 21:10 to 21:12.

needed_logs.log (3,566 bytes)

ankur

2019-03-21 04:55

reporter   ~0002446

Indeed. In all of the above scenarios, pgpool_remote_start is available on the master, and then on the standby after the base_backup command succeeds. Also, the pgpool_remote_start I am executing merely restarts the standby over ssh.

t-ishii

2019-03-21 08:11

developer   ~0002447

I usually redirect stdin, stdout, and stderr to /dev/null when executing ssh, and I put the command in the background, something like this:

#! /bin/sh
#
# start postmaster on the recovered node
#
if [ $# -ne 2 ]
then
    echo "pgpool_remote_start remote_host remote_datadir"
    exit 1
fi

DEST=$1
DESTDIR=$2
PGCTL=/usr/local/pgsql/bin/pg_ctl

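# Detach stdin, stdout, and stderr and run in the background so that
# the caller is not left waiting on the ssh session.
ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null < /dev/null &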

Maybe your ssh command is blocking because it lacks these redirections?
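
For illustration, the effect of those redirections can be reproduced without ssh or Pgpool-II. This is only a minimal sketch, not the pgpool_remote_start script itself: the helper name elapsed_seconds and the use of sleep as a stand-in for the long-running remote command are assumptions made for the demo. A caller that reads a child's output through a pipe waits until every process holding the write end of that pipe has closed it, so a background process that inherits stdout keeps the caller blocked even after the script that launched it has exited.

```sh
#!/bin/sh
# elapsed_seconds CMD: run CMD via command substitution (i.e. through a pipe)
# and print how many whole seconds the caller was blocked waiting for the
# pipe to reach end-of-file.
elapsed_seconds() {
    start=$(date +%s)
    out=$(sh -c "$1")
    end=$(date +%s)
    echo $((end - start))
}

# Background child inherits stdout (the pipe): the caller stays blocked
# until sleep itself exits.
blocked=$(elapsed_seconds 'sleep 3 &')

# Background child has stdin, stdout, and stderr detached: the caller
# returns as soon as the launching shell exits.
detached=$(elapsed_seconds 'sleep 3 </dev/null >/dev/null 2>&1 &')

echo "inherited descriptors: ${blocked}s, detached: ${detached}s"
```

The first figure should be roughly the full sleep time and the second close to zero. A process that never exits would, in the first form, hold the caller indefinitely.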

t-ishii

2019-03-21 08:28

developer   ~0002448

BTW, here is a more production-level Pgpool-II configuration example:
http://www.pgpool.net/docs/latest/en/html/example-cluster.html
Please take a look at the pgpool_remote_start script example.

ankur

2019-03-21 18:20

reporter   ~0002449

Thank you, it solved the problem.
It was not a pgpool problem but my incorrect command line, which was preventing pgpool_remote_start from completing and blocking the subsequent commands.

Issue History

Date Modified Username Field Change
2019-03-19 18:30 ankur New Issue
2019-03-19 18:30 ankur Tag Attached: master slave
2019-03-19 18:32 ankur Tag Attached: pcp commands
2019-03-19 18:32 ankur Tag Attached: streaming replication
2019-03-19 18:57 ankur Note Added: 0002435
2019-03-20 14:41 pengbo Assigned To => t-ishii
2019-03-20 14:41 pengbo Status new => assigned
2019-03-20 21:53 ankur Note Added: 0002439
2019-03-20 21:57 ankur Note Added: 0002440
2019-03-20 22:04 ankur Note Added: 0002441
2019-03-20 22:25 ankur Note Added: 0002442
2019-03-21 00:33 t-ishii Note Added: 0002443
2019-03-21 00:34 t-ishii Status assigned => feedback
2019-03-21 04:46 ankur Note Added: 0002444
2019-03-21 04:46 ankur Status feedback => assigned
2019-03-21 04:46 ankur File Added: pgpool_remote_start.sh
2019-03-21 04:53 ankur File Added: needed_logs.log
2019-03-21 04:53 ankur Note Added: 0002445
2019-03-21 04:55 ankur Note Added: 0002446
2019-03-21 08:11 t-ishii Note Added: 0002447
2019-03-21 08:28 t-ishii Note Added: 0002448
2019-03-21 08:28 t-ishii Status assigned => feedback
2019-03-21 18:20 ankur Note Added: 0002449
2019-03-21 18:20 ankur Status feedback => assigned
2019-03-21 19:08 t-ishii Status assigned => resolved