View Issue Details

ID: 0000476    Project: Pgpool-II    Category: Bug    View Status: public    Last Update: 2019-04-05 08:58
Reporter: siva    Assigned To: hoshiai
Priority: urgent    Severity: major    Reproducibility: have not tried
Status: closed    Resolution: no change required
Product Version: 3.7.7
Target Version:    Fixed in Version:
Summary: 0000476: pgpool master and slaves do not sync the backend DB status properly; both the pgpool master and slave show an incorrect backend status.
Description:
Hi,

Good morning.

We are using pgpool 3.7.7 for load balancing in a master/slave setup, with repmgr 4.2 handling replication and failover. There are 4 backend DB servers in total (1 master and 3 slaves).

Regarding the status of the backend DB servers: repmgr reports everything as expected, but querying the status from the pgpool master and the pgpool slave gives different results:

repmgr status on master:

Status on Database Master Node:
-bash-4.2$ /usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show
WARNING: master_response_timeout/9: unknown name/value pair provided; ignoring
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
 ID | Name | Role | Status | Upstream | Location | Connection string
----+--------------+---------+-----------+--------------+----------+----------------------------------------------
 1 | Master_DB_IP | primary | * running | | default | host= Master_DB_IP user=repmgr dbname=repmgr
 2 | Slave_DB_IP1 | standby | running | Master_DB_IP | default | host= Slave_DB_IP1 user=repmgr dbname=repmgr
 3 | Slave_DB_IP2 | standby | running | Master_DB_IP | default | host= Slave_DB_IP2 user=repmgr dbname=repmgr
 4 | Slave_DB_IP3 | standby | running | Master_DB_IP | default | host= Slave_DB_IP3 user=repmgr dbname=repmgr

on Hostname1 (pgpool master Node status):
[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

[root@Hostname1 ~]# su - postgres
-bash-4.2$ psql -U pgpool --host Hostname1 --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backen_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backen_port | down | 0.250000 | slave | 0 | false | 0
 2 | backend_host2_IP | backen_port | up | 0.250000 | master | 1 | true | 0
 3 | backend_host3_IP | backen_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

on Hostname2 (pgpool slave Node status):
[root@Hostname2 ~]# systemctl status pgpool
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: WARNING: checking setuid bit of arping command
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: DETAIL: arping[/usr/sbin/arping] doesn't have setuid bit
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: LOG: reading status file: 0 th backend is set to down status
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: LOG: waiting for watchdog to initialize
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: setting the local watchdog node name to "pgpool_slave_host_IP:pool_port Linux Hostname2"
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog cluster is configured with 1 remote nodes
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog remote node:0 on pgpool_master_host_IP:watchdog_port
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: interface monitoring is disabled in watchdog
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4350: LOG: watchdog node state changed from [DEAD] to [LOADING]
Mar 19 21:42:17 Hostname2 pgpool[4349]: 2019-03-19 21:42:17: pid 4350: LOG: watchdog node state changed from [LOADING] to [JOINING]

[root@Hostname2 ~]# su - postgres
Last login: Tue Mar 19 21:34:19 IST 2019 on pts/0
-bash-4.2$ psql -U pgpool --host Hostname2 --dbname postgres -c "show pool_nodes"
 node_id | Hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backen_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backen_port | up | 0.250000 | master | 0 | true | 0
 2 | backend_host2_IP | backen_port | up | 0.250000 | slave | 0 | false | 0
 3 | backend_host3_IP | backen_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

Could you please help resolve this status mismatch between the pgpool master, the pgpool slave, and the databases? Please also let us know if there is already a bug fix related to this issue.

Thank you very much for your help.
Additional Information:
on Hostname1 (pgpool master Node status):

[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

Please help us address the above errors: (pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"), (pid 5382: FATAL: failed to create watchdog heartbeat receive socket), (pid 5383: FATAL: failed to create watchdog heartbeat sender socket), (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device").
TagsNo tags attached.

Activities

hoshiai

2019-03-22 16:17

developer   ~0002450

I think pgpool may be failing to connect, both pgpool-to-pgpool and pgpool-to-postgres,
because of incorrect settings.

> (pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol") ,
This is probably not a problem.

> (pid 5382: FATAL: failed to create watchdog heartbeat receive socket),
> (pid 5383: FATAL: failed to create watchdog heartbeat sender socket),
> (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")

These mean pgpool failed to create and configure the socket used for the heartbeat between the pgpool nodes.
Have you set the heartbeat_* parameters in pgpool.conf correctly?
For example, if the heartbeat_device parameter is specified incorrectly, this can easily happen.

Could you share your pgpool.conf and pgpool's log file?
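For reference, a minimal sketch of the heartbeat settings in question (the parameter names are pgpool.conf watchdog settings; the host name and port values below are placeholders, not taken from this report):

```
use_watchdog = on
heartbeat_destination0 = 'peer_pgpool_host'   # the other pgpool node
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''                        # '' = do not bind to a specific NIC;
                                              # a non-empty value must name an existing
                                              # interface, or SO_BINDTODEVICE fails with
                                              # "No such device"
```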

siva

2019-03-24 13:29

reporter   ~0002451

Hi Hoshiai,

Thanks for your reply.

Could you please clarify a few things:

1. Status mismatch between the postgres servers and the pgpool master node
[root@Hostname1 ~]# su - postgres
-bash-4.2$ psql -U pgpool --host Hostname1 --dbname postgres -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backen_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backen_port | down | 0.250000 | slave | 0 | false | 0
 2 | backend_host2_IP | backen_port | up | 0.250000 | master | 1 | true | 0
 3 | backend_host3_IP | backen_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

Please let me know why it is showing the wrong status, and what the fix is so that postgres and pgpool report the status consistently.
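A sketch of one way to clear a stale "down" status once the backend is confirmed healthy (this assumes the pcp interface is on port 9898 as configured, and that a .pcppass entry exists; the node id is the one shown by "show pool_nodes"):

```shell
# Build the pcp_attach_node invocation (pgpool-II 3.5+ option style) for a
# backend that pgpool marks "down" but repmgr shows as running.
attach_cmd() {
  echo "pcp_attach_node -h localhost -p 9898 -U pgpool -w -n $1"
}

# Example: print the command that would re-attach backend node 0
attach_cmd 0
```

Alternatively, stopping pgpool and starting it with "pgpool -D" discards the saved pgpool_status file (the "reading status file: 0 th backend is set to down status" log line shows that file being replayed at startup), so all backends are re-checked fresh.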

2. Status mismatch between the two pgpool nodes:
[root@Hostname2 ~]# su - postgres
Last login: Tue Mar 19 21:34:19 IST 2019 on pts/0
-bash-4.2$ psql -U pgpool --host Hostname2 --dbname postgres -c "show pool_nodes"
 node_id | Hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+--------+------------+-------------------+-------------------
 0 | backend_host0_IP | backen_port | down | 0.250000 | slave | 0 | false | 0
 1 | backend_host1_IP | backen_port | up | 0.250000 | master | 0 | true | 0
 2 | backend_host2_IP | backen_port | up | 0.250000 | slave | 0 | false | 0
 3 | backend_host3_IP | backen_port | up | 0.250000 | slave | 0 | false | 0
(4 rows)

Even between the two pgpool nodes the status is not in sync. Please help us understand how to overcome these issues.
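For the pool-to-pool side, each watchdog node has to point at its peer in pgpool.conf; a minimal fragment (the host names below are placeholders standing in for this report's redacted names, and the ports are the defaults):

```
use_watchdog = on
wd_hostname = 'Hostname2'              # this pgpool node
wd_port = 9000
other_pgpool_hostname0 = 'Hostname1'   # the peer pgpool node
other_pgpool_port0 = 5432
other_wd_port0 = 9000
```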

3. setuid issue:
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: WARNING: checking setuid bit of arping command
Mar 19 21:42:12 Hostname2 pgpool[4349]: 2019-03-19 21:42:12: pid 4349: DETAIL: arping[/usr/sbin/arping] doesn t have setuid bit

How do we fix the above setuid issue?
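The watchdog runs arping as a non-root user when bringing up the virtual IP, so arping needs the setuid bit; a sketch, assuming arping lives at /usr/sbin/arping as in the log:

```shell
# POSIX "test -u" is true when a file exists and has the setuid bit set.
has_setuid() {
  [ -u "$1" ]
}

# Typical fix (run as root) so pgpool can send gratuitous ARP without root:
#   chmod u+s /usr/sbin/arping
```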

4. No device found
> (pid 5382: FATAL: failed to create watchdog heartbeat receive socket),
> (pid 5383: FATAL: failed to create watchdog heartbeat sender socket),
> (pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device")


Please find the requested details below.

Log:

[root@Hostname1 ~]# systemctl status pgpool
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: perhaps failed to create INET domain socket
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: DETAIL: socket(::) failed: "Address family not supported by protocol"
Mar 19 21:40:14 Hostname1 pgpool[5357]: 2019-03-19 21:40:14: pid 5357: LOG: pgpool-II successfully started. version 3.7.7 (amefuriboshi)
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: FATAL: failed to create watchdog heartbeat receive socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5382: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: FATAL: failed to create watchdog heartbeat sender socket
Mar 19 21:40:15 Hostname1 pgpool[5357]: 2019-03-19 21:40:15: pid 5383: DETAIL: setsockopt(SO_BINDTODEVICE) failed with reason: "No such device"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: LOG: successfully acquired the delegate IP:"VIP Address"
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5377: DETAIL: 'if_up_cmd' returned with success
Mar 19 21:40:18 Hostname1 pgpool[5357]: 2019-03-19 21:40:18: pid 5358: LOG: watchdog escalation process with pid: 5377 exit with SUCCESS.

Thanks for your help.

Regards,
Siva.

Master_Pool_config_file.txt (40,853 bytes)
[root@N1PPRL-UFGA0160 pgpool-II]# cat pgpool.conf
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# Whitespace may be used.  Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#


#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------

# - pgpool Connection Settings -

listen_addresses = '*'
                                   # Host name or IP address to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
port = 5432
                                   # Port number
                                   # (change requires restart)
socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)
listen_backlog_multiplier = 2
                                   # Set the backlog parameter of listen(2) to
                                                                   # num_init_children * listen_backlog_multiplier.
                                   # (change requires restart)
serialize_accept = off
                                   # whether to serialize accept() call to avoid thundering herd problem
                                   # (change requires restart)

# - pgpool Communication Manager Connection Settings -

pcp_listen_addresses = '*'
                                   # Host name or IP address for pcp process to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/var/run/postgresql'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)

# - Backend Connection Settings -

backend_hostname0 = 'backen_ip0'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/pgsql_prod/9.5/data'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
                                   # or ALWAYS_MASTER
backend_hostname1 = 'backen_ip1'
backend_port1 = backen_port
backend_weight1 = 1
backend_data_directory1 = '/pgsql_prod/9.5/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = 'backen_ip2'
backend_port2 = backen_port
backend_weight2 = 1
backend_data_directory2 = '/pgsql_prod/9.5/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_hostname3 = 'backen_ip3'
backend_port3 = backen_port
backend_weight3 = 1
backend_data_directory3 = '/pgsql_prod/9.5/data'
backend_flag3 = 'ALLOW_TO_FAILOVER'


# - Authentication -

enable_pool_hba = on
                                   # Use pool_hba.conf for client authentication
pool_passwd = 'pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
authentication_timeout = 60
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

# - SSL Connections -

ssl = off
                                   # Enable SSL support
                                   # (change requires restart)
#ssl_key = './server.key'
                                   # Path to the SSL private key file
                                   # (change requires restart)
#ssl_cert = './server.cert'
                                   # Path to the SSL public certificate file
                                   # (change requires restart)
#ssl_ca_cert = ''
                                   # Path to a single PEM format file
                                   # containing CA root certificate(s)
                                   # (change requires restart)
#ssl_ca_cert_dir = ''
                                   # Directory containing CA root certificate(s)
                                   # (change requires restart)


#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------

# - Concurrent session and pool size -

num_init_children = 125
                                   # Number of concurrent sessions allowed
                                   # (change requires restart)
max_pool = 8
                                   # Number of connection pool caches per connection
                                   # (change requires restart)

# - Life time -

child_life_time = 300
                                   # Pool exits after being idle for this many seconds
child_max_connections = 0
                                   # Pool exits after receiving that many connections
                                   # 0 means no exit
connection_life_time = 0
                                   # Connection to backend closes after being idle for this many seconds
                                   # 0 means no close
client_idle_limit = 0
                                   # Client is disconnected after being idle for that many seconds
                                   # (even inside an explicit transactions!)
                                   # 0 means no disconnection


#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'stderr'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.

# - What to log -

log_line_prefix = '%t: pid %p: '   # printf-style string to output at beginning of each log line.

log_connections = on
                                   # Log connections
log_hostname = on
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
log_statement = on
                                   # Log all statements
log_per_node_statement = off
                                   # Log all statements
                                   # with node and backend information
log_standby_delay = 'none'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none

# - Syslog specific -

syslog_facility = 'LOCAL0'
                                   # Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'

# - Debug -

#log_error_verbosity = default          # terse, default, or verbose messages

#client_min_messages = notice           # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   log
                                        #   notice
                                        #   warning
                                        #   error

#log_min_messages = warning             # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

pid_file_name = '/var/run/pgpool/pgpool.pid'
                                   # PID file name
                                   # Can be specified as relative to the
                                   # location of pgpool.conf file or
                                   # as an absolute path
                                   # (change requires restart)
logdir = '/var/log/pgpool'
                                   # Directory of pgPool status file
                                   # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------

connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)

                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
                                   # The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'


#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

replication_mode = off
                                   # Activate replication mode
                                   # (change requires restart)
replicate_select = off
                                   # Replicate SELECT statements
                                   # when in replication mode
                                   # replicate_select is higher priority than
                                   # load_balance_mode.

insert_lock = on
                                   # Automatically locks a dummy row or a table
                                   # with INSERT statements to keep SERIAL data
                                   # consistency
                                   # Without SERIAL, no lock will be issued
lobj_lock_table = ''
                                   # When rewriting lo_creat command in
                                   # replication mode, specify table name to
                                   # lock

# - Degenerate handling -

replication_stop_on_mismatch = off
                                   # On disagreement with the packet kind
                                   # sent from backend, degenerate the node
                                   # which is most likely "minority"
                                   # If off, just force to exit this session

failover_if_affected_tuples_mismatch = off
                                   # On disagreement with the number of affected
                                   # tuples in UPDATE/DELETE queries, then
                                   # degenerate the node which is most likely
                                   # "minority".
                                   # If off, just abort the transaction to
                                   # keep the consistency


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------

load_balance_mode = on
                                   # Activate load balancing mode
                                   # (change requires restart)
ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
white_function_list = ''
                                   # Comma separated list of function names
                                   # that don't write to database
                                   # Regexp are accepted
black_function_list = 'nextval,setval,nextval,setval'
                                   # Comma separated list of function names
                                   # that write to database
                                   # Regexp are accepted

database_redirect_preference_list = ''
                                                                   # comma separated list of pairs of database and node id.
                                                                   # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
                                                                   # valid for streaming replication mode only.

app_name_redirect_preference_list = ''
                                                                   # comma separated list of pairs of app name and node id.
                                                                   # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
                                                                   # valid for streaming replication mode only.
allow_sql_comments = off
                                                                   # if on, ignore SQL comments when judging if load balance or
                                                                   # query cache is possible.
                                                                   # If off, SQL comments effectively prevent the judgment
                                                                   # (pre 3.4 behavior).

#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------

master_slave_mode = off
                                   # Activate master/slave mode
                                   # (change requires restart)
master_slave_sub_mode = 'stream'
                                   # Master/slave sub mode
                                   # Valid values are combinations of stream, slony
                                   # or logical. Default is stream.
                                   # (change requires restart)

# - Streaming -

sr_check_period = 0
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'pgpool'
                                   # Streaming replication check user
                                   # This is necessary even if you disable
                                   # streaming replication delay check with
                                   # sr_check_period = 0
sr_check_password = ''
                                   # Password for streaming replication check user
sr_check_database = 'postgres'
                                   # Database name for streaming replication check
delay_threshold = 0
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default

# - Special commands -

follow_master_command = ''
                                   # Executes this command after master failover
                                   # Special values:
                                   #   %d = node id
                                   #   %h = host name
                                   #   %p = port number
                                   #   %D = database cluster path
                                   #   %m = new master node id
                                   #   %H = hostname of the new master node
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %% = '%' character

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 20
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'pgpool'
                                   # Health check user
health_check_password = ''
                                   # Password for health check user
health_check_database = ''
                                   # Database name for health check. If '', tries 'postgres' first, then 'template1'

health_check_max_retries = 0
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
connect_timeout = 10000
                                   # Timeout value in milliseconds before giving up to connect to backend.
                                                                   # Default is 10000 ms (10 seconds). Users on flaky networks may want to increase
                                                                   # the value. 0 means no timeout.
                                                                   # Note that this value is not only used for health check,
                                                                   # but also for ordinary connections to backend.

#------------------------------------------------------------------------------
# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
#------------------------------------------------------------------------------
#health_check_period0 = 0
#health_check_timeout0 = 20
#health_check_user0 = 'nobody'
#health_check_password0 = ''
#health_check_database0 = ''
#health_check_max_retries0 = 0
#health_check_retry_delay0 = 1
#connect_timeout0 = 10000

#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = ''
                                   # Executes this command at failover
                                   # Special values:
                                   #   %d = node id
                                   #   %h = host name
                                   #   %p = port number
                                   #   %D = database cluster path
                                   #   %m = new master node id
                                   #   %H = hostname of the new master node
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %% = '%' character
failback_command = ''
                                   # Executes this command at failback.
                                   # Special values:
                                   #   %d = node id
                                   #   %h = host name
                                   #   %p = port number
                                   #   %D = database cluster path
                                   #   %m = new master node id
                                   #   %H = hostname of the new master node
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %% = '%' character

fail_over_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

search_primary_node_timeout = 300
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'pgpool'
                                   # Online recovery user
recovery_password = ''
                                   # Online recovery password
recovery_1st_stage_command = 'basebackup.sh'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection


#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on
                                    # Activates watchdog
                                    # (change requires restart)

# -Connection to up stream servers -

trusted_servers = 'backen_ip0,backen_ip1,backen_ip2,backen_ip3'
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

wd_hostname = 'wd_hostname_ip'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
wd_priority = 1
                                    # priority of this watchdog in leader election
                                    # (change requires restart)

wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

wd_ipc_socket_dir = '/var/run/postgresql'
                                    # Unix domain socket path for watchdog IPC socket
                                    # The Debian package defaults to
                                    # /var/run/postgresql
                                    # (change requires restart)


# - Virtual IP control Setting -

delegate_IP = 'delegate_IP'
                                    # delegate IP address
                                    # If this is empty, the virtual IP is never brought up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ip addr add  delegate_IP/24 dev ens192 label ens192:0'
                                    # startup delegate IP command
                                    # (change requires restart)
if_down_cmd = 'ip addr del delegate_IP/24 dev ens192'
                                    # shutdown delegate IP command
                                    # (change requires restart)
arping_path = '/usr/sbin'
                                    # arping command path
                                    # (change requires restart)
arping_cmd = 'arping -U $_IP_$ -w 1'
                                    # arping command
                                    # (change requires restart)

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
                                    # Clear all the query cache on shared memory
                                    # when a standby pgpool escalates to active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if clients connect to pgpool
                                    # without using the virtual IP.
                                    # (change requires restart)
wd_escalation_command = ''
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)
wd_de_escalation_command = ''
                                    # Executes this command when the master pgpool resigns from being master.
                                    # (change requires restart)

# - Watchdog consensus settings for failover -

failover_when_quorum_exists = on
                                    # Only perform backend node failover
                                    # when the watchdog cluster holds the quorum
                                    # (change requires restart)

failover_require_consensus = on
                                    # Perform failover when a majority of Pgpool-II nodes
                                    # agrees on the backend node status change
                                    # (change requires restart)

allow_multiple_failover_requests_from_node = off
                                    # A Pgpool-II node can cast multiple votes
                                    # for building the consensus on failover
                                    # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_monitoring_interfaces_list = ''  # Comma separated list of interface names to monitor.
                                    # If any interface from the list is active, the watchdog
                                    # considers the network to be fine.
                                    # 'any' to enable monitoring on all interfaces except loopback
                                    # '' to disable monitoring
                                    # (change requires restart)


wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
                                    # (change requires restart)
wd_interval = 10
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = 'heartbeat_destination0_ip'
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'eth0'
                                    # Name of NIC device (such as 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

#heartbeat_destination1 = 'host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''

# -- query mode --

wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
wd_lifecheck_dbname = 'template1'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
wd_lifecheck_user = 'nobody'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
wd_lifecheck_password = ''
                                    # Password for watchdog user in lifecheck
                                    # (change requires restart)

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'other_pgpool_hostname0'
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = other_pgpool_port0
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = other_pgpool_port0
#other_wd_port1 = 9000


#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration (the default).
                                   # The relation cache is used to cache query
                                   # results against the PostgreSQL system
                                   # catalogs, to obtain various information
                                   # including table structures and whether a
                                   # table is temporary. The cache is
                                   # maintained in each pgpool child's local
                                   # memory and kept as long as the child survives.
                                   # If someone modifies a table, e.g. with
                                   # ALTER TABLE, the relcache becomes
                                   # inconsistent. For this purpose,
                                   # relcache_expire controls the life time
                                   # of the cache.

relcache_size = 256
                                   # Number of relation cache
                                   # entries. If you frequently see
                                   # "pool_search_relcache: cache replacement happened"
                                   # in the pgpool log, you might want to increase this number.

check_temp_table = on
                                   # If on, enable temporary table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses temporary tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

check_unlogged_table = on
                                   # If on, enable unlogged table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses unlogged tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.

#------------------------------------------------------------------------------
# IN MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
memory_cache_enabled = off
                                   # If on, use the memory cache functionality, off by default
memqcache_method = 'shmem'
                                   # Cache storage method. Either 'shmem' (shared memory) or
                                   # 'memcached'. 'shmem' by default
                                   # (change requires restart)
memqcache_memcached_host = 'localhost'
                                   # Memcached host name or IP address. Mandatory if
                                   # memqcache_method = 'memcached'.
                                   # Defaults to localhost.
                                   # (change requires restart)
memqcache_memcached_port = 11211
                                   # Memcached port number. Mandatory if memqcache_method = 'memcached'.
                                   # Defaults to 11211.
                                   # (change requires restart)
memqcache_total_size = 67108864
                                   # Total memory size in bytes for storing memory cache.
                                   # Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 64MB.
                                   # (change requires restart)
memqcache_max_num_cache = 1000000
                                   # Total number of cache entries. Mandatory
                                   # if memqcache_method = 'shmem'.
                                   # Each cache entry consumes 48 bytes of shared memory.
                                   # Defaults to 1,000,000 (45.8MB).
                                   # (change requires restart)
memqcache_expire = 0
                                   # Memory cache entry life time specified in seconds.
                                   # 0 means infinite life time. 0 by default.
                                   # (change requires restart)
memqcache_auto_cache_invalidation = on
                                   # If on, invalidation of the query cache is triggered by corresponding
                                   # DDL/DML/DCL (and memqcache_expire). If off, it is only triggered
                                   # by memqcache_expire. on by default.
                                   # (change requires restart)
memqcache_maxcache = 409600
                                   # Maximum SELECT result size in bytes.
                                   # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                   # (change requires restart)
memqcache_cache_block_size = 1048576
                                   # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
                                   # Defaults to 1MB.
                                   # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
                                   # Temporary work directory to record table oids
                                   # (change requires restart)
white_memqcache_table_list = ''
                                   # Comma separated list of table names to memcache
                                   # that don't write to database
                                   # Regexp are accepted
black_memqcache_table_list = ''
                                   # Comma separated list of table names not to memcache
                                   # that don't write to database
                                   # Regexp are accepted

hoshiai

2019-03-25 15:15

developer   ~0002453

> 1. Status Mismatch between postgres server and master pool server

Once Pgpool-II has judged a node to be in down status, it will not check that node again.
So you need to reattach the node to Pgpool-II manually.

For example, you can either:
  use the 'pcp_attach_node' command,
or
  shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.


> 2. Mismatch status between pool to pool:

The lack of sync may be caused by a connection failure between the pgpool nodes.

Please check each Pgpool-II server's watchdog status with the pcp_watchdog_info command.

for example:
  pcp_watchdog_info -v -h IP_ADDRESS -p PCP_PORT -U USERNAME

* pcp_watchdog_info command
http://www.pgpool.net/docs/pgpool-II-3.7.7/en/html/pcp-watchdog-info.html


> 3. setuid issue:

if_up_cmd, if_down_cmd and arping_cmd require root privileges.
You have to set the setuid bit (with 'chmod u+s') or allow passwordless sudo, so that Pgpool-II can execute these commands.

http://www.pgpool.net/docs/pgpool-II-3.7.7/ja/html/tutorial-watchdog-intro.html#TUTORIAL-WATCHDOG-AUTOMATIC-VIP
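As a minimal sketch of the setuid approach (/tmp/ip_demo is only a stand-in for a real binary such as /sbin/ip, used here for illustration):

```shell
# Copy a harmless binary to stand in for /sbin/ip (illustration only).
cp /bin/true /tmp/ip_demo
# Set the setuid bit so the file runs with its owner's privileges.
chmod u+s /tmp/ip_demo
# The permission string now shows 's' in the owner-execute position,
# e.g. -rwsr-xr-x
stat -c '%A' /tmp/ip_demo
```

The alternative is a sudoers entry granting the pgpool user passwordless execution of just these commands.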

> 4. No device found

The heartbeat settings are probably wrong.
For example, does the 'eth0' device specified in your conf exist on the server?
You can check it using the 'ip a' command.
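For instance, the device names present on a host can be listed like this; heartbeat_device0 must match one of them exactly:

```shell
# List the network interface names on this host; heartbeat_device0 must be
# one of these (on many modern systems it is a name like ens192, not eth0).
ip -o link show | awk -F': ' '{print $2}'
```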

siva

2019-03-25 15:55

reporter   ~0002454

> 1. Status Mismatch between postgres server and master pool server

Once Pgpool-II has judged a node to be in down status, it will not check that node again.
So you need to reattach the node to Pgpool-II manually.

For example, you can either:
  use the 'pcp_attach_node' command,
or
  shut down Pgpool-II, delete the 'pgpool_status' file in the directory specified by the 'logdir' parameter, and then restart Pgpool-II.

-----------
shut down Pgpool-II, delete the 'pgpool_status' file, and restart Pgpool-II ==> We have done this; after restarting, the status is fine, but after about 6 hours it becomes wrong again, while repmgr keeps showing the correct status.




> 2. Mismatch status between pool to pool:

The lack of sync may be caused by a connection failure between the pgpool nodes.

Please check each Pgpool-II server's watchdog status with the pcp_watchdog_info command.

for example:
  pcp_watchdog_info -v -h IP_ADDRESS -p PCP_PORT -U USERNAME

* pcp_watchdog_info command
http://www.pgpool.net/docs/pgpool-II-3.7.7/en/html/pcp-watchdog-info.html


> 3. setuid issue:

if_up_cmd, if_down_cmd and arping_cmd require root privileges.
You have to set the setuid bit (with 'chmod u+s') or allow passwordless sudo, so that Pgpool-II can execute these commands.

http://www.pgpool.net/docs/pgpool-II-3.7.7/ja/html/tutorial-watchdog-intro.html#TUTORIAL-WATCHDOG-AUTOMATIC-VIP

============
All starting and stopping is done as the root user only, but we still get the above issue.

> 4. No device found

The heartbeat settings are probably wrong.
For example, does the 'eth0' device specified in your conf exist on the server?
You can check it using the 'ip a' command.
=========

I will check this one. Can it cause any issues, like a status mismatch between the pools, or between the DB servers and pgpool?

=======
The status mismatch reoccurs every day. Could you please help us find a permanent fix for this issue?

Thanks a lot for your help.

hoshiai

2019-03-25 16:56

developer   ~0002455

> All starting and stopping done by using root user only . But still getting above issue
I see, so those warning messages are not a problem.

Could you share the pgpool.log (in '/var/log/pgpool') and postgresql.log from each server?
I cannot pinpoint the cause from only part of the log, and I also want to confirm the result of the pcp_watchdog_info command.

siva

2019-03-25 23:51

reporter   ~0002459

Thanks for your help, hoshiai.
I will update the status.

siva

2019-03-27 23:44

reporter   ~0002469

Hi Hoshiai,

Please find watchdog info of both pgpool master and slave servers:

[root@N1PPRL-UFGA0159 pgpool-II]# pcp_watchdog_info -h 10.222.164.81 -p 9898 -U pgpool -v
Password:
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM IS ON THE EDGE
Alive Remote Nodes : 0
VIP up on local node : YES
Master Node Name : 10.222.164.81:5432 Linux N1PPRL-UFGA0159
Master Host Name : 10.222.164.81

Watchdog Node Information
Node Name : 10.222.164.81:5432 Linux N1PPRL-UFGA0159
Host Name : 10.222.164.81
Delegate IP : 10.222.167.250
Pgpool port : 5432
Watchdog port : 9000
Node priority : 2
Status : 4
Status Name : MASTER

Node Name : Not_Set
Host Name : 10.222.16.82
Delegate IP : Not_Set
Pgpool port : 5432
Watchdog port : 9000
Node priority : 0
Status : 0
Status Name : DEAD

===

siva

2019-03-28 00:24

reporter   ~0002471

Hi Hoshiai,

Please find the all configuration and log file for reference...

Thanks for your help.

Pool Issues debug Info.zip (33,977 bytes)

hoshiai

2019-03-28 14:14

developer   ~0002474

Hi siva,

Thank you for sharing log file.

Probably, when communication between Pgpool-II and PostgreSQL fails, Pgpool-II degenerates the backend PostgreSQL node, because fail_over_on_backend_error = on.
How about enabling health checks instead, and setting fail_over_on_backend_error = off?

The heartbeat also failed because of "No such device". Does 'eth0' carry the '10.222.16.xx' network?
I think the heartbeat will probably succeed if you change the 'heartbeat_device0' parameter from 'eth0' to '' (empty string).
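As a sketch of that suggestion (the exact retry values are assumptions to adapt to your environment):

```
# Sketch (assumed values): rely on periodic health checks rather than
# degenerating a backend on a single socket error.
fail_over_on_backend_error = off
health_check_period = 20
health_check_timeout = 20
health_check_max_retries = 3
health_check_retry_delay = 5
```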

hoshiai

2019-03-28 14:29

developer   ~0002476

> Q) Our finding here is that we are not using any failover command, but the log still shows a failover operation, while nothing happened on the backend. Did some sort of promotion or degeneration happen at the pool level without affecting the backend DB node?

A) Yes. Those log lines mean a degeneration happened at the Pgpool-II level. PostgreSQL itself did not actually fail over.

siva

2019-03-28 19:04

reporter   ~0002477

Hi hoshiai,

thanks for your help , will check and get back to you.

siva

2019-03-29 00:55

reporter   ~0002480

Hi hoshiai,

Thanks for your help.

The heartbeat also failed because of "No such device". Does 'eth0' carry the '10.222.16.xx' network?
I think the heartbeat will probably succeed if you change the 'heartbeat_device0' parameter from 'eth0' to '' (empty string).

Change: I have made the change as per your suggestion: heartbeat_device0 = ''. Now both master and slave pool servers are acting as "masters".

As per the log, both are sending heartbeat signals, and since neither gets a reply from the other, each assumes it is the master.

Below is log from slave pool:
LOG: watchdog node state changed from [DEAD] to [LOADING]
LOG: watchdog node state changed from [LOADING] to [JOINING]
LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
LOG: I am the only alive node in the watchdog cluster
HINT: skipping stand for coordinator state
LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
LOG: I am announcing my self as master/coordinator watchdog node

Below are the some of findings:

We are not able to find the watchdog heartbeat port 9694, and it does not accept connections on 9694 even though both master and slave pool servers are up and running.

[root@s2n pgpool-II]# netstat -nlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN

[root@INHUSZ1-V1625126 ~]# netstat -nlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN

[root@s2n pgpool-II]# hostname -I
172.16.251.25
[root@s2n pgpool-II]# telnet 172.16.251.26 5432
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@s2n pgpool-II]# telnet 172.16.251.26 9000
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@s2n pgpool-II]# telnet 172.16.251.26 9898
Trying 172.16.251.26...
Connected to 172.16.251.26.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@s2n pgpool-II]# telnet 172.16.251.26 9694
Trying 172.16.251.26...
telnet: connect to address 172.16.251.26: Connection refused

On Slave:

[root@INHUSZ1-V1625126 ~]# telnet 172.16.251.25 9694
Trying 172.16.251.25...
telnet: connect to address 172.16.251.25: Connection refused
[root@INHUSZ1-V1625126 ~]# telnet 172.16.251.25 9898
Trying 172.16.251.25...
Connected to 172.16.251.25.
Escape character is '^]'.

Are there any other config parameters to tweak to get the servers in sync, apart from:

other_pgpool_hostname0
other_pgpool_port0
other_wd_port0
heartbeat_destination0
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
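One note on the port checks above: the watchdog heartbeat on 9694 runs over UDP, so it never appears in `netstat -nlt` (which lists TCP sockets only), and telnet, a TCP client, is refused there even when the heartbeat is working. UDP listeners can be listed instead, for example:

```shell
# List listening UDP sockets; a working heartbeat receiver shows up here
# as 0.0.0.0:9694, which the TCP-only `netstat -nlt` would never display.
ss -uln
```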

siva

2019-03-29 13:31

reporter   ~0002481

Hi hoshiai,

We have found a mismatch between the heartbeat sender and receiver ports; is this expected, or is there an issue with my setup? The heartbeat sender's port keeps changing after restarting the pool.

On Master:
[root@INHUSZ1-V1625126 ~]# netstat -nltup | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 25579/pgpool: watch
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 25578/pgpool
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN 25578/pgpool
tcp6 0 0 :::5433 :::* LISTEN 25578/pgpool
udp 0 0 0.0.0.0:9694 0.0.0.0:* 25587/pgpool: heart
udp 0 0 0.0.0.0:59432 0.0.0.0:* 25589/pgpool: heart

[root@INHUSZ1-V1625126 ~]# ps -eaf| grep 59432
root 31903 24284 0 13:38 pts/2 00:00:00 grep --color=auto 59432
[root@INHUSZ1-V1625126 ~]# ps -eaf| grep 25589
root 25589 25581 0 00:12 ? 00:00:01 pgpool: heartbeat sender
root 31920 24284 0 13:38 pts/2 00:00:00 grep --color=auto 25589
[root@INHUSZ1-V1625126 ~]# ps -eaf| grep 25581
root 25581 25578 0 00:12 ? 00:00:35 pgpool: lifecheck
root 25587 25581 0 00:12 ? 00:00:00 pgpool: heartbeat receiver
root 25589 25581 0 00:12 ? 00:00:01 pgpool: heartbeat sender
root 32018 25581 0 13:39 ? 00:00:00 ping -q -c3 172.16.251.26
root 32019 25581 0 13:39 ? 00:00:00 ping -q -c3 172.16.251.27
root 32022 24284 0 13:39 pts/2 00:00:00 grep --color=auto 25581

● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-03-29 00:12:51 IST; 13h ago
  Process: 25468 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 25578 (pgpool)
    Tasks: 42
   CGroup: /system.slice/pgpool.service
           ├─25578 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           ├─25579 pgpool: watchdog
           ├─25581 pgpool: lifecheck
           ├─25582 pgpool: wait for connection request
           ├─25583 pgpool: wait for connection request
           ├─25584 pgpool: wait for connection request
           ├─25585 pgpool: wait for connection request
           ├─25586 pgpool: wait for connection request
           ├─25587 pgpool: heartbeat receiver
           ├─25588 pgpool: wait for connection request
           ├─25589 pgpool: heartbeat sender
           ├─25590 pgpool: wait for connection request
           ├─25591 pgpool: wait for connection request
           ├─25592 pgpool: wait for connection request
           ├─25593 pgpool: wait for connection request
           ├─25594 pgpool: wait for connection request
           ├─25595 pgpool: wait for connection request
           ├─25596 pgpool: wait for connection request
           ├─25597 pgpool: wait for connection request
           ├─25598 pgpool: wait for connection request
           ├─25599 pgpool: wait for connection request
           ├─25600 pgpool: wait for connection request
           ├─25601 pgpool: wait for connection request
           ├─25602 pgpool: wait for connection request
           ├─25603 pgpool: wait for connection request
           ├─25604 pgpool: wait for connection request
           ├─25605 pgpool: wait for connection request
           ├─25606 pgpool: wait for connection request
           ├─25607 pgpool: wait for connection request
           ├─25608 pgpool: wait for connection request
           ├─25609 pgpool: wait for connection request
           ├─25610 pgpool: wait for connection request
           ├─25611 pgpool: wait for connection request
           ├─25612 pgpool: wait for connection request
           ├─25613 pgpool: wait for connection request
           ├─25614 pgpool: wait for connection request
           ├─25615 pgpool: wait for connection request
           ├─25617 pgpool: PCP: wait for connection request
           ├─25618 pgpool: worker process
           ├─25619 pgpool: health check process(0)
           └─25620 pgpool: health check process(1)
           
---

On Slave :

[root@s2n pgpool-II]# netstat -nltup | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 4003/pgpool: watchd
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 4002/pgpool
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 4002/pgpool
tcp6 0 0 :::5432 :::* LISTEN 4002/pgpool
udp 0 0 0.0.0.0:9694 0.0.0.0:* 4013/pgpool: heartb
udp 0 0 0.0.0.0:36764 0.0.0.0:* 4015/pgpool: heartb
[root@s2n pgpool-II]# systemctl status pgpool.service
● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-03-29 00:12:47 IST; 14h ago
  Process: 3861 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 4002 (pgpool)
   CGroup: /system.slice/pgpool.service
           ├─4002 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           ├─4003 pgpool: watchdog
           ├─4009 pgpool: lifecheck
           ├─4010 pgpool: wait for connection request
           ├─4011 pgpool: wait for connection request
           ├─4012 pgpool: wait for connection request
           ├─4013 pgpool: heartbeat receiver
           ├─4014 pgpool: wait for connection request
           ├─4015 pgpool: heartbeat sender
           └─4016 pgpool: wait for connection request

Could you please comment on this? Could it be causing the issue?

Thanks for your help.

hoshiai

2019-03-29 16:59

developer   ~0002482

> Change: I have done change as per your suggestion.'heartbeat_device0='' . Now both Master and Slave pool servers are acting as "Masters".
Thank you for trying. I understand that changing "heartbeat_device0" did not resolve the problem.
I am interested in the netstat output where the heartbeat sender/receiver processes appear.
Does this mean the "No such device" messages no longer appear?

> udp 0 0 0.0.0.0:9694 0.0.0.0:* 25587/pgpool: heartbeat receiver
> udp 0 0 0.0.0.0:59432 0.0.0.0:* 25589/pgpool: heartbeat sender
This is not a problem.
The heartbeat sender process sends heartbeat packets to the other servers' port 9694, so its own local socket does not need to bind to port 9694; the kernel assigns it an ephemeral local port, which is why that port changes after each restart.

siva

2019-03-30 23:17

reporter   ~0002486

Hi hoshiai,

Regarding the missing sync between the master and slave pgpool:

I have checked watchdog socket connectivity and cannot see an established connection between the master and slave pool. Could this be the cause of the missing connectivity between master and slave? Also, even with tcpdump -i ens192 port 9000, I get no traffic on either the master or the slave.

On Slave:
[root@s2n pgpool-II]# netstat -nltupa | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 22101/pgpool: watch
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 22100/pgpool
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 22100/pgpool
tcp6 0 0 :::5432 :::* LISTEN 22100/pgpool
udp 0 0 0.0.0.0:56384 0.0.0.0:* 22124/pgpool: heart
udp 0 0 0.0.0.0:9694 0.0.0.0:* 22123/pgpool: heart

On Master :
[root@INHUSZ1-V1625126 ~]# netstat -nltupa | grep pgpool
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 30772/pgpool: watch
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 30771/pgpool
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN 30771/pgpool
tcp6 0 0 :::5433 :::* LISTEN 30771/pgpool
udp 0 0 0.0.0.0:9694 0.0.0.0:* 30795/pgpool: heart
udp 0 0 0.0.0.0:40139 0.0.0.0:* 30796/pgpool: heart

Is the above issue causing EVENT = TIMEOUT while initializing the watchdog on the slave pool, instead of the expected outbound connection to the master pool server? If not, is there any other issue that would cause EVENT = TIMEOUT instead of an outbound connection to the master server?
What are the possible causes of this issue?

2019-03-28 18:55:52: pid 24816: DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
2019-03-28 18:55:57: pid 24816: DEBUG: STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = LOADING
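A TIMEOUT like the one above usually means the slave's watchdog never got a TCP response from the master's wd_port. A quick reachability check from the slave can rule the network layer in or out; this is a sketch only, using the master address (172.16.251.25) and wd_port (9000) visible earlier in this thread, and it requires bash (for /dev/tcp) and coreutils timeout:

```shell
#!/bin/bash
# Probe a watchdog TCP port from the standby Pgpool node.
# Host/port below are taken from this thread; adjust for your environment.
check_wd_port() {
    local host=$1 port=$2
    # /dev/tcp is a bash feature: opening it attempts a TCP connect.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "OK: $host:$port reachable"
    else
        echo "FAIL: $host:$port unreachable (firewall? wd_port mismatch?)"
    fi
}

check_wd_port 172.16.251.25 9000
```

If this prints FAIL while pgpool is listening on the master, a host or network firewall between the nodes is the likely culprit.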

Please share your comments on this.
Thanks a lot for your ongoing support :)

siva

2019-04-02 14:19

reporter   ~0002500

Hi Hoshiai,

Good Morning,

Please share your suggestions on the above issue.

Thank you.

hoshiai

2019-04-03 17:29

developer   ~0002510

Hi siva,
Sorry for the late reply. I am currently checking the source code and Pgpool's behavior.

Could you confirm that the servers can communicate with each other?
In particular, test that the slave Pgpool server can reach the primary Pgpool server
on the watchdog and heartbeat ports (other_wd_port0, heartbeat_destination_port0), using a tool such as 'nmap'.

If the 'nmap' command is available, you can check with the following:

nmap {primary_server_address} -sUT -p 'T:{other_wd_port0},{heartbeat_destination_port0},U:{other_wd_port0},{heartbeat_destination_port0}'

siva

2019-04-03 18:54

reporter   ~0002511

Thanks hoshiai,

I will check and share the output.

siva

2019-04-04 19:21

reporter   ~0002521

Hi hoshiai,

Thanks for your command "nmap {primary_server_address} -sUT -p 'T:{other_wd_port0},{heartbeat_destination_port0},U:{other_wd_port0},{heartbeat_destination_port0}'".
The port was not enabled; with this we were able to identify the issue, and now both master and slave are in sync.
The issue has been resolved.
Thanks a lot for your support; it is much appreciated.

Please close this issue now.
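[Editor's note: since the root cause turned out to be a port that was not enabled, a sketch for future readers. On RHEL/CentOS 7 hosts like these, the watchdog and heartbeat ports used in this thread could be opened with firewalld roughly as follows; this assumes the default zone and that firewalld is the active firewall on every Pgpool-II node:]

```ini
# Run on each Pgpool-II node, then reload. Ports are those from this thread.
firewall-cmd --permanent --add-port=9000/tcp   # wd_port (watchdog TCP)
firewall-cmd --permanent --add-port=9694/udp   # wd_heartbeat_port (heartbeat UDP)
firewall-cmd --reload
```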

hoshiai

2019-04-05 08:58

developer   ~0002523

I am glad that your problem is resolved.

OK, I have closed this issue.

Issue History

Date Modified Username Field Change
2019-03-20 14:31 siva New Issue
2019-03-20 14:40 pengbo Assigned To => hoshiai
2019-03-20 14:40 pengbo Status new => assigned
2019-03-22 16:17 hoshiai Note Added: 0002450
2019-03-24 13:29 siva File Added: Master_Pool_config_file.txt
2019-03-24 13:29 siva Note Added: 0002451
2019-03-25 15:15 hoshiai Note Added: 0002453
2019-03-25 15:55 siva Note Added: 0002454
2019-03-25 16:56 hoshiai Note Added: 0002455
2019-03-25 23:51 siva Note Added: 0002459
2019-03-27 23:44 siva Note Added: 0002469
2019-03-28 00:24 siva File Added: Pool Issues debug Info.zip
2019-03-28 00:24 siva Note Added: 0002471
2019-03-28 14:14 hoshiai Note Added: 0002474
2019-03-28 14:29 hoshiai Note Added: 0002476
2019-03-28 19:04 siva Note Added: 0002477
2019-03-29 00:55 siva Note Added: 0002480
2019-03-29 13:31 siva Note Added: 0002481
2019-03-29 16:59 hoshiai Note Added: 0002482
2019-03-30 23:17 siva Note Added: 0002486
2019-04-02 14:19 siva Note Added: 0002500
2019-04-03 17:29 hoshiai Note Added: 0002510
2019-04-03 18:54 siva Note Added: 0002511
2019-04-04 19:21 siva Note Added: 0002521
2019-04-05 08:58 hoshiai Status assigned => closed
2019-04-05 08:58 hoshiai Resolution open => no change required
2019-04-05 08:58 hoshiai Note Added: 0002523