[pgpool-general: 3355] Fwd: Pgpool Online Recovery behavior

antonio bagarolo antonio.bagarolo at gmail.com
Sat Dec 6 00:51:29 JST 2014


Hi all,
I'm experimenting with pgpool-II for academic reasons, and I've set up a
pgpool-II configuration with Streaming Replication, Online Recovery and
Watchdog. Here is my setup:

Server 1 (Master) and Server 2 (Slave): pgpool-II 3.3.4 with PostgreSQL 9.2.9.
By Master and Slave server I mean the master PostgreSQL backend and the
standby PostgreSQL backend.

As the user manual says, when I take down Server 1, the failover procedure
promotes Server 2 to be the new master.
However, if I bring Server 1 up again, pgpool does not recognize it and
leaves its status at 3 (checked with pcp_node_info). From what I understand
of the user manual, this is normal for pgpool-II, because I have to run
pcp_recovery_node manually on the failed node.
If I run it, the old master becomes a new slave and pgpool starts to see it
correctly with status 1.
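
For readers following the status numbers above, here is a minimal Python sketch of how the one-line output of pcp_node_info could be decoded. It assumes the pgpool-II 3.3 output format of four space-separated fields (hostname, port, status, weight). The names NodeInfo, parse_node_info and needs_recovery are made up for illustration and are not part of pgpool.

```python
from typing import NamedTuple

# Backend status codes as described in the pgpool-II manual.
STATUS_MEANING = {
    0: "initializing (never shown by PCP)",
    1: "up, no connections yet",
    2: "up, connections pooled",
    3: "down",
}


class NodeInfo(NamedTuple):
    hostname: str
    port: int
    status: int
    weight: float


def parse_node_info(line: str) -> NodeInfo:
    """Parse one line of pcp_node_info output (assumed: 'host port status weight')."""
    host, port, status, weight = line.split()[:4]
    return NodeInfo(host, int(port), int(status), float(weight))


def needs_recovery(info: NodeInfo) -> bool:
    """A node in status 3 is detached and stays so until it is recovered or re-attached."""
    return info.status == 3
```

A node reported as 3 stays detached until someone runs pcp_recovery_node against it, which matches the behavior described above.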

My question is: is there a way in pgpool-II to start online recovery
automatically as soon as the failed node comes up again?

It seems that pgpool already checks the nodes through the health check, but
only while they are in state 1.

Another scenario that suffers from the same problem is when the slave server
goes down and then comes up again: pgpool still sees it in state 3.

Because I'm a newbie with pgpool, I would like to be sure that I'm not
missing something in pgpool.conf that already does this.


Another behavior I see is that, if I restart the pgpool instance, it
remembers the node status.
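
A note on the restart behavior: as far as I understand, pgpool writes the backend status to a file named pgpool_status in the directory given by logdir and reads it back at startup, which would explain why the status survives a restart (starting pgpool with -D is documented to discard that file). The Python sketch below only locates that file from a pgpool.conf text. The functions parse_pgpool_conf and status_file_path are hypothetical helpers, and the parser handles only the simple "name = value" form with "#" comments.

```python
import posixpath
import re

# name = value, where value is either single-quoted or a bare token.
# Lines starting with '#' do not match, so commented-out settings are skipped.
_SETTING = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*('[^']*'|[^#\s]+)")


def parse_pgpool_conf(text):
    """Return a dict of settings found in pgpool.conf-style text (values as strings)."""
    settings = {}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if match is None:
            continue
        name, value = match.group(1), match.group(2)
        if value.startswith("'"):
            value = value[1:-1]  # strip the surrounding single quotes
        settings[name] = value
    return settings


def status_file_path(settings):
    """Path where the status file is assumed to live: <logdir>/pgpool_status."""
    return posixpath.join(settings["logdir"], "pgpool_status")
```

With the configuration shown in this message (logdir = '/tmp'), the helper would point at /tmp/pgpool_status.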


Here is what my pgpool.conf (Server 1) looks like:

-----------------------------------------------------------------------

# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# Whitespace may be used.  Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#


#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------

# - pgpool Connection Settings -

listen_addresses = '*'
                                   # Host name or IP address to listen on:
                                   # '*' for all, '' for no TCP/IP connections
                                   # (change requires restart)
port = 9999
                                   # Port number
                                   # (change requires restart)
socket_dir = '/tmp'
                                   # Unix domain socket path
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)


# - pgpool Communication Manager Connection Settings -

pcp_port = 9898
                                   # Port number for pcp
                                   # (change requires restart)
pcp_socket_dir = '/tmp'
                                   # Unix domain socket path for pcp
                                   # The Debian package defaults to
                                   # /var/run/postgresql
                                   # (change requires restart)

# - Backend Connection Settings -

backend_hostname0 = 'pgpool-1'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/usr/local/pgsql/data'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or
DISALLOW_TO_FAILOVER
backend_hostname1 = 'pgpool-2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/usr/local/pgsql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -

enable_pool_hba = off
                                   # Use pool_hba.conf for client authentication
pool_passwd = 'pool_passwd'
                                   # File name of pool_passwd for md5 authentication.
                                   # "" disables pool_passwd.
                                   # (change requires restart)
authentication_timeout = 60
                                   # Delay in seconds to complete client authentication
                                   # 0 means no timeout.

# - SSL Connections -

ssl = off
                                   # Enable SSL support
                                   # (change requires restart)
#ssl_key = './server.key'
                                   # Path to the SSL private key file
                                   # (change requires restart)
#ssl_cert = './server.cert'
                                   # Path to the SSL public certificate file
                                   # (change requires restart)
#ssl_ca_cert = ''
                                   # Path to a single PEM format file
                                   # containing CA root certificate(s)
                                   # (change requires restart)
#ssl_ca_cert_dir = ''
                                   # Directory containing CA root certificate(s)
                                   # (change requires restart)


#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------

# - Pool size -

num_init_children = 32
                                   # Number of pools
                                   # (change requires restart)
max_pool = 4
                                   # Number of connections per pool
                                   # (change requires restart)

# - Life time -

child_life_time = 300
                                   # Pool exits after being idle for this many seconds
child_max_connections = 0
                                   # Pool exits after receiving that many connections
                                   # 0 means no exit
connection_life_time = 0
                                   # Connection to backend closes after being idle for this many seconds
                                   # 0 means no close
client_idle_limit = 0
                                   # Client is disconnected after being idle for that many seconds
                                   # (even inside an explicit transaction!)
                                   # 0 means no disconnection


#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'stderr'
                                   # Where to log
                                   # Valid values are combinations of stderr,
                                   # and syslog. Default to stderr.

# - What to log -

print_timestamp = on
                                   # Print timestamp on each line
                                   # (change requires restart)

log_connections = off
                                   # Log connections
log_hostname = off
                                   # Hostname will be shown in ps status
                                   # and in logs if connections are logged
log_statement = off
                                   # Log all statements
log_per_node_statement = off
                                   # Log all statements
                                   # with node and backend informations
log_standby_delay = 'none'
                                   # Log standby delay
                                   # Valid values are combinations of always,
                                   # if_over_threshold, none

# - Syslog specific -

syslog_facility = 'LOCAL0'
                                   # Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
                                   # Syslog program identification string
                                   # Default to 'pgpool'

# - Debug -

debug_level = 0
                                   # Debug message verbosity level
                                   # 0 means no message, 1 or more mean verbose


#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

pid_file_name = '/var/run/pgpool/pgpool.pid'
                                   # PID file name
                                   # (change requires restart)
logdir = '/tmp'
                                   # Directory of pgPool status file
                                   # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------

connection_cache = on
                                   # Activate connection pools
                                   # (change requires restart)

                                   # Semicolon separated list of queries
                                   # to be issued at the end of a session
                                   # The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
                                   # The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'


#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------

replication_mode = off
                                   # Activate replication mode
                                   # (change requires restart)
replicate_select = off
                                   # Replicate SELECT statements
                                   # when in replication or parallel mode
                                   # replicate_select is higher priority than
                                   # load_balance_mode.

insert_lock = off
                                   # Automatically locks a dummy row or a table
                                   # with INSERT statements to keep SERIAL data
                                   # consistency
                                   # Without SERIAL, no lock will be issued
lobj_lock_table = ''
                                   # When rewriting lo_creat command in
                                   # replication mode, specify table name to
                                   # lock

# - Degenerate handling -

replication_stop_on_mismatch = off
                                   # On disagreement with the packet kind
                                   # sent from backend, degenerate the node
                                   # which is most likely "minority"
                                   # If off, just force to exit this session

failover_if_affected_tuples_mismatch = off
                                   # On disagreement with the number of affected
                                   # tuples in UPDATE/DELETE queries, then
                                   # degenerate the node which is most likely
                                   # "minority".
                                   # If off, just abort the transaction to
                                   # keep the consistency


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------

load_balance_mode = on
                                   # Activate load balancing mode
                                   # (change requires restart)
ignore_leading_white_space = on
                                   # Ignore leading white spaces of each query
white_function_list = ''
                                   # Comma separated list of function names
                                   # that don't write to database
                                   # Regexp are accepted
black_function_list = 'currval,lastval,nextval,setval'
                                   # Comma separated list of function names
                                   # that write to database
                                   # Regexp are accepted


#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------

master_slave_mode = on
                                   # Activate master/slave mode
                                   # (change requires restart)
master_slave_sub_mode = 'stream'
                                   # Master/slave sub mode
                                   # Valid values are combinations slony or
                                   # stream. Default is slony.
                                   # (change requires restart)

# - Streaming -

sr_check_period = 0
                                   # Streaming replication check period
                                   # Disabled (0) by default
sr_check_user = 'replication'
                                   # Streaming replication check user
                                   # This is necessary even if you disable streaming
                                   # replication delay check by sr_check_period = 0
sr_check_password = 'replication'
                                   # Password for streaming replication check user
delay_threshold = 0
                                   # Threshold before not dispatching query to standby node
                                   # Unit is in bytes
                                   # Disabled (0) by default

# - Special commands -

follow_master_command = ''
                                   # Executes this command after master failover
                                   # Special values:
                                   #   %d = node id
                                   #   %h = host name
                                   #   %p = port number
                                   #   %D = database cluster path
                                   #   %m = new master node id
                                   #   %H = hostname of the new master node
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %% = '%' character


#------------------------------------------------------------------------------
# PARALLEL MODE
#------------------------------------------------------------------------------

parallel_mode = off
                                   # Activates parallel query mode
                                   # (change requires restart)
pgpool2_hostname = 'pgpool-1'
                                   # Set pgpool2 hostname
                                   # (change requires restart)

# - System DB info -

system_db_hostname  = 'localhost'
                                   # (change requires restart)
system_db_port = 5432
                                   # (change requires restart)
system_db_dbname = 'pgpool'
                                   # (change requires restart)
system_db_schema = 'pgpool_catalog'
                                   # (change requires restart)
system_db_user = 'postgres'
                                   # (change requires restart)
#system_db_password = 'postgres'
                                   # (change requires restart)


#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------

health_check_period = 10
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 0
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
health_check_password = 'postgres'
                                   # Password for health check user
health_check_max_retries = 0
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.


#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/usr/local/pgsql/data/failover.sh %d %P %H %R %h'
#failover_command = 'echo "FAILOVER COMMAND"'
                                   # Executes this command at failover
                                   # Special values:
                                   #   %d = node id
                                   #   %h = host name
                                   #   %p = port number
                                   #   %D = database cluster path
                                   #   %m = new master node id
                                   #   %H = hostname of the new master node
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %% = '%' character
failback_command = 'echo "FAILBACK COMMAND"'
                                   # Executes this command at failback.
                                   # Special values:
                                   #   %d = node id
                                   #   %h = host name
                                   #   %p = port number
                                   #   %D = database cluster path
                                   #   %m = new master node id
                                   #   %H = hostname of the new master node
                                   #   %M = old master node id
                                   #   %P = old primary node id
                                   #   %r = new master port number
                                   #   %R = new master database cluster path
                                   #   %% = '%' character

fail_over_on_backend_error = on
                                   # Initiates failover when reading/writing to the
                                   # backend communication socket fails
                                   # If set to off, pgpool will report an
                                   # error and disconnect the session.

search_primary_node_timeout = 10
                                   # Timeout in seconds to search for the
                                   # primary node when a failover occurs.
                                   # 0 means no timeout, keep searching
                                   # for a primary node forever.

#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'postgres'
                                   # Online recovery user
recovery_password = 'postgres'
                                   # Online recovery password
recovery_1st_stage_command = 'recovery_1_stage'
                                   # Executes a command in first stage
recovery_2nd_stage_command = ''
                                   # Executes a command in second stage
recovery_timeout = 90
                                   # Timeout in seconds to wait for the
                                   # recovering node's postmaster to start up
                                   # 0 means no wait
client_idle_limit_in_recovery = 0
                                   # Client is disconnected after being idle
                                   # for that many seconds in the second stage
                                   # of online recovery
                                   # 0 means no disconnection
                                   # -1 means immediate disconnection


#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on
                                    # Activates watchdog
                                    # (change requires restart)

# -Connection to up stream servers -

trusted_servers = ''
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

wd_hostname = 'pgpool-1'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

# - Virtual IP control Setting -

delegate_IP = '192.168.100.251'
                                    # delegate IP address
                                    # If this is empty, the virtual IP is never brought up.
                                    # (change requires restart)
ifconfig_path = '/sbin'
                                    # ifconfig command path
                                    # (change requires restart)
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
if_down_cmd = 'ifconfig eth0:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)

arping_path = '/usr/sbin'           # arping command path
                                    # (change requires restart)

arping_cmd = 'arping -U $_IP_$ -w 1'
                                    # arping command
                                    # (change requires restart)

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
                                    # Clear all the query cache on shared memory
                                    # when standby pgpool escalate to active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if client connects to pgpool
                                    # not using virtual IP.
                                    # (change requires restart)
wd_escalation_command = ''
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query')
                                    # (change requires restart)
wd_interval = 10
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = 'pgpool-2'
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = ''
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

#heartbeat_destination1 = 'host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''

# -- query mode --

wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
wd_lifecheck_dbname = 'template1'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
wd_lifecheck_user = 'nobody'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
wd_lifecheck_password = ''
                                    # Password for watchdog user in lifecheck
                                    # (change requires restart)

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'pgpool-2'
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000


#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
                                   # Life time of relation cache in seconds.
                                   # 0 means no cache expiration(the default).
                                   # The relation cache is used for cache the
                                   # query result against PostgreSQL system
                                   # catalog to obtain various information
                                   # including table structures or if it's a
                                   # temporary table or not. The cache is
                                   # maintained in a pgpool child local memory
                                   # and being kept as long as it survives.
                                   # If someone modify the table by using
                                   # ALTER TABLE or some such, the relcache is
                                   # not consistent anymore.
                                   # For this purpose, cache_expiration
                                   # controls the life time of the cache.
relcache_size = 256
                                   # Number of relation cache
                                   # entry. If you see frequently:
                                   # "pool_search_relcache: cache replacement happend"
                                   # in the pgpool log, you might want to increase this number.

check_temp_table = on
                                   # If on, enable temporary table check in SELECT statements.
                                   # This initiates queries against system catalog of primary/master
                                   # thus increases load of master.
                                   # If you are absolutely sure that your system never uses temporary tables
                                   # and you want to save access to primary/master, you could turn this off.
                                   # Default is on.


#------------------------------------------------------------------------------
# ON MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
memory_cache_enabled = off
   # If on, use the memory cache functionality, off by default
memqcache_method = 'shmem'
   # Cache storage method. either 'shmem'(shared memory) or
   # 'memcached'. 'shmem' by default
                                   # (change requires restart)
memqcache_memcached_host = 'localhost'
   # Memcached host name or IP address. Mandatory if
   # memqcache_method = 'memcached'.
   # Defaults to localhost.
                                   # (change requires restart)
memqcache_memcached_port = 11211
   # Memcached port number. Mandatory if memqcache_method = 'memcached'.
   # Defaults to 11211.
                                   # (change requires restart)
memqcache_total_size = 67108864
   # Total memory size in bytes for storing memory cache.
   # Mandatory if memqcache_method = 'shmem'.
   # Defaults to 64MB.
                                   # (change requires restart)
memqcache_max_num_cache = 1000000
   # Total number of cache entries. Mandatory
   # if memqcache_method = 'shmem'.
   # Each cache entry consumes 48 bytes on shared memory.
   # Defaults to 1,000,000(45.8MB).
                                   # (change requires restart)
memqcache_expire = 0
   # Memory cache entry life time specified in seconds.
   # 0 means infinite life time. 0 by default.
                                   # (change requires restart)
memqcache_auto_cache_invalidation = on
   # If on, invalidation of query cache is triggered by corresponding
   # DDL/DML/DCL(and memqcache_expire).  If off, it is only triggered
   # by memqcache_expire.  on by default.
                                   # (change requires restart)
memqcache_maxcache = 409600
   # Maximum SELECT result size in bytes.
   # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
                                   # (change requires restart)
memqcache_cache_block_size = 1048576
   # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
   # Defaults to 1MB.
                                   # (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
      # Temporary work directory to record table oids
                                   # (change requires restart)
white_memqcache_table_list = ''
                                   # Comma separated list of table names to memcache
                                   # that don't write to database
                                   # Regexp are accepted
black_memqcache_table_list = ''
                                   # Comma separated list of table names not to memcache
                                   # that don't write to database
                                   # Regexp are accepted

--------------------------------------------------------



Thank you in advance (and sorry for my English, I hope I was clear enough).

-- 

================================
 Antonio Bagarolo,
 Software Engineer and Research Assistant -  mOSAIC project FP7
 Department of Industrial and Information Engineering
 Second University of Naples
 Real Casa dell'Annunziata via Roma, 29
 81031 Aversa (CE) - ITALY
 Skype: antonio.bagarolo
 Cell: +39 328 6222197
 ================================


