[pgpool-general: 2253] Re: pgpool 3.3.1 replicate setup with 3 nodes
Videanu Adrian
videanuadrian at yahoo.com
Thu Nov 7 20:15:44 JST 2013
Hi,
Here are my scripts:
(
There must be something related to this line
psql -p $port -c "SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = 'S'" $i
)
basebackup.sh
#!/bin/bash -x
DATA=$1
RECOVERY_TARGET=$2
RECOVERY_DATA=$3
psql -c "select pg_start_backup('pgpool-recovery')" postgres
echo "restore_command = 'scp -P 2022 $HOSTNAME:/var/lib/postgresql/9.2/archive/%f %p'" > /var/lib/postgresql/9.2/data/recovery.conf
tar -C /var/lib/postgresql/9.2/data -zcf pgsql.tar.gz . --exclude='postmaster.log' --exclude='postmaster.pid' --exclude='postmaster.opts'
psql -c 'select pg_stop_backup()' postgres
scp -P 2022 pgsql.tar.gz $RECOVERY_TARGET:$RECOVERY_DATA
pgpool_recovery_pitr.sh
#!/bin/bash
# Online recovery 2nd stage script
#
datadir=$1 # master dabatase cluster
DEST=$2 # hostname of the DB node to be recovered
DESTDIR=$3 # database cluster of the DB node to be recovered
port=5432 # PostgreSQL port number
archdir=/var/lib/postgresql/9.2/archive # archive log directory
# Force to flush current value of sequences to xlog
psql -p $port -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1|
while read i
do
if [ "$i" != "" ];then
psql -p $port -c "SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = 'S'" $i
fi
done
psql -p $port -c "SELECT pgpool_switch_xlog('$archdir')" template1
pgpool_remote_start
#!/bin/bash -x
#
# Start PostgreSQL on the recovery target node
#
DEST=$1
DESTDIR=$2
# Deploy a base backup
ssh -T -p 2022 root@$DEST 'cd /var/lib/postgresql/9.2/data; tar zxf pgsql.tar.gz'
ssh -T -p 2022 root@$DEST '/etc/init.d/postgresql-9.2 start'
rm -rf pgsql.tar.gz
rm -rf recovery.conf
These are my scripts used in recovery process. I will put here the config file also:
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# Whitespace may be used. Comments are introduced with "#" anywhere on a line.
# The complete list of parameter names and allowed values can be found in the
# pgPool-II documentation.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal. If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pgpool reload". Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------
# - pgpool Connection Settings -
listen_addresses = '*'
# Host name or IP address to listen on:
# '*' for all, '' for no TCP/IP connections
# (change requires restart)
port = 9999
# Port number
# (change requires restart)
socket_dir = '/tmp'
# Unix domain socket path
# The Debian package defaults to
# /var/run/postgresql
# (change requires restart)
# - pgpool Communication Manager Connection Settings -
pcp_port = 9898
# Port number for pcp
# (change requires restart)
pcp_socket_dir = '/tmp'
# Unix domain socket path for pcp
# The Debian package defaults to
# /var/run/postgresql
# (change requires restart)
# - Backend Connection Settings -
backend_hostname0 = '192.168.91.33'
# Host name or IP address to connect to for backend 0
backend_port0 = 5432
# Port number for backend 0
backend_weight0 = 1
# Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/var/lib/postgresql/9.2/data'
# Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
# Controls various backend behavior
# ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
backend_hostname1 = '192.168.91.34'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.2/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '192.168.91.35'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/9.2/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
# - Authentication -
enable_pool_hba = on
# Use pool_hba.conf for client authentication
pool_passwd = ''
# File name of pool_passwd for md5 authentication.
# "" disables pool_passwd.
# (change requires restart)
authentication_timeout = 60
# Delay in seconds to complete client authentication
# 0 means no timeout.
# - SSL Connections -
ssl = off
# Enable SSL support
# (change requires restart)
#ssl_key = './server.key'
# Path to the SSL private key file
# (change requires restart)
#ssl_cert = './server.cert'
# Path to the SSL public certificate file
# (change requires restart)
#ssl_ca_cert = ''
# Path to a single PEM format file
# containing CA root certificate(s)
# (change requires restart)
#ssl_ca_cert_dir = ''
# Directory containing CA root certificate(s)
# (change requires restart)
#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------
# - Pool size -
num_init_children = 100
# Number of pools
# (change requires restart)
max_pool = 10
# Number of connections per pool
# (change requires restart)
# - Life time -
child_life_time = 300
# Pool exits after being idle for this many seconds
child_max_connections = 0
# Pool exits after receiving that many connections
# 0 means no exit
connection_life_time = 0
# Connection to backend closes after being idle for this many seconds
# 0 means no close
client_idle_limit = 0
# Client is disconnected after being idle for that many seconds
# (even inside an explicit transactions!)
# 0 means no disconnection
#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------
# - Where to log -
log_destination = 'syslog'
# Where to log
# Valid values are combinations of stderr,
# and syslog. Default to stderr.
# - What to log -
print_timestamp = on
# Print timestamp on each line
# (change requires restart)
log_connections = on
# Log connections
log_hostname = on
# Hostname will be shown in ps status
# and in logs if connections are logged
log_statement = off
# Log all statements
log_per_node_statement = on
# Log all statements
# with node and backend informations
log_standby_delay = 'none'
# Log standby delay
# Valid values are combinations of always,
# if_over_threshold, none
# - Syslog specific -
syslog_facility = 'LOCAL0'
# Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
# Syslog program identification string
# Default to 'pgpool'
# - Debug -
debug_level = 0
# Debug message verbosity level
# 0 means no message, 1 or more mean verbose
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
pid_file_name = '/var/run/pgpool.pid'
# PID file name
# (change requires restart)
logdir = '/tmp'
# Directory of pgPool status file
# (change requires restart)
#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------
connection_cache = on
# Activate connection pools
# (change requires restart)
# Semicolon separated list of queries
# to be issued at the end of a session
# The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
# The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------
replication_mode = on
# Activate replication mode
# (change requires restart)
replicate_select = off
# Replicate SELECT statements
# when in replication or parallel mode
# replicate_select is higher priority than
# load_balance_mode.
insert_lock = on
# Automatically locks a dummy row or a table
# with INSERT statements to keep SERIAL data
# consistency
# Without SERIAL, no lock will be issued
lobj_lock_table = ''
# When rewriting lo_creat command in
# replication mode, specify table name to
# lock
# - Degenerate handling -
replication_stop_on_mismatch = off
# On disagreement with the packet kind
# sent from backend, degenerate the node
# which is most likely "minority"
# If off, just force to exit this session
failover_if_affected_tuples_mismatch = off
# On disagreement with the number of affected
# tuples in UPDATE/DELETE queries, then
# degenerate the node which is most likely
# "minority".
# If off, just abort the transaction to
# keep the consistency
#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------
load_balance_mode = on
# Activate load balancing mode
# (change requires restart)
ignore_leading_white_space = on
# Ignore leading white spaces of each query
white_function_list = ''
# Comma separated list of function names
# that don't write to database
# Regexp are accepted
black_function_list = 'nextval,setval'
# Comma separated list of function names
# that write to database
# Regexp are accepted
#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------
master_slave_mode = off
# Activate master/slave mode
# (change requires restart)
master_slave_sub_mode = 'slony'
# Master/slave sub mode
# Valid values are combinations slony or
# stream. Default is slony.
# (change requires restart)
# - Streaming -
sr_check_period = 0
# Streaming replication check period
# Disabled (0) by default
sr_check_user = 'postgres'
# Streaming replication check user
# This is necessary even if you disable
# streaming replication delay check with
# sr_check_period = 0
sr_check_password = ''
# Password for streaming replication check user
delay_threshold = 0
# Threshold before not dispatching query to standby node
# Unit is in bytes
# Disabled (0) by default
# - Special commands -
follow_master_command = ''
# Executes this command after master failover
# Special values:
# %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
#------------------------------------------------------------------------------
# PARALLEL MODE
#------------------------------------------------------------------------------
parallel_mode = off
# Activates parallel query mode
# (change requires restart)
pgpool2_hostname = ''
# Set pgpool2 hostname
# (change requires restart)
# - System DB info -
system_db_hostname = 'localhost'
# (change requires restart)
system_db_port = 5432
# (change requires restart)
system_db_dbname = 'pgpool'
# (change requires restart)
system_db_schema = 'pgpool_catalog'
# (change requires restart)
system_db_user = 'pgpool'
# (change requires restart)
system_db_password = ''
# (change requires restart)
#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------
health_check_period = 2
# Health check period
# Disabled (0) by default
health_check_timeout = 20
# Health check timeout
# 0 means no timeout
health_check_user = 'postgres'
# Health check user
health_check_password = ''
# Password for health check user
health_check_max_retries = 0
# Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
# Amount of time to wait (in seconds) between retries.
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = ''
# Executes this command at failover
# Special values:
# %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
failback_command = ''
# Executes this command at failback.
# Special values:
# %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
fail_over_on_backend_error = on
# Initiates failover when reading/writing to the
# backend communication socket fails
# If set to off, pgpool will report an
# error and disconnect the session.
search_primary_node_timeout = 10
# Timeout in seconds to search for the
# primary node when a failover occurs.
# 0 means no timeout, keep searching
# for a primary node forever.
#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------
recovery_user = 'postgres'
# Online recovery user
recovery_password = ''
# Online recovery password
recovery_1st_stage_command = 'basebackup.sh'
# Executes a command in first stage
recovery_2nd_stage_command = 'pgpool_recovery_pitr.sh'
# Executes a command in second stage
recovery_timeout = 900
# Timeout in seconds to wait for the
# recovering node's postmaster to start up
# 0 means no wait
client_idle_limit_in_recovery = 0
# Client is disconnected after being idle
# for that many seconds in the second stage
# of online recovery
# 0 means no disconnection
# -1 means immediate disconnection
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = on
# Activates watchdog
# (change requires restart)
# -Connection to up stream servers -
trusted_servers = ''
# trusted server list which are used
# to confirm network connection
# (hostA,hostB,hostC,...)
# (change requires restart)
ping_path = '/bin'
# ping command path
# (change requires restart)
# - Watchdog communication Settings -
wd_hostname = '192.168.91.31'
# Host name or IP address of this watchdog
# (change requires restart)
wd_port = 9000
# port number for watchdog service
# (change requires restart)
wd_authkey = ''
# Authentication key for watchdog communication
# (change requires restart)
# - Virtual IP control Setting -
delegate_IP = '192.168.91.30'
# delegate IP address
# If this is empty, virtual IP never bring up.
# (change requires restart)
ifconfig_path = '/bin'
# ifconfig command path
# (change requires restart)
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
# startup delegate IP command
# (change requires restart)
if_down_cmd = 'ifconfig eth0:0 down'
# shutdown delegate IP command
# (change requires restart)
arping_path = '/sbin' # arping command path
# (change requires restart)
arping_cmd = 'arping -U $_IP_$ -w 1'
# arping command
# (change requires restart)
# - Behaivor on escalation Setting -
clear_memqcache_on_escalation = on
# Clear all the query cache on shared memory
# when standby pgpool escalate to active pgpool
# (= virtual IP holder).
# This should be off if client connects to pgpool
# not using virtual IP.
# (change requires restart)
wd_escalation_command = ''
# Executes this command at escalation on new active pgpool.
# (change requires restart)
# - Lifecheck Setting -
# -- common --
#wd_lifecheck_method = 'heartbeat'
# Method of watchdog lifecheck ('heartbeat' or 'query')
# (change requires restart)
wd_lifecheck_method = 'heartbeat'
wd_interval = 2
# lifecheck interval (sec) > 0
# (change requires restart)
# -- heartbeat mode --
wd_heartbeat_port = 9694
# Port number for receiving heartbeat signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat signal (sec)
# (change requires restart)
heartbeat_destination0 = '192.168.91.32'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = 'eth0'
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
#heartbeat_destination1 = 'host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''
# -- query mode --
wd_life_point = 3
# lifecheck retry times
# (change requires restart)
wd_lifecheck_query = 'SELECT 1'
# lifecheck query to pgpool from watchdog
# (change requires restart)
wd_lifecheck_dbname = 'template1'
# Database name connected for lifecheck
# (change requires restart)
wd_lifecheck_user = 'postgres'
# watchdog user monitoring pgpools in lifecheck
# (change requires restart)
wd_lifecheck_password = ''
# Password for watchdog user in lifecheck
# (change requires restart)
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = '192.168.91.32'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for othet pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for othet watchdog 0
# (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000
#------------------------------------------------------------------------------
# OTHERS
#------------------------------------------------------------------------------
relcache_expire = 0
# Life time of relation cache in seconds.
# 0 means no cache expiration(the default).
# The relation cache is used for cache the
# query result against PostgreSQL system
# catalog to obtain various information
# including table structures or if it's a
# temporary table or not. The cache is
# maintained in a pgpool child local memory
# and being kept as long as it survives.
# If someone modify the table by using
# ALTER TABLE or some such, the relcache is
# not consistent anymore.
# For this purpose, cache_expiration
# controls the life time of the cache.
relcache_size = 256
# Number of relation cache
# entry. If you see frequently:
# "pool_search_relcache: cache replacement happend"
# in the pgpool log, you might want to increate this number.
check_temp_table = on
# If on, enable temporary table check in SELECT statements.
# This initiates queries against system catalog of primary/master
# thus increases load of master.
# If you are absolutely sure that your system never uses temporary tables
# and you want to save access to primary/master, you could turn this off.
# Default is on.
#------------------------------------------------------------------------------
# ON MEMORY QUERY MEMORY CACHE
#------------------------------------------------------------------------------
memory_cache_enabled = off
# If on, use the memory cache functionality, off by default
memqcache_method = 'shmem'
# Cache storage method. either 'shmem'(shared memory) or
# 'memcached'. 'shmem' by default
# (change requires restart)
memqcache_memcached_host = 'localhost'
# Memcached host name or IP address. Mandatory if
# memqcache_method = 'memcached'.
# Defaults to localhost.
# (change requires restart)
memqcache_memcached_port = 11211
# Memcached port number. Mondatory if memqcache_method = 'memcached'.
# Defaults to 11211.
# (change requires restart)
memqcache_total_size = 67108864
# Total memory size in bytes for storing memory cache.
# Mandatory if memqcache_method = 'shmem'.
# Defaults to 64MB.
# (change requires restart)
memqcache_max_num_cache = 1000000
# Total number of cache entries. Mandatory
# if memqcache_method = 'shmem'.
# Each cache entry consumes 48 bytes on shared memory.
# Defaults to 1,000,000(45.8MB).
# (change requires restart)
memqcache_expire = 0
# Memory cache entry life time specified in seconds.
# 0 means infinite life time. 0 by default.
# (change requires restart)
memqcache_auto_cache_invalidation = on
# If on, invalidation of query cache is triggered by corresponding
# DDL/DML/DCL(and memqcache_expire). If off, it is only triggered
# by memqcache_expire. on by default.
# (change requires restart)
memqcache_maxcache = 409600
# Maximum SELECT result size in bytes.
# Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
# (change requires restart)
memqcache_cache_block_size = 1048576
# Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
# Defaults to 1MB.
# (change requires restart)
memqcache_oiddir = '/var/log/pgpool/oiddir'
# Temporary work directory to record table oids
# (change requires restart)
white_memqcache_table_list = ''
# Comma separated list of table names to memcache
# that don't write to database
# Regexp are accepted
black_memqcache_table_list = ''
# Comma separated list of table names not to memcache
# that don't write to database
# Regexp are accepted
Also, when I use load balance mode the load balancing is made between connections and not queries, right?
I mean that if i have 1 script the open one connection and perform 1000 selects, these selects will be performed on one node.
Regards,
Adrian Videanu
________________________________
From: Yugo Nagata <nagata at sraoss.co.jp>
To: Videanu Adrian <videanuadrian at yahoo.com>
Cc: pgpool-general at pgpool.net
Sent: Thursday, November 7, 2013 12:44 PM
Subject: Re: [pgpool-general: 2237] pgpool 3.3.1 replicate setup with 3 nodes
Hi,
Are you setting recovery_2nd_stage_command properly?
(http://www.pgpool.net/docs/latest/pgpool-en.html#online-recovery
http://www.pgpool.net/docs/latest/pgpool-en.html#RECOVERY_2ND_STAGE_COMMAND)
If this is wrong, recovery might fails and backends might get inconsistent.
For analysis, could you please provide your pgpool.conf and all scripts?
On Wed, 6 Nov 2013 06:08:05 -0800 (PST)
Videanu Adrian <videanuadrian at yahoo.com> wrote:
> Hi all,
>
> I have a pgpool 3.3.1 with 3 postgresql 9.2 backends.
> The pgpool is configured as : [ mode ]
> Replication Mode
> / Load Balance Mode / Watchdog On
>
> I have a problem when I use online recovery for one of the nodes.
>
> Scenario:
> 1. The 3rd node failed.
> 2. Pgpool detects the failure and present the node as down. (everything is ok)
> 3. I press recovery from pgpadmin and the recovery script starts.
> 4. The 3rd node is back online, but the sequence for this node and the 1st node are incremented by 1 regarding the 2nd node.
> Because of this nothing works because i get a kind mismatch between nodes.
>
> Does replication setup works for more than 2 nodes and if yes how should i perform online recovery.
>
> PS. I use PITR.
>
>
>
> Regards,
> Adrian Videanu
--
Yugo Nagata <nagata at sraoss.co.jp
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20131107/3ecf8116/attachment.htm>
More information about the pgpool-general
mailing list