[Pgpool-general] How to set up PGPool II Recovery

Wed Jul 9 09:02:34 UTC 2008

Hi,

I'm using PGPool II on a system with 2 PostgreSQL backends:

         backend1                      backend2
         192.168.0.114                 192.168.0.212
         5432                          5432
                    \                 /
                     \               /
                      \             /
                      PGPool II 2.1 RC1
                      192.168.0.114
                      9999
                      pcp port : 9998

Yes, pgpool is on the same host as backend1.
I set up replication, which seems to work perfectly well. I'm now
trying to set up online recovery. When I start pgpool with the -n -d
options, I can see the health check detecting when my backend2 stops
working. It prints something like:

2008-07-09 08:52:18 DEBUG: pid 12406: starting health checking
2008-07-09 08:52:18 DEBUG: pid 12406: health_check: 0 the DB node status: 2
2008-07-09 08:52:18 DEBUG: pid 12406: health_check: 1 the DB node status: 3

I guess state '3' means the backend failed to answer. However, when
the backend comes back, the state doesn't change.

Even after reading the information on the mailing list, I can't
understand what's wrong.
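
For reference, here is a minimal sketch of how these status numbers could be
decoded. The meanings are the ones I understand the pgpool-II documentation
to give for backend node status (0 = initialization only, 1 = up with no
connections yet, 2 = up with pooled connections, 3 = down). The function
name describe_status is hypothetical and is not part of pgpool.

```shell
#!/bin/sh
# Hypothetical helper (not part of pgpool): translate a backend status
# number, as printed in the health_check debug lines, into a description.
describe_status() {
    case "$1" in
        0) echo "initializing" ;;
        1) echo "up, no connections yet" ;;
        2) echo "up, connections pooled" ;;
        3) echo "down" ;;
        *) echo "unknown status: $1" ;;
    esac
}

# Decode the two statuses that appear in the log excerpt.
for s in 2 3; do
    echo "status $s: $(describe_status "$s")"
done
```

Read this way, node 0 (backend1) is up with pooled connections and node 1
(backend2) is marked down.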

- The databases are in /var/lib/postgresql/8.3/main/
- The postgres user's home is /var/lib/postgres

Here are my files on 192.168.0.114

=== /usr/local/etc/pgpool.conf ===
listen_addresses = 'localhost'

port = 9999

pcp_port = 9898

socket_dir = '/tmp'

pcp_socket_dir = '/tmp'

backend_socket_dir = '/tmp'

pcp_timeout = 10

num_init_children = 15

max_pool = 1

child_life_time = 0

connection_life_time = 5

child_max_connections = 0

client_idle_limit = 0

authentication_timeout = 5

logdir = '/tmp'

replication_mode = true

load_balance_mode = false

replication_stop_on_mismatch = false

replicate_select = true

reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'

print_timestamp = true

master_slave_mode = false

connection_cache = false

health_check_timeout = 2

health_check_period = 2

health_check_user = 'postgres'

failover_command = 'touch /tmp/failed'

failback_command = 'touch /tmp/back'

insert_lock = false 

ignore_leading_white_space = true

log_statement = true 

log_connections = true

log_hostname = false

parallel_mode = false 

enable_query_cache = false 

pgpool2_hostname = ''

system_db_hostname = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = ''

backend_hostname0 = 'localhost'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = 'data'
backend_hostname1 = '192.168.0.212'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = 'data'

enable_pool_hba = false

recovery_user = 'postgres'

recovery_password = ''

recovery_1st_stage_command = 'base-backup.sh'

recovery_2nd_stage_command = 'pgpool-recovery'

recovery_timeout = 10

==========================================
=== /var/lib/postgresql/base-backup.sh ===

#! /bin/sh
DATADIR=/var/lib/postgresql/8.3
MASTER=localhost
SLAVE=192.168.0.212
psql -c "select pg_start_backup('pgpool-recovery')" postgres
echo "restore_command = 'scp -P 220 $DATADIR/data/archive_log/%f %p'" > $DATADIR/data/recovery.conf
tar -C $DATADIR/data -zcf pgsql.tar.gz pgsql
psql -c 'select pg_stop_backup()' postgres
scp -P 220 pgsql.tar.gz $SLAVE:$DATADIR/data

============================================
=== /var/lib/postgresql/pgpool-recovery ====

#!/bin/bash
# Archive a current xlog.
psql -c 'select pg_switch_xlog()' postgres

============================================

I ran make install in sql/pgpool-recovery/
and psql -f pgpool-recovery.sql template1
on both backends.

My /var/lib/postgresql/ contains:

drwxr-xr-x 5 postgres postgres 4096 2008-07-08 15:53 8.3
-rwxr-xr-x 1 root     root      374 2008-07-09 07:46 base-backup.sh
drwxr-xr-x 2 root     root     4096 2008-07-09 08:28 data
-rwxr-xr-x 1 root     root       82 2008-07-08 07:47 pgpool-recovery

The failover_command (touch /tmp/failed) is correctly executed when I stop/disconnect the 192.168.0.212 backend,
but the failback_command is never called.

What's wrong?

Thanks in advance,

Maxence

--
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
02 23 20 35 36
06 32 39 39 93