[Pgpool-general] problem with healthcheck

Fri Feb 4 14:36:54 UTC 2011

Hi!

After more reading and experimenting, I figured out how to set everything up.

Here is a brief description of the steps I took:

1. Online recovery must be configured as described in
http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html#online-recovery
(I use PostgreSQL 8.1)
2. The pcp_recovery_node command must be used to kick off the recovery process
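
As a rough illustration of step 2, here is a minimal Python sketch that builds and runs the pcp_recovery_node command. It assumes the positional argument order used by pgpool-II releases of this era (timeout, host, port, user, password, node ID); please check the documentation for your version. The port 9898 and timeout 10 match pcp_port and pcp_timeout in my pgpool.conf, while the function names, the user "pcp_admin" and the password "secret" are hypothetical placeholders.

```python
import subprocess


def build_recovery_command(node_id, host="localhost", port=9898,
                           user="pcp_admin", password="secret", timeout=10):
    """Return the argv list for pcp_recovery_node.

    Assumed positional order: timeout host port user password node_id.
    The user and password defaults are placeholders, not real credentials.
    """
    return ["pcp_recovery_node", str(timeout), host, str(port),
            user, password, str(node_id)]


def recover_node(node_id, **kwargs):
    """Run online recovery for one backend node.

    check=True makes subprocess raise CalledProcessError if the
    command exits with a non-zero status.
    """
    subprocess.run(build_recovery_command(node_id, **kwargs), check=True)


if __name__ == "__main__":
    # Only print the command here; call recover_node(1) to actually run it.
    print(" ".join(build_recovery_command(1)))
```

Running the script only prints the command line for node 1; recover_node(1) would execute it on a machine where pcp_recovery_node is installed.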

Currently I have an environment with two PostgreSQL nodes. When I simulate
the failure of one of them, I can execute the pcp_recovery_node command, and
the failed node is brought back to life and its data is synchronized properly.

I do, however, have two questions about the behavior I noticed:

1. When a client is connected to the pgpool process and a failure
occurs (whether the client is executing queries or is just connected
and idle), the client is disconnected and must connect again. After
the client reconnects, queries are executed properly.

This seems a little strange to me. I was expecting pgpool to be able
to fail over from backend0 to backend1 transparently, without any
impact on the client.

Perhaps I'm doing something wrong here?
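
For reference, the workaround on the client side would look roughly like the Python sketch below: catch the dropped connection, open a new one, and run the work again. This is only a sketch under assumptions. The names run_with_reconnect, connect and work are hypothetical, connect stands for whatever call opens a session to pgpool (port 9999 in my setup), and the exception types to catch depend on the database driver in use. Replaying is only safe if the work is idempotent or wrapped in a single transaction.

```python
import time


def run_with_reconnect(connect, work, errors=(ConnectionError,),
                       retries=3, delay=1.0):
    """Call work(conn), reconnecting and retrying if the session drops.

    connect -- callable that opens and returns a new connection
    work    -- callable that takes the connection and returns a result
    errors  -- exception types that mean "the connection was lost"
    retries -- number of extra attempts after the first failure
    delay   -- seconds to wait between attempts
    """
    last_error = None
    conn = None
    for attempt in range(retries + 1):
        try:
            if conn is None:
                conn = connect()
            return work(conn)
        except errors as exc:
            last_error = exc
            conn = None  # discard the broken session before retrying
            if attempt < retries:
                time.sleep(delay)
    # Every attempt failed; surface the last error to the caller.
    raise last_error
```

With a real driver, connect would wrap the driver's connect call and errors would list that driver's connection-loss exceptions.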

2. The second question is about the recovery operation. It seems that
the 2nd stage of recovery cannot be performed while any client is
connected to pgpool. When a client is connected and I trigger
pcp_recovery_node, it hangs in the 2nd stage until I disconnect the
client. Once I disconnect, recovery continues and completes properly.

Is there anything that can be done so that clients do not have to
disconnect for the recovery to continue?

best regards,

Michal

On Thu, Feb 3, 2011 at 11:24 PM, Guillaume Lelarge
<guillaume at lelarge.info> wrote:
> On 03/02/2011 16:52, Michal Slocinski wrote:
>> Hi!
>>
>> I'm a pgpool newbie trying to set up a cluster of two PostgreSQL servers.
>>
>> I configured pgpool.conf as follows:
>>
>> # $Header: /cvsroot/pgpool/pgpool-II/pgpool.conf.sample-replication,v 1.11 2010/09/01 04:58:47 kitagawa Exp $
>> listen_addresses = '*'
>> port = 9999
>> pcp_port = 9898
>> socket_dir = '/tmp'
>> pcp_socket_dir = '/tmp'
>> backend_socket_dir = '/tmp'
>> pcp_timeout = 10
>> num_init_children = 32
>> max_pool = 4
>> child_life_time = 300
>> connection_life_time = 0
>> child_max_connections = 0
>> client_idle_limit = 0
>> authentication_timeout = 60
>> logdir = '/tmp'
>> pid_file_name = '/var/run/pgpool/pgpool.pid'
>> replication_mode = true
>> load_balance_mode = false
>> replication_stop_on_mismatch = false
>> failover_if_affected_tuples_mismatch = false
>> replicate_select = false
>> reset_query_list = 'ABORT; DISCARD ALL'
>> white_function_list = ''
>> black_function_list = 'nextval,setval'
>> print_timestamp = true
>> master_slave_mode = false
>> master_slave_sub_mode = 'slony'
>> delay_threshold = 0
>> log_standby_delay = 'none'
>> connection_cache = true
>> health_check_timeout = 20
>> health_check_period = 1
>> health_check_user = 'nobody'
>> failover_command = ''
>> failback_command = ''
>> fail_over_on_backend_error = true
>> insert_lock = true
>> ignore_leading_white_space = true
>> log_statement = false
>> log_per_node_statement = false
>> log_connections = false
>> log_hostname = false
>> parallel_mode = false
>> enable_query_cache = false
>> pgpool2_hostname = ''
>> backend_hostname0 = '172.16.2.72'
>> backend_port0 = 5432
>> backend_weight0 = 1
>> backend_data_directory0 = '/var/lib/pgsql/data'
>> backend_hostname1 = '172.16.2.73'
>> backend_port1 = 5432
>> backend_weight1 = 1
>> backend_data_directory1 = '/var/lib/pgsql/data'
>> enable_pool_hba = true
>> recovery_user = 'nobody'
>> recovery_password = ''
>> recovery_1st_stage_command = ''
>> recovery_2nd_stage_command = ''
>> recovery_timeout = 90
>> client_idle_limit_in_recovery = 0
>> lobj_lock_table = ''
>> ssl = false
>> debug_level = 1
>>
>>
>> The problem I have is that pgpool seems to have trouble checking
>> the status of the PostgreSQL instances.
>>
>> For example, I start both PostgreSQL servers and start pgpool. pgpool
>> sees both of them as active, and I can connect to pgpool and execute
>> a query. Everything is fine.
>>
>> 2011-02-03 16:50:56 DEBUG: pid 22649: starting health checking
>> 2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 0 th DB node status: 1
>> 2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 1 th DB node status: 1
>>
>> Now I shut down one of the servers, and pgpool correctly discovers that
>> it is down.
>>
>> 2011-02-03 16:50:56 DEBUG: pid 22649: starting health checking
>> 2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 0 th DB node status: 1
>> 2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 1 th DB node status: 3
>>
>> However, when I bring it up again, pgpool does not seem to be able to
>> detect that the node is back:
>>
>> 2011-02-03 16:50:56 DEBUG: pid 22649: starting health checking
>> 2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 0 th DB node status: 1
>> 2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 1 th DB node status: 3
>>
>
> You are in replication mode. So, if you stop node 1, node 0 continues
> to work, but node 1 is stopped and cannot be kept current. When you
> start node 1 again, it can't be used for replication because there
> could be a data mismatch between the two nodes.
>
> So, before restarting node 1, you need to rebuild it from node 0's data.
> I'm not very familiar with the actions involved, but I'm sure it's
> normal for node 1 not to get reattached automatically.
>
>
> --
> Guillaume
>  http://www.postgresql.fr
>  http://dalibo.com
>
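
The health_check debug lines quoted above can be checked mechanically. The following Python sketch extracts the node number and status from each line and reports which nodes are down. It assumes the log format shown in this thread and the status meanings given in the pgpool-II documentation (0 unused, 1 up with no connections yet, 2 up with pooled connections, 3 down); the function and constant names are hypothetical.

```python
import re

# Status meanings as assumed from the pgpool-II documentation.
STATUS_NAMES = {
    0: "unused",
    1: "up, no connections yet",
    2: "up, connections pooled",
    3: "down",
}

# Matches e.g. "health_check: 1 th DB node status: 3".
LINE_RE = re.compile(r"health_check: (\d+) th DB node status: (\d+)")


def node_statuses(log_lines):
    """Return {node_id: status}, keeping the latest status seen per node."""
    statuses = {}
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            statuses[int(match.group(1))] = int(match.group(2))
    return statuses


def down_nodes(log_lines):
    """Return the sorted IDs of nodes whose latest status is 3 (down)."""
    return sorted(node for node, status in node_statuses(log_lines).items()
                  if status == 3)


if __name__ == "__main__":
    sample = [
        "2011-02-03 16:50:56 DEBUG: pid 22649: starting health checking",
        "2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 0 th DB node status: 1",
        "2011-02-03 16:50:56 DEBUG: pid 22649: health_check: 1 th DB node status: 3",
    ]
    for node, status in sorted(node_statuses(sample).items()):
        print(node, STATUS_NAMES.get(status, "unknown"))
```

A node that stays at status 3 after PostgreSQL is restarted has not been reattached, which matches the explanation above that it must be recovered first.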

-- 

Michal Slocinski
michal.slocinski at gmail.com