[Pgpool-general] client_idle_limit_in_recovery

Marcelo Martins pglists at zeroaccess.org
Wed Nov 12 00:39:26 UTC 2008


Hi Tatsuo, thanks for the patch you provided; it's awesome.
I applied the patch to a fresh CVS checkout, then compiled and
installed pgpool, and all of that went fine.
Not sure if I mentioned this before, but I'm using Debian Etch with
PostgreSQL 8.3.4.

I tried to be as clear as I could, but if you have any questions
please let me know.

pgpool server = debian-db1

=====================
Test #1: Applying patch
=====================

Started pgpool in debug mode and then created a psql connection from
my box to the pgpool server.

psql -h debian-db1 -U postgres -d test
test=#  \l
...
test=#

Basically I logged in, listed the databases, and then just left the
psql client connection sitting there idle.

Then I called the pcp_recovery_node command to bring node 2 back online.

debian-db1:/opt/pgpool-cvs/bin# ./pcp_recovery_node 10  localhost 9898  
nastpcp nastpcp 2

The first stage worked fine, but it got stuck on stage 2.

The new log entry that the patch added so that we could see some
debug info did not show up in the logs at any point.

My first impression was that perhaps the "if" statement found on
line 375 of pool_process_query.c might be evaluating to false, and
therefore the code within that scope was never running.

--------------------------------------------------------------------------------------------------------------------------------------------------------------
   if (((*InRecovery == 0 && pool_config->client_idle_limit > 0) ||
        (*InRecovery && pool_config->client_idle_limit_in_recovery > 0)) && fds == 0)
--------------------------------------------------------------------------------------------------------------------------------------------------------------
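
Just for my own sanity, here is a minimal standalone sketch of that
condition using the values from my pgpool.conf (client_idle_limit = 0,
client_idle_limit_in_recovery = 1). Assuming fds == 0 means the
per-second select() timed out with nothing from the client, the branch
should be reachable once *InRecovery is set:

--------------------------------------------------------------------------------------------------------------------------------------------------------------
#include <stdio.h>

/* Hypothetical stand-in for pgpool's config; only the two fields
   relevant to the idle checks are included. */
static struct
{
	int client_idle_limit;
	int client_idle_limit_in_recovery;
} cfg = {0, 1};			/* values from my pgpool.conf */

/* Returns 1 when the idle-limit branch in pool_process_query()
   would be entered for the given recovery flag and select() result. */
static int idle_check_reached(int in_recovery, int fds)
{
	return (((in_recovery == 0 && cfg.client_idle_limit > 0) ||
	         (in_recovery && cfg.client_idle_limit_in_recovery > 0)) && fds == 0);
}

int main(void)
{
	printf("normal operation:   %d\n", idle_check_reached(0, 0));	/* 0: limit is 0   */
	printf("2nd stage recovery: %d\n", idle_check_reached(1, 0));	/* 1: branch taken */
	return 0;
}
--------------------------------------------------------------------------------------------------------------------------------------------------------------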

So I decided to add a couple more pool_debug calls to the
pool_process_query.c file and proceeded with the test below.


==========================================
Test #2: Adding additional pool_debug calls
==========================================

debian-db1:/usr/local/src/CVS/pgpool-II-2.1.cvs/pgpool-II# diff  
pool_process_query.c pool_process_query.c.orig
--------------------------------------------------------------------------------------------------------------------------------------------------------------
375,378d374
<                         pool_debug(" outside: InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
<                                    pool_config->client_idle_limit, pool_config->client_idle_limit_in_recovery);
<
<
383,387d378
<
<                                       pool_debug(" loop1: idle count:%d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
<                                                  frontend_idle_count, pool_config->client_idle_limit,
<                                                  pool_config->client_idle_limit_in_recovery);
<
396,399d386
<                                      pool_debug(" loop2: idle count:%d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
<                                                 frontend_idle_count, pool_config->client_idle_limit,
<                                                 pool_config->client_idle_limit_in_recovery);
<
--------------------------------------------------------------------------------------------------------------------------------------------------------------
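
(A side note while re-reading this: the format strings in these
pool_debug calls, like the one in the original patch, include an
InRecovery:%d specifier, but *InRecovery is never passed as an
argument, so each later %d consumes the next argument and the last
one reads an undefined value. Assuming pool_debug() takes printf-style
arguments, that would explain the bogus
client_idle_limit_in_recovery:-1080004632 values in the logs below. A
call whose arguments match the format string would look like this:)

--------------------------------------------------------------------------------------------------------------------------------------------------------------
/* Sketch only: the argument list now matches the format string,
   with *InRecovery supplied for the InRecovery:%d specifier. */
pool_debug(" outside: InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
           *InRecovery,
           pool_config->client_idle_limit,
           pool_config->client_idle_limit_in_recovery);
--------------------------------------------------------------------------------------------------------------------------------------------------------------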


Then I compiled pgpool and installed it as before.

As before, I started pgpool in debug mode and then created a psql
connection from my box to the pgpool server.

psql -h debian-db1 -U postgres -d test
test=#  \l
...
test=#

So, I logged in and then listed the databases.
Once the "\l" command was executed the following showed up in the log:

--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:18:59 DEBUG: pid 2490:  outside: InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:18:59 DEBUG: pid 2490: read kind from frontend Q(51)
2008-11-11 18:18:59 DEBUG: pid 2490: statement2: SELECT d.datname as  
"Name",
        r.rolname as "Owner",
        pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding"
FROM pg_catalog.pg_database d
   JOIN pg_catalog.pg_roles r ON d.datdba = r.oid
ORDER BY 1;
2008-11-11 18:18:59 DEBUG: pid 2490: waiting for backend 1 completing  
the query
2008-11-11 18:18:59 DEBUG: pid 2490:  outside: InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend T NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: T
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend C NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: C
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend Z NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
backend: Z
2008-11-11 18:18:59 DEBUG: pid 2490: pool_read_message_length: slot: 1  
length: 5
2008-11-11 18:18:59 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:18:59 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:18:59 DEBUG: pid 2490: end_load_balance: end load  
balance mode
2008-11-11 18:19:10 DEBUG: pid 2421: starting health checking
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 0 the DB node  
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 1 the DB node  
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 2 the DB node  
status: 3
--------------------------------------------------------------------------------------------------------------------------------------------------------------


Then I just let the connection sit there idle, without doing anything
on the psql console, for a couple of minutes.
Then I proceeded to perform a recovery as shown below.

I called the pcp_recovery_node command to bring node 2 back online.
debian-db1:/opt/pgpool-cvs/bin# ./pcp_recovery_node 10  localhost 9898  
nastpcp nastpcp 2

--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:19:10 DEBUG: pid 2421: starting health checking
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 0 the DB node  
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 1 the DB node  
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 2 the DB node  
status: 3
2008-11-11 18:19:50 DEBUG: pid 2421: starting health checking
2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 0 the DB node  
status: 2
2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 1 the DB node  
status: 2
2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 2 the DB node  
status: 3
2008-11-11 18:20:30 DEBUG: pid 2421: starting health checking
2008-11-11 18:20:30 DEBUG: pid 2421: health_check: 0 the DB node  
status: 2
2008-11-11 18:20:30 DEBUG: pid 2421: health_check: 1 the DB node  
status: 2
2008-11-11 18:20:31 DEBUG: pid 2421: health_check: 2 the DB node  
status: 3
2008-11-11 18:21:11 DEBUG: pid 2421: starting health checking
2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 0 the DB node  
status: 2
2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 1 the DB node  
status: 2
2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 2 the DB node  
status: 3
2008-11-11 18:21:27 DEBUG: pid 2454: I am PCP 2454 accept fd 5
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet  
type of service 'M'
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: salt sent to the client
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet  
type of service 'R'
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: authentication OK
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet  
type of service 'O'
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: start online recovery
2008-11-11 18:21:27 LOG:   pid 2454: starting recovering node 2
2008-11-11 18:21:27 DEBUG: pid 2454: exec_checkpoint: start checkpoint
2008-11-11 18:21:27 DEBUG: pid 2454: exec_checkpoint: finish checkpoint
2008-11-11 18:21:27 LOG:   pid 2454: CHECKPOINT in the 1st stage done
2008-11-11 18:21:27 LOG:   pid 2454: starting recovery command:  
"SELECT pgpool_recovery('copy-base-backup', 'debian-db4', '/var/lib/ 
postgresql/8.3/main')"
2008-11-11 18:21:27 DEBUG: pid 2454: exec_recovery: start recovery
2008-11-11 18:21:29 DEBUG: pid 2454: exec_recovery: finish recovery
2008-11-11 18:21:29 LOG:   pid 2454: 1st stage is done
2008-11-11 18:21:29 LOG:   pid 2454: starting 2nd stage
--------------------------------------------------------------------------------------------------------------------------------------------------------------


So, stage 2 just hung there for a few seconds not doing anything.

Then I decided to list the databases again with "\l" on the psql
connection that I had already opened at the beginning of the test.
Once that happened, it seems pgpool triggered the client connection
to be reset, and then the 2nd stage proceeded.

--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:21:51 DEBUG: pid 2421: starting health checking
2008-11-11 18:22:07 DEBUG: pid 2490:  outside: InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:07 DEBUG: pid 2490: read kind from frontend Q(51)
2008-11-11 18:22:07 DEBUG: pid 2490: statement2: SELECT d.datname as  
"Name",
        r.rolname as "Owner",
        pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding"
FROM pg_catalog.pg_database d
   JOIN pg_catalog.pg_roles r ON d.datdba = r.oid
ORDER BY 1;
2008-11-11 18:22:07 DEBUG: pid 2490: waiting for backend 1 completing  
the query
2008-11-11 18:22:07 DEBUG: pid 2490:  outside: InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend T NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: T
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend C NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: C
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend Z NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
backend: Z
2008-11-11 18:22:07 DEBUG: pid 2490: pool_read_message_length: slot: 1  
length: 5
2008-11-11 18:22:07 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:22:07 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:22:07 DEBUG: pid 2490: end_load_balance: end load  
balance mode
2008-11-11 18:22:08 DEBUG: pid 2490:  outside: InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:08 DEBUG: pid 2490:  loop1: idle count:1 InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
2008-11-11 18:22:09 DEBUG: pid 2490:  outside: InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:09 DEBUG: pid 2490:  loop1: idle count:2 InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
2008-11-11 18:22:09 DEBUG: pid 2490:  loop2: idle count:2 InRecovery:0  
client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
2008-11-11 18:22:09 LOG:   pid 2490: pool_process_query: child  
connection forced to terminate due to client_idle_limit_in_recovery(1)  
reached
2008-11-11 18:22:09 DEBUG: pid 2490: statement2:  RESET ALL
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 0 completing  
the query
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 1 completing  
the query
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 0 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
backend: C
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 0 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
backend: Z
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 0  
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 1  
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:22:09 DEBUG: pid 2490: statement2:  SET SESSION  
AUTHORIZATION DEFAULT
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 0 completing  
the query
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 1 completing  
the query
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 0 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
backend: S
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
slot: 0 length: 20
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
slot: 1 length: 20
2008-11-11 18:22:09 DEBUG: pid 2490: 0 th backend: name: is_superuser  
value: on
2008-11-11 18:22:09 DEBUG: pid 2490: 1 th backend: name: is_superuser  
value: on
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 0 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
backend: S
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
slot: 0 length: 35
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
slot: 1 length: 35
2008-11-11 18:22:09 DEBUG: pid 2490: 0 th backend: name:  
session_authorization value: postgres
2008-11-11 18:22:09 DEBUG: pid 2490: 1 th backend: name:  
session_authorization value: postgres
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 0 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
backend: C
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 0 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
from 1 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
backend: Z
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 0  
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 1  
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:22:09 DEBUG: pid 2490: pool_connection_pool_timer: set  
close time 1226449329
2008-11-11 18:22:11 LOG:   pid 2454: all connections from clients have  
been closed
2008-11-11 18:22:11 DEBUG: pid 2454: exec_checkpoint: start checkpoint
2008-11-11 18:22:11 DEBUG: pid 2454: exec_checkpoint: finish checkpoint
2008-11-11 18:22:11 LOG:   pid 2454: CHECKPOINT in the 2nd stage done
2008-11-11 18:22:11 LOG:   pid 2454: starting recovery command:  
"SELECT pgpool_recovery('pgpool_recovery_pitr', 'debian-db4', '/var/ 
lib/postgresql/8.3/main')"
2008-11-11 18:22:11 DEBUG: pid 2454: exec_recovery: start recovery
2008-11-11 18:22:11 DEBUG: pid 2454: exec_recovery: finish recovery
2008-11-11 18:22:11 DEBUG: pid 2454: exec_remote_start: start  
pgpool_remote_start
2008-11-11 18:22:16 DEBUG: pid 2454: exec_remote_start: finish  
pgpool_remote_start
2008-11-11 18:22:19 LOG:   pid 2454: 2 node restarted
2008-11-11 18:22:19 LOG:   pid 2454: send_failback_request: fail back  
2 th node request from pid 2454
2008-11-11 18:22:19 LOG:   pid 2454: recovery done
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler called
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: starting to  
select new master node
2008-11-11 18:22:19 LOG:   pid 2421: starting fail back. reconnect  
host debian-db4(5432)
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2459
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2460
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2461
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2462
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2463
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2464
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2465
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2466
.....

--------------------------------------------------------------------------------------------------------------------------------------------------------------


If I had not done so, the pcp_recovery_node command would have failed
with a BackendError message, and the log would have shown:

--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:31:09 LOG:   pid 2454: starting 2nd stage
2008-11-11 18:31:30 DEBUG: pid 2421: starting health checking
2008-11-11 18:32:10 DEBUG: pid 2421: starting health checking
2008-11-11 18:32:43 ERROR: pid 2454: wait_connection_closed: existing  
connections did not close in 90 sec.
2008-11-11 18:32:43 ERROR: pid 2454: start_recovery: timeover for  
waiting connection closed
--------------------------------------------------------------------------------------------------------------------------------------------------------------


It just seems weird to me that pgpool only kills the client connection
and proceeds with stage 2 if I execute a command in the psql console
that I had opened before the online recovery started. If I had
client_idle_limit set to 30 seconds, the connection would have been
killed without me having to do anything, as far as I can remember.
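
In other words, what I expected (a sketch of my mental model, not
pgpool's actual code) is that the child process waits on the frontend
socket with a short timeout, bumps an idle counter on every timeout,
and drops the client once the counter passes the limit, with no client
activity needed:

--------------------------------------------------------------------------------------------------------------------------------------------------------------
#include <stdio.h>
#include <sys/select.h>

/* Wait for frontend data; return -1 once the idle limit is reached. */
static int wait_for_frontend(int fd, int idle_limit)
{
	int idle = 0;

	for (;;)
	{
		fd_set rset;
		struct timeval tv = {1, 0};	/* 1-second tick */
		int fds;

		FD_ZERO(&rset);
		FD_SET(fd, &rset);
		fds = select(fd + 1, &rset, NULL, NULL, &tv);

		if (fds == 0)			/* timeout: client sent nothing */
		{
			if (idle_limit > 0 && ++idle > idle_limit)
			{
				printf("idle limit (%d) reached, dropping client\n", idle_limit);
				return -1;	/* caller closes the connection */
			}
			continue;
		}
		return fds;			/* data (or an error) to handle */
	}
}

int main(void)
{
	/* Demo: watch stdin with a 3-second idle limit. */
	return wait_for_frontend(0, 3) < 0;
}
--------------------------------------------------------------------------------------------------------------------------------------------------------------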


PGPOOL.CONF:
--------------------------------------------------------------------------------------------------------------------------------------------------------------
listen_addresses = '*'
port = 5432
pcp_port = 9898
socket_dir = '/var/run/postgresql'
pcp_socket_dir = '/var/run/postgresql'
backend_socket_dir = '/var/run/postgresql'
pcp_timeout = 10
num_init_children = 32
max_pool = 1
child_life_time = 150
connection_life_time = 0
child_max_connections = 0
client_idle_limit = 0
authentication_timeout = 60
logdir = '/var/run/postgresql'
replication_mode = true
load_balance_mode = true
replication_stop_on_mismatch = false
replicate_select = false
reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
print_timestamp = true
master_slave_mode = false
connection_cache = true
health_check_timeout = 20
health_check_period = 40
health_check_user = 'postgres'
failover_command = ''
failback_command = ''
insert_lock = false
ignore_leading_white_space = true
log_statement = false
log_connections = false
log_hostname = false
parallel_mode = false
enable_query_cache = false
pgpool2_hostname = 'debian-db1'
backend_hostname0 = 'debian-db2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/8.3/main'
backend_hostname1 = 'debian-db3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/8.3/main'
backend_hostname2 = 'debian-db4'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/8.3/main'
enable_pool_hba = false
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'copy-base-backup'
recovery_2nd_stage_command = 'pgpool_recovery_pitr'
recovery_timeout = 90
client_idle_limit_in_recovery = 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------



Thanks for all your help on this
-
Marcelo

On Nov 10, 2008, at 6:13 PM, Tatsuo Ishii wrote:

> Marcelo,
>
> Could you apply the following patch and try again so that we can get
> more debug info?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
>> Hmm, unfortunately it does not seem to be working for me, or perhaps
>> I have a setting that you may not have turned on, Tiago.
>>
>> Basically I created one connection from my apache (10.1.101.152) going
>> to pgpool and then let it sit idle.
>> pgpool has two connections open to backends 0 and 1, and I'm trying
>> to recover node 2.
>>
>> It pretty much hangs on stage 2. On pgpool startup I do see that
>> the new setting is indeed turned on.
>> What I think is strange is that in the pgpool LOG I don't even see
>> it trying to kill the clients by checking the
>> client_idle_limit_in_recovery setting.
>>
>>
>> Could one of these settings be the problem ?
>>
>> num_init_children = 70
>> max_pool = 1
>> child_life_time = 300
>> connection_life_time = 300
>> child_max_connections = 0
>> client_idle_limit = 0
>> authentication_timeout = 60
>> replication_mode = true
>> load_balance_mode = true
>> replication_stop_on_mismatch = false
>> connection_cache = true
>> health_check_timeout = 20
>> health_check_period = 15
>> health_check_user = 'postgres'
>> recovery_user = 'postgres'
>> recovery_password = ''
>> recovery_1st_stage_command = 'copy-base-backup'
>> recovery_2nd_stage_command = 'pgpool_recovery_pitr'
>> recovery_timeout = 90
>> client_idle_limit_in_recovery = 10
>>
>>
>>
>>
>> LOG:
>> -----
>> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG:   pid
>> 30668: pgpool successfully started
>> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG:   pid
>> 30668: set 2 th backend down status
>> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG:   pid
>> 30668: starting degeneration. shutdown host debian-db4(5432)
>> Nov 10 12:30:00 debian-db1 pgpool: 2008-11-10 12:30:00 LOG:   pid
>> 30668: failover_handler: set new master node: 0
>> Nov 10 12:30:01 debian-db1 pgpool: 2008-11-10 12:30:01 LOG:   pid
>> 30668: failover done. shutdown host debian-db4(5432)
>> Nov 10 12:30:02 debian-db1 pgpool: 2008-11-10 12:30:02 LOG:   pid
>> 30821: ProcessFrontendResponse: failed to read kind from frontend.
>> frontend abnormally exited
>> Nov 10 12:34:07 debian-db1 pgpool: 2008-11-10 12:34:07 LOG:   pid
>> 30741: starting recovering node 2
>> Nov 10 12:34:08 debian-db1 pgpool: 2008-11-10 12:34:08 LOG:   pid
>> 30741: CHECKPOINT in the 1st stage done
>> Nov 10 12:34:08 debian-db1 pgpool: 2008-11-10 12:34:08 LOG:   pid
>> 30741: starting recovery command: "SELECT pgpool_recovery('copy-base-
>> backup', 'debian-db4', '/var/lib/postgresql/8.3/main')"
>> Nov 10 12:34:09 debian-db1 pgpool: 2008-11-10 12:34:09 LOG:   pid
>> 30741: 1st stage is done
>> Nov 10 12:34:09 debian-db1 pgpool: 2008-11-10 12:34:09 LOG:   pid
>> 30741: starting 2nd stage
>> Nov 10 12:34:17 debian-db1 pgpool: 2008-11-10 12:34:17 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:34:32 debian-db1 pgpool: 2008-11-10 12:34:32 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:34:47 debian-db1 pgpool: 2008-11-10 12:34:47 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:02 debian-db1 pgpool: 2008-11-10 12:35:02 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:17 debian-db1 pgpool: 2008-11-10 12:35:17 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:32 debian-db1 pgpool: 2008-11-10 12:35:32 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 ERROR: pid
>> 30741: wait_connection_closed: existing connections did not close in
>> 90 sec.
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 ERROR: pid
>> 30741: start_recovery: timeover for waiting connection closed
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 DEBUG: pid
>> 30741: pcp_child: received PCP packet type of service 'X'
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 DEBUG: pid
>> 30741: pcp_child: client disconnecting. close connection
>>
>>
>>
>>
>> NETSTAT
>> -------------
>> tcp        0      0 0.0.0.0:5432            0.0.0.0:*
>> LISTEN     30668/pgpool
>> tcp        0      0 10.1.100.213:5432       10.1.101.152:2936
>> ESTABLISHED30821/pgpool:
>> tcp        0      0 10.1.100.213:1947       10.1.100.217:5432
>> ESTABLISHED30821/pgpool:
>> tcp        1      0 127.0.0.1:4652          127.0.0.1:5432
>> CLOSE_WAIT 1973/python
>> tcp        0      0 10.1.100.213:4814       10.1.100.116:5432
>> ESTABLISHED30821/pgpool:
>>
>>
>>
>>
>> Thank you,
>> Marcelo
>>
>>
>> On Nov 8, 2008, at 9:05 AM, Tiago Macedo wrote:
>>
>>> Hi,
>>>
>>> I've tested this and it works perfectly. Exactly what I needed.
>>>
>>> Thank you so much for your hard work,
>>> Tiago Macedo
>>>
>>> On Fri, Nov 7, 2008 at 9:28 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
>>> wrote:
>>> Hi,
>>>
>>> I have implemented the new directive "client_idle_limit_in_recovery"
>>> per discussion. This is useful for online recovery. From the doc:
>>>
>>>   client_idle_limit_in_recovery
>>>
>>>          Similar to client_idle_limit, but only takes effect in the
>>>          2nd stage of recovery. Disconnects a client connection that
>>>          has been idle for client_idle_limit_in_recovery seconds
>>>          since the last query was sent. This is useful for
>>>          preventing pgpool recovery from being disturbed by a lazy
>>>          client, or by a TCP/IP connection between the client and
>>>          pgpool that has accidentally gone down. The default value
>>>          for client_idle_limit_in_recovery is 0, which means the
>>>          functionality is turned off. You need to reload pgpool.conf
>>>          if you change client_idle_limit_in_recovery.
>>>
>>> Note that client_idle_limit now only takes effect at times *other
>>> than* the second stage of recovery. I think this makes more sense.
>>> Please let me know if you have any questions/comments.
>>>
>>> Thanks,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> _______________________________________________
>>> Pgpool-general mailing list
>>> Pgpool-general at pgfoundry.org
>>> http://pgfoundry.org/mailman/listinfo/pgpool-general
>>>
>>> _______________________________________________
>>> Pgpool-general mailing list
>>> Pgpool-general at pgfoundry.org
>>> http://pgfoundry.org/mailman/listinfo/pgpool-general
>>
> Index: pool_process_query.c
> ===================================================================
> RCS file: /cvsroot/pgpool/pgpool-II/pool_process_query.c,v
> retrieving revision 1.114
> diff -c -r1.114 pool_process_query.c
> *** pool_process_query.c	10 Nov 2008 00:58:24 -0000	1.114
> --- pool_process_query.c	11 Nov 2008 00:12:18 -0000
> ***************
> *** 376,381 ****
> --- 376,386 ----
>  				 (*InRecovery && pool_config->client_idle_limit_in_recovery > 0)) && fds == 0)
>  			{
>  				frontend_idle_count++;
> +
> + 				pool_debug("idle count:%d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
> + 						   frontend_idle_count, pool_config->client_idle_limit,
> + 						   pool_config->client_idle_limit_in_recovery);
> +
>  				if (*InRecovery == 0 && (frontend_idle_count > pool_config->client_idle_limit))
>  				{
>  					pool_log("pool_process_query: child connection forced to terminate due to client_idle_limit(%d) reached", pool_config->client_idle_limit);


