[Pgpool-general] client_ilde_limit_in_recovery

Wed Nov 12 01:17:59 UTC 2008

Thanks for valuabloe info and your effort trying find the cause of the
problem. I will look into this tonight.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Hi Tatsuo, thanks for the patch provided that is awesome.
> I applied the patch on the CVS version I checkout again. Then I  
> compiled pgpool and installed and all that went ok.
> Not sure if I mentioned this before but I'm using debian etch with  
> postgres 8.3.4
> 
> I tried to be as clear as I could but if you any questions please let  
> me know.
> 
> pgpool server = debian-db1
> 
> =====================
> Test #1: Applying patch
> =====================
> 
> Started pgpool on debug mode and then created a psql connection from  
> my box to the pgpool server.
> 
> psql -h debian-db1 -U postgres -d test
> test=#  \l
> ...
> test=#
> 
> Basically I logged in, listed the databases and then just left the  
> psql client connection sit there in idle.
> 
> Then I called the pcp_recovery_node command to bring node 2 back online.
> 
> debian-db1:/opt/pgpool-cvs/bin# ./pcp_recovery_node 10  localhost 9898  
> nastpcp nastpcp 2
> 
> First stage working fine but then got stuck on stage 2.
> 
> The new entry that was added so that we could see some debug info did  
> not show up in the logs at any point.
> 
> My first impression was that perhaps the "if statement" that is found  
> on line 375 of pool_process_query.c might be resulting to false and  
> therefore the
> code within that scope is not running.
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>    if (((*InRecovery == 0 && pool_config->client_idle_limit > 0) ||
>                                   (*InRecovery && pool_config- 
>  >client_idle_limit_in_recovery > 0)) && fds == 0)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> So I decided to added a couple more calls to pool_debug in the  
> pool_process_query.c  file and proceeded with the test below.
> 
> 
> ==========================================
> Test #2: Adding additional pool_debug calls
> ==========================================
> 
> debian-db1:/usr/local/src/CVS/pgpool-II-2.1.cvs/pgpool-II# diff  
> pool_process_query.c pool_process_query.c.orig
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 375,378d374
> <                         pool_debug(" outside: InRecovery:%d  
> client_idle_limit:%d client_idle_limit_in_recovery:%d",
> <                                    pool_config->client_idle_limit,  
> pool_config->client_idle_limit_in_recovery);
> <
> <
> 383,387d378
> <
> <                                       pool_debug(" loop1: idle count: 
> %d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
> <                                                     
> frontend_idle_count, pool_config->client_idle_limit,
> <                                                    pool_config- 
>  >client_idle_limit_in_recovery);
> <
> 396,399d386
> <                                      pool_debug(" loop2: idle count: 
> %d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
> <                                                    
> frontend_idle_count, pool_config->client_idle_limit,
> <                                                             
> pool_config->client_idle_limit_in_recovery);
> <
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> Then I compiled pgpool and installed it as before.
> 
> As before I started pgpool on debug mode and then created a psql  
> connection from my box to the pgpool server.
> 
> psql -h debian-db1 -U postgres -d test
> test=#  \l
> ...
> test=#
> 
> So, I logged in and then listed the databases.
> Once the "\l" command was executed the following showed up in the log:
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2008-11-11 18:18:59 DEBUG: pid 2490:  outside: InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
> 2008-11-11 18:18:59 DEBUG: pid 2490: read kind from frontend Q(51)
> 2008-11-11 18:18:59 DEBUG: pid 2490: statement2: SELECT d.datname as  
> "Name",
>         r.rolname as "Owner",
>         pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding"
> FROM pg_catalog.pg_database d
>    JOIN pg_catalog.pg_roles r ON d.datdba = r.oid
> ORDER BY 1;
> 2008-11-11 18:18:59 DEBUG: pid 2490: waiting for backend 1 completing  
> the query
> 2008-11-11 18:18:59 DEBUG: pid 2490:  outside: InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend T NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: T
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend C NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: C
> 2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend Z NUM_BACKENDS: 2
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from  
> backend: Z
> 2008-11-11 18:18:59 DEBUG: pid 2490: pool_read_message_length: slot: 1  
> length: 5
> 2008-11-11 18:18:59 DEBUG: pid 2490: ReadyForQuery: message length: 5
> 2008-11-11 18:18:59 DEBUG: pid 2490: ReadyForQuery: transaction state: I
> 2008-11-11 18:18:59 DEBUG: pid 2490: end_load_balance: end load  
> balance mode
> 2008-11-11 18:19:10 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 0 the DB node  
> status: 2
> 2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 1 the DB node  
> status: 2
> 2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 2 the DB node  
> status: 3
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> Then I just let the connection sit there idle without doing anything  
> on the psql console for a couple of minutes.
> Then I proceeded to performing a recovery as shown below
> 
> Called the pcp_recovery_node command to bring node 2 back online.
> debian-db1:/opt/pgpool-cvs/bin# ./pcp_recovery_node 10  localhost 9898  
> nastpcp nastpcp 2
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2008-11-11 18:19:10 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 0 the DB node  
> status: 2
> 2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 1 the DB node  
> status: 2
> 2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 2 the DB node  
> status: 3
> 2008-11-11 18:19:50 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 0 the DB node  
> status: 2
> 2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 1 the DB node  
> status: 2
> 2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 2 the DB node  
> status: 3
> 2008-11-11 18:20:30 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:20:30 DEBUG: pid 2421: health_check: 0 the DB node  
> status: 2
> 2008-11-11 18:20:30 DEBUG: pid 2421: health_check: 1 the DB node  
> status: 2
> 2008-11-11 18:20:31 DEBUG: pid 2421: health_check: 2 the DB node  
> status: 3
> 2008-11-11 18:21:11 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 0 the DB node  
> status: 2
> 2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 1 the DB node  
> status: 2
> 2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 2 the DB node  
> status: 3
> 2008-11-11 18:21:27 DEBUG: pid 2454: I am PCP 2454 accept fd 5
> 2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet  
> type of service 'M'
> 2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: salt sent to the client
> 2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet  
> type of service 'R'
> 2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: authentication OK
> 2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet  
> type of service 'O'
> 2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: start online recovery
> 2008-11-11 18:21:27 LOG:   pid 2454: starting recovering node 2
> 2008-11-11 18:21:27 DEBUG: pid 2454: exec_checkpoint: start checkpoint
> 2008-11-11 18:21:27 DEBUG: pid 2454: exec_checkpoint: finish checkpoint
> 2008-11-11 18:21:27 LOG:   pid 2454: CHECKPOINT in the 1st stage done
> 2008-11-11 18:21:27 LOG:   pid 2454: starting recovery command:  
> "SELECT pgpool_recovery('copy-base-backup', 'debian-db4', '/var/lib/ 
> postgresql/8.3/main')"
> 2008-11-11 18:21:27 DEBUG: pid 2454: exec_recovery: start recovery
> 2008-11-11 18:21:29 DEBUG: pid 2454: exec_recovery: finish recovery
> 2008-11-11 18:21:29 LOG:   pid 2454: 1st stage is done
> 2008-11-11 18:21:29 LOG:   pid 2454: starting 2nd stage
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> So, stage 2 just hang there for a few seconds not doing anything.
> 
> Then I decided to list the databases again with "\l" on the psql  
> connection that I already had opened on the beginning of the test.
> Once that happened it seems like pgpool triggered the client  
> connection to be reset and then 2nd stage proceeded.
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2008-11-11 18:21:51 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:22:07 DEBUG: pid 2490:  outside: InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
> 2008-11-11 18:22:07 DEBUG: pid 2490: read kind from frontend Q(51)
> 2008-11-11 18:22:07 DEBUG: pid 2490: statement2: SELECT d.datname as  
> "Name",
>         r.rolname as "Owner",
>         pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding"
> FROM pg_catalog.pg_database d
>    JOIN pg_catalog.pg_roles r ON d.datdba = r.oid
> ORDER BY 1;
> 2008-11-11 18:22:07 DEBUG: pid 2490: waiting for backend 1 completing  
> the query
> 2008-11-11 18:22:07 DEBUG: pid 2490:  outside: InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend T NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: T
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend D NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: D
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend C NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: C
> 2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend Z NUM_BACKENDS: 2
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from  
> backend: Z
> 2008-11-11 18:22:07 DEBUG: pid 2490: pool_read_message_length: slot: 1  
> length: 5
> 2008-11-11 18:22:07 DEBUG: pid 2490: ReadyForQuery: message length: 5
> 2008-11-11 18:22:07 DEBUG: pid 2490: ReadyForQuery: transaction state: I
> 2008-11-11 18:22:07 DEBUG: pid 2490: end_load_balance: end load  
> balance mode
> 2008-11-11 18:22:08 DEBUG: pid 2490:  outside: InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
> 2008-11-11 18:22:08 DEBUG: pid 2490:  loop1: idle count:1 InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
> 2008-11-11 18:22:09 DEBUG: pid 2490:  outside: InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
> 2008-11-11 18:22:09 DEBUG: pid 2490:  loop1: idle count:2 InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
> 2008-11-11 18:22:09 DEBUG: pid 2490:  loop2: idle count:2 InRecovery:0  
> client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
> 2008-11-11 18:22:09 LOG:   pid 2490: pool_process_query: child  
> connection forced to terminate due to client_idle_limit_in_recovery(1)  
> reached
> 2008-11-11 18:22:09 DEBUG: pid 2490: statement2:  RESET ALL
> 2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 0 completing  
> the query
> 2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 1 completing  
> the query
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 0 th backend C NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend C NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
> backend: C
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 0 th backend Z NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend Z NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
> backend: Z
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 0  
> length: 5
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 1  
> length: 5
> 2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: message length: 5
> 2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: transaction state: I
> 2008-11-11 18:22:09 DEBUG: pid 2490: statement2:  SET SESSION  
> AUTHORIZATION DEFAULT
> 2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 0 completing  
> the query
> 2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 1 completing  
> the query
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 0 th backend S NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend S NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
> backend: S
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
> slot: 0 length: 20
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
> slot: 1 length: 20
> 2008-11-11 18:22:09 DEBUG: pid 2490: 0 th backend: name: is_superuser  
> value: on
> 2008-11-11 18:22:09 DEBUG: pid 2490: 1 th backend: name: is_superuser  
> value: on
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 0 th backend S NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend S NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
> backend: S
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
> slot: 0 length: 35
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master  
> slot: 1 length: 35
> 2008-11-11 18:22:09 DEBUG: pid 2490: 0 th backend: name:  
> session_authorization value: postgres
> 2008-11-11 18:22:09 DEBUG: pid 2490: 1 th backend: name:  
> session_authorization value: postgres
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 0 th backend C NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend C NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
> backend: C
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 0 th backend Z NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind  
> from 1 th backend Z NUM_BACKENDS: 3
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from  
> backend: Z
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 0  
> length: 5
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 1  
> length: 5
> 2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: message length: 5
> 2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: transaction state: I
> 2008-11-11 18:22:09 DEBUG: pid 2490: pool_connection_pool_timer: set  
> close time 1226449329
> 2008-11-11 18:22:11 LOG:   pid 2454: all connections from clients have  
> been closed
> 2008-11-11 18:22:11 DEBUG: pid 2454: exec_checkpoint: start checkpoint
> 2008-11-11 18:22:11 DEBUG: pid 2454: exec_checkpoint: finish checkpoint
> 2008-11-11 18:22:11 LOG:   pid 2454: CHECKPOINT in the 2nd stage done
> 2008-11-11 18:22:11 LOG:   pid 2454: starting recovery command:  
> "SELECT pgpool_recovery('pgpool_recovery_pitr', 'debian-db4', '/var/ 
> lib/postgresql/8.3/main')"
> 2008-11-11 18:22:11 DEBUG: pid 2454: exec_recovery: start recovery
> 2008-11-11 18:22:11 DEBUG: pid 2454: exec_recovery: finish recovery
> 2008-11-11 18:22:11 DEBUG: pid 2454: exec_remote_start: start  
> pgpool_remote_start
> 2008-11-11 18:22:16 DEBUG: pid 2454: exec_remote_start: finish  
> pgpool_remote_start
> 2008-11-11 18:22:19 LOG:   pid 2454: 2 node restarted
> 2008-11-11 18:22:19 LOG:   pid 2454: send_failback_request: fail back  
> 2 th node request from pid 2454
> 2008-11-11 18:22:19 LOG:   pid 2454: recovery done
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler called
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: starting to  
> select new master node
> 2008-11-11 18:22:19 LOG:   pid 2421: starting fail back. reconnect  
> host debian-db4(5432)
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2459
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2460
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2461
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2462
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2463
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2464
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2465
> 2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2466
> .....
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> If I had not done so the "pcp_recovery_node " command would have  
> failed giving back the BackendError message and the log would show
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2008-11-11 18:31:09 LOG:   pid 2454: starting 2nd stage
> 2008-11-11 18:31:30 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:32:10 DEBUG: pid 2421: starting health checking
> 2008-11-11 18:32:43 ERROR: pid 2454: wait_connection_closed: existing  
> connections did not close in 90 sec.
> 2008-11-11 18:32:43 ERROR: pid 2454: start_recovery: timeover for  
> waiting connection closed
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> It just seems weird to me that pgpool only kills the client connection  
> and proceeds with stage 2  if I execute a command within the psql  
> console that I
> had opened before the online recovery started. If I had  
> client_idle_limit set to 30s the connection would have been killed  
> without me having to do anything as far
> as I can remember.
> 
> 
> PGPOOL.CONF:
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> listen_addresses = '*'
> port = 5432
> pcp_port = 9898
> socket_dir = '/var/run/postgresql'
> pcp_socket_dir = '/var/run/postgresql'
> backend_socket_dir = '/var/run/postgresql'
> pcp_timeout = 10
> num_init_children = 32
> max_pool = 1
> child_life_time = 150
> connection_life_time = 0
> child_max_connections = 0
> client_idle_limit = 0
> authentication_timeout = 60
> logdir = '/var/run/postgresql'
> replication_mode = true
> load_balance_mode = true
> replication_stop_on_mismatch = false
> replicate_select = false
> reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
> print_timestamp = true
> master_slave_mode = false
> connection_cache = true
> health_check_timeout = 20
> health_check_period = 40
> health_check_user = 'postgres'
> failover_command = ''
> failback_command = ''
> insert_lock = false
> ignore_leading_white_space = true
> log_statement = false
> log_connections = false
> log_hostname = false
> parallel_mode = false
> enable_query_cache = false
> pgpool2_hostname = 'debian-db1'
> backend_hostname0 = 'debian-db2'
> backend_port0 = 5432
> backend_weight0 = 1
> backend_data_directory0 = '/var/lib/postgresql/8.3/main'
> backend_hostname1 = 'debian-db3'
> backend_port1 = 5432
> backend_weight1 = 1
> backend_data_directory1 = '/var/lib/postgresql/8.3/main'
> backend_hostname2 = 'debian-db4'
> backend_port2 = 5432
> backend_weight2 = 1
> backend_data_directory2 = '/var/lib/postgresql/8.3/main'
> enable_pool_hba = false
> recovery_user = 'postgres'
> recovery_password = ''
> recovery_1st_stage_command = 'copy-base-backup'
> recovery_2nd_stage_command = 'pgpool_recovery_pitr'
> recovery_timeout = 90
> client_idle_limit_in_recovery = 1
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> Thanks for all your help on this
> -
> Marcelo
> 
> On Nov 10, 2008, at 6:13 PM, Tatsuo Ishii wrote:
> 
> > Marcelo,
> >
> > Could you apply follwing patches and try again so that we could more
> > debug info?
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >> hmm unfortunately it does not seem to be working for me or perhaps I
> >> have a setting that you may not have turned on Tiago.
> >>
> >> Basically I created one connection from my apache (10.1.101.152)  
> >> going
> >> to pgpool and then let it set idle.
> >> pgpool has two connections open to the backends 0 and 1 and I'm  
> >> trying
> >> to recover node 2.
> >>
> >> It pretty much hangs on staging 2. On pgpool start up I do see that
> >> the new setting is indeed turned on.
> >> What I think it's strange is that on the pgpool LOG I don't even see
> >> it trying to kill the clients by checking the
> >> client_idle_limit_in_recovery setting
> >>
> >>
> >> Could one of these settings be the problem ?
> >>
> >> num_init_children = 70
> >> max_pool = 1
> >> child_life_time = 300
> >> connection_life_time = 300
> >> child_max_connections = 0
> >> client_idle_limit = 0
> >> authentication_timeout = 60
> >> replication_mode = true
> >> load_balance_mode = true
> >> replication_stop_on_mismatch = false
> >> connection_cache = true
> >> health_check_timeout = 20
> >> health_check_period = 15
> >> health_check_user = 'postgres'
> >> recovery_user = 'postgres'
> >> recovery_password = ''
> >> recovery_1st_stage_command = 'copy-base-backup'
> >> recovery_2nd_stage_command = 'pgpool_recovery_pitr'
> >> recovery_timeout = 90
> >> client_idle_limit_in_recovery = 10
> >>
> >>
> >>
> >>
> >> LOG:
> >> -----
> >> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG:   pid
> >> 30668: pgpool successfully started
> >> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG:   pid
> >> 30668: set 2 th backend down status
> >> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG:   pid
> >> 30668: starting degeneration. shutdown host debian-db4(5432)
> >> Nov 10 12:30:00 debian-db1 pgpool: 2008-11-10 12:30:00 LOG:   pid
> >> 30668: failover_handler: set new master node: 0
> >> Nov 10 12:30:01 debian-db1 pgpool: 2008-11-10 12:30:01 LOG:   pid
> >> 30668: failover done. shutdown host debian-db4(5432)
> >> Nov 10 12:30:02 debian-db1 pgpool: 2008-11-10 12:30:02 LOG:   pid
> >> 30821: ProcessFrontendResponse: failed to read kind from frontend.
> >> frontend abnormally exited
> >> Nov 10 12:34:07 debian-db1 pgpool: 2008-11-10 12:34:07 LOG:   pid
> >> 30741: starting recovering node 2
> >> Nov 10 12:34:08 debian-db1 pgpool: 2008-11-10 12:34:08 LOG:   pid
> >> 30741: CHECKPOINT in the 1st stage done
> >> Nov 10 12:34:08 debian-db1 pgpool: 2008-11-10 12:34:08 LOG:   pid
> >> 30741: starting recovery command: "SELECT pgpool_recovery('copy-base-
> >> backup', 'debian-db4', '/var/lib/postgresql/8.3/main')"
> >> Nov 10 12:34:09 debian-db1 pgpool: 2008-11-10 12:34:09 LOG:   pid
> >> 30741: 1st stage is done
> >> Nov 10 12:34:09 debian-db1 pgpool: 2008-11-10 12:34:09 LOG:   pid
> >> 30741: starting 2nd stage
> >> Nov 10 12:34:17 debian-db1 pgpool: 2008-11-10 12:34:17 DEBUG: pid
> >> 30668: starting health checking
> >> Nov 10 12:34:32 debian-db1 pgpool: 2008-11-10 12:34:32 DEBUG: pid
> >> 30668: starting health checking
> >> Nov 10 12:34:47 debian-db1 pgpool: 2008-11-10 12:34:47 DEBUG: pid
> >> 30668: starting health checking
> >> Nov 10 12:35:02 debian-db1 pgpool: 2008-11-10 12:35:02 DEBUG: pid
> >> 30668: starting health checking
> >> Nov 10 12:35:17 debian-db1 pgpool: 2008-11-10 12:35:17 DEBUG: pid
> >> 30668: starting health checking
> >> Nov 10 12:35:32 debian-db1 pgpool: 2008-11-10 12:35:32 DEBUG: pid
> >> 30668: starting health checking
> >> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 ERROR: pid
> >> 30741: wait_connection_closed: existing connections did not close in
> >> 90 sec.
> >> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 ERROR: pid
> >> 30741: start_recovery: timeover for waiting connection closed
> >> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 DEBUG: pid
> >> 30741: pcp_child: received PCP packet type of service 'X'
> >> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 DEBUG: pid
> >> 30741: pcp_child: client disconnecting. close connection
> >>
> >>
> >>
> >>
> >> NETSTAT
> >> -------------
> >> tcp        0      0 0.0.0.0:5432            0.0.0.0:*
> >> LISTEN     30668/pgpool
> >> tcp        0      0 10.1.100.213:5432       10.1.101.152:2936
> >> ESTABLISHED30821/pgpool:
> >> tcp        0      0 10.1.100.213:1947       10.1.100.217:5432
> >> ESTABLISHED30821/pgpool:
> >> tcp        1      0 127.0.0.1:4652          127.0.0.1:5432
> >> CLOSE_WAIT 1973/python
> >> tcp        0      0 10.1.100.213:4814       10.1.100.116:5432
> >> ESTABLISHED30821/pgpool:
> >>
> >>
> >>
> >>
> >> Thank you,
> >> Marcelo
> >>
> >>
> >> On Nov 8, 2008, at 9:05 AM, Tiago Macedo wrote:
> >>
> >>> Hi,
> >>>
> >>> I've tested this and it works perfectly. Exactly what I needed.
> >>>
> >>> Thank you so much for your hard work,
> >>> Tiago Macedo
> >>>
> >>> On Fri, Nov 7, 2008 at 9:28 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> >>> wrote:
> >>> Hi,
> >>>
> >>> I have implemented new directive "client_ilde_limit_in_recovery" per
> >>> discussion. This is usefull for on line recovery. From the doc:
> >>>
> >>>   client_ilde_limit_in_recovery
> >>>
> >>>          Similar to client_idle_limit but only takes effect in
> >>> recovery 2nd
> >>>          stage. Disconnect the connection to a client being idle for
> >>>          client_idle_limit_in_recovery seconds since the last query
> >>> has
> >>>          been sent.  This is usefull for preventing for pgpool
> >>> recovery
> >>>          disturbed by a lazy client or TCP/IP connection between
> >>> client and
> >>>          pgpool is accidentally down. The default value for
> >>>          client_idle_limit_in_recovery is 0, which means the
> >>> functionality is turned
> >>>          off. You need to reload pgpool.conf if you change
> >>>          client_idle_limit_in_recovery.
> >>>
> >>> Note that now client_idle_limit only takes effect *other than* in  
> >>> the
> >>> second stage of recovery. I think this makes more sense. Please  
> >>> let me
> >>> know if you have any questions/comments.
> >>>
> >>> Thanks,
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> _______________________________________________
> >>> Pgpool-general mailing list
> >>> Pgpool-general at pgfoundry.org
> >>> http://pgfoundry.org/mailman/listinfo/pgpool-general
> >>>
> >>> _______________________________________________
> >>> Pgpool-general mailing list
> >>> Pgpool-general at pgfoundry.org
> >>> http://pgfoundry.org/mailman/listinfo/pgpool-general
> >>
> > Index: pool_process_query.c
> > ===================================================================
> > RCS file: /cvsroot/pgpool/pgpool-II/pool_process_query.c,v
> > retrieving revision 1.114
> > diff -c -r1.114 pool_process_query.c
> > *** pool_process_query.c	10 Nov 2008 00:58:24 -0000	1.114
> > --- pool_process_query.c	11 Nov 2008 00:12:18 -0000
> > ***************
> > *** 376,381 ****
> > --- 376,386 ----
> >  				 (*InRecovery && pool_config->client_idle_limit_in_recovery >  
> > 0)) && fds == 0)
> >  			{
> >  				frontend_idle_count++;
> > +
> > + 				pool_debug("idle count:%d InRecovery:%d client_idle_limit:%d  
> > client_idle_limit_in_recovery:%d",
> > + 						   frontend_idle_count, pool_config->client_idle_limit,
> > + 						   pool_config->client_idle_limit_in_recovery);
> > +
> >  				if (*InRecovery == 0 && (frontend_idle_count > pool_config- 
> > >client_idle_limit))
> >  				{
> >  					pool_log("pool_process_query: child connection forced to  
> > terminate due to client_idle_limit(%d) reached", pool_config- 
> > >client_idle_limit);
>