[Pgpool-general] client_ilde_limit_in_recovery
Marcelo Martins
pglists at zeroaccess.org
Wed Nov 12 00:39:26 UTC 2008
Hi Tatsuo, thanks for the patch provided that is awesome.
I applied the patch on the CVS version I checkout again. Then I
compiled pgpool and installed and all that went ok.
Not sure if I mentioned this before but I'm using debian etch with
postgres 8.3.4
I tried to be as clear as I could but if you any questions please let
me know.
pgpool server = debian-db1
=====================
Test #1: Applying patch
=====================
Started pgpool on debug mode and then created a psql connection from
my box to the pgpool server.
psql -h debian-db1 -U postgres -d test
test=# \l
...
test=#
Basically I logged in, listed the databases and then just left the
psql client connection sit there in idle.
Then I called the pcp_recovery_node command to bring node 2 back online.
debian-db1:/opt/pgpool-cvs/bin# ./pcp_recovery_node 10 localhost 9898
nastpcp nastpcp 2
First stage working fine but then got stuck on stage 2.
The new entry that was added so that we could see some debug info did
not show up in the logs at any point.
My first impression was that perhaps the "if statement" that is found
on line 375 of pool_process_query.c might be resulting to false and
therefore the
code within that scope is not running.
--------------------------------------------------------------------------------------------------------------------------------------------------------------
if (((*InRecovery == 0 && pool_config->client_idle_limit > 0) ||
(*InRecovery && pool_config-
>client_idle_limit_in_recovery > 0)) && fds == 0)
--------------------------------------------------------------------------------------------------------------------------------------------------------------
So I decided to added a couple more calls to pool_debug in the
pool_process_query.c file and proceeded with the test below.
==========================================
Test #2: Adding additional pool_debug calls
==========================================
debian-db1:/usr/local/src/CVS/pgpool-II-2.1.cvs/pgpool-II# diff
pool_process_query.c pool_process_query.c.orig
--------------------------------------------------------------------------------------------------------------------------------------------------------------
375,378d374
< pool_debug(" outside: InRecovery:%d
client_idle_limit:%d client_idle_limit_in_recovery:%d",
< pool_config->client_idle_limit,
pool_config->client_idle_limit_in_recovery);
<
<
383,387d378
<
< pool_debug(" loop1: idle count:
%d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
<
frontend_idle_count, pool_config->client_idle_limit,
< pool_config-
>client_idle_limit_in_recovery);
<
396,399d386
< pool_debug(" loop2: idle count:
%d InRecovery:%d client_idle_limit:%d client_idle_limit_in_recovery:%d",
<
frontend_idle_count, pool_config->client_idle_limit,
<
pool_config->client_idle_limit_in_recovery);
<
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Then I compiled pgpool and installed it as before.
As before I started pgpool on debug mode and then created a psql
connection from my box to the pgpool server.
psql -h debian-db1 -U postgres -d test
test=# \l
...
test=#
So, I logged in and then listed the databases.
Once the "\l" command was executed the following showed up in the log:
--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:18:59 DEBUG: pid 2490: outside: InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:18:59 DEBUG: pid 2490: read kind from frontend Q(51)
2008-11-11 18:18:59 DEBUG: pid 2490: statement2: SELECT d.datname as
"Name",
r.rolname as "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding"
FROM pg_catalog.pg_database d
JOIN pg_catalog.pg_roles r ON d.datdba = r.oid
ORDER BY 1;
2008-11-11 18:18:59 DEBUG: pid 2490: waiting for backend 1 completing
the query
2008-11-11 18:18:59 DEBUG: pid 2490: outside: InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend T NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: T
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend C NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: C
2008-11-11 18:18:59 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend Z NUM_BACKENDS: 2
2008-11-11 18:18:59 DEBUG: pid 2490: pool_process_query: kind from
backend: Z
2008-11-11 18:18:59 DEBUG: pid 2490: pool_read_message_length: slot: 1
length: 5
2008-11-11 18:18:59 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:18:59 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:18:59 DEBUG: pid 2490: end_load_balance: end load
balance mode
2008-11-11 18:19:10 DEBUG: pid 2421: starting health checking
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 0 the DB node
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 1 the DB node
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 2 the DB node
status: 3
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Then I just let the connection sit there idle without doing anything
on the psql console for a couple of minutes.
Then I proceeded to performing a recovery as shown below
Called the pcp_recovery_node command to bring node 2 back online.
debian-db1:/opt/pgpool-cvs/bin# ./pcp_recovery_node 10 localhost 9898
nastpcp nastpcp 2
--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:19:10 DEBUG: pid 2421: starting health checking
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 0 the DB node
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 1 the DB node
status: 2
2008-11-11 18:19:10 DEBUG: pid 2421: health_check: 2 the DB node
status: 3
2008-11-11 18:19:50 DEBUG: pid 2421: starting health checking
2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 0 the DB node
status: 2
2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 1 the DB node
status: 2
2008-11-11 18:19:50 DEBUG: pid 2421: health_check: 2 the DB node
status: 3
2008-11-11 18:20:30 DEBUG: pid 2421: starting health checking
2008-11-11 18:20:30 DEBUG: pid 2421: health_check: 0 the DB node
status: 2
2008-11-11 18:20:30 DEBUG: pid 2421: health_check: 1 the DB node
status: 2
2008-11-11 18:20:31 DEBUG: pid 2421: health_check: 2 the DB node
status: 3
2008-11-11 18:21:11 DEBUG: pid 2421: starting health checking
2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 0 the DB node
status: 2
2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 1 the DB node
status: 2
2008-11-11 18:21:11 DEBUG: pid 2421: health_check: 2 the DB node
status: 3
2008-11-11 18:21:27 DEBUG: pid 2454: I am PCP 2454 accept fd 5
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet
type of service 'M'
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: salt sent to the client
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet
type of service 'R'
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: authentication OK
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: received PCP packet
type of service 'O'
2008-11-11 18:21:27 DEBUG: pid 2454: pcp_child: start online recovery
2008-11-11 18:21:27 LOG: pid 2454: starting recovering node 2
2008-11-11 18:21:27 DEBUG: pid 2454: exec_checkpoint: start checkpoint
2008-11-11 18:21:27 DEBUG: pid 2454: exec_checkpoint: finish checkpoint
2008-11-11 18:21:27 LOG: pid 2454: CHECKPOINT in the 1st stage done
2008-11-11 18:21:27 LOG: pid 2454: starting recovery command:
"SELECT pgpool_recovery('copy-base-backup', 'debian-db4', '/var/lib/
postgresql/8.3/main')"
2008-11-11 18:21:27 DEBUG: pid 2454: exec_recovery: start recovery
2008-11-11 18:21:29 DEBUG: pid 2454: exec_recovery: finish recovery
2008-11-11 18:21:29 LOG: pid 2454: 1st stage is done
2008-11-11 18:21:29 LOG: pid 2454: starting 2nd stage
--------------------------------------------------------------------------------------------------------------------------------------------------------------
So, stage 2 just hang there for a few seconds not doing anything.
Then I decided to list the databases again with "\l" on the psql
connection that I already had opened on the beginning of the test.
Once that happened it seems like pgpool triggered the client
connection to be reset and then 2nd stage proceeded.
--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:21:51 DEBUG: pid 2421: starting health checking
2008-11-11 18:22:07 DEBUG: pid 2490: outside: InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:07 DEBUG: pid 2490: read kind from frontend Q(51)
2008-11-11 18:22:07 DEBUG: pid 2490: statement2: SELECT d.datname as
"Name",
r.rolname as "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding"
FROM pg_catalog.pg_database d
JOIN pg_catalog.pg_roles r ON d.datdba = r.oid
ORDER BY 1;
2008-11-11 18:22:07 DEBUG: pid 2490: waiting for backend 1 completing
the query
2008-11-11 18:22:07 DEBUG: pid 2490: outside: InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend T NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: T
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend D NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: D
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend C NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: C
2008-11-11 18:22:07 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend Z NUM_BACKENDS: 2
2008-11-11 18:22:07 DEBUG: pid 2490: pool_process_query: kind from
backend: Z
2008-11-11 18:22:07 DEBUG: pid 2490: pool_read_message_length: slot: 1
length: 5
2008-11-11 18:22:07 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:22:07 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:22:07 DEBUG: pid 2490: end_load_balance: end load
balance mode
2008-11-11 18:22:08 DEBUG: pid 2490: outside: InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:08 DEBUG: pid 2490: loop1: idle count:1 InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
2008-11-11 18:22:09 DEBUG: pid 2490: outside: InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004632
2008-11-11 18:22:09 DEBUG: pid 2490: loop1: idle count:2 InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
2008-11-11 18:22:09 DEBUG: pid 2490: loop2: idle count:2 InRecovery:0
client_idle_limit:1 client_idle_limit_in_recovery:-1080004248
2008-11-11 18:22:09 LOG: pid 2490: pool_process_query: child
connection forced to terminate due to client_idle_limit_in_recovery(1)
reached
2008-11-11 18:22:09 DEBUG: pid 2490: statement2: RESET ALL
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 0 completing
the query
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 1 completing
the query
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 0 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from
backend: C
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 0 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from
backend: Z
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 0
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 1
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:22:09 DEBUG: pid 2490: statement2: SET SESSION
AUTHORIZATION DEFAULT
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 0 completing
the query
2008-11-11 18:22:09 DEBUG: pid 2490: waiting for backend 1 completing
the query
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 0 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from
backend: S
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master
slot: 0 length: 20
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master
slot: 1 length: 20
2008-11-11 18:22:09 DEBUG: pid 2490: 0 th backend: name: is_superuser
value: on
2008-11-11 18:22:09 DEBUG: pid 2490: 1 th backend: name: is_superuser
value: on
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 0 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend S NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from
backend: S
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master
slot: 0 length: 35
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length2: master
slot: 1 length: 35
2008-11-11 18:22:09 DEBUG: pid 2490: 0 th backend: name:
session_authorization value: postgres
2008-11-11 18:22:09 DEBUG: pid 2490: 1 th backend: name:
session_authorization value: postgres
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 0 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend C NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from
backend: C
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 0 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: read_kind_from_backend: read kind
from 1 th backend Z NUM_BACKENDS: 3
2008-11-11 18:22:09 DEBUG: pid 2490: pool_process_query: kind from
backend: Z
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 0
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: pool_read_message_length: slot: 1
length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: message length: 5
2008-11-11 18:22:09 DEBUG: pid 2490: ReadyForQuery: transaction state: I
2008-11-11 18:22:09 DEBUG: pid 2490: pool_connection_pool_timer: set
close time 1226449329
2008-11-11 18:22:11 LOG: pid 2454: all connections from clients have
been closed
2008-11-11 18:22:11 DEBUG: pid 2454: exec_checkpoint: start checkpoint
2008-11-11 18:22:11 DEBUG: pid 2454: exec_checkpoint: finish checkpoint
2008-11-11 18:22:11 LOG: pid 2454: CHECKPOINT in the 2nd stage done
2008-11-11 18:22:11 LOG: pid 2454: starting recovery command:
"SELECT pgpool_recovery('pgpool_recovery_pitr', 'debian-db4', '/var/
lib/postgresql/8.3/main')"
2008-11-11 18:22:11 DEBUG: pid 2454: exec_recovery: start recovery
2008-11-11 18:22:11 DEBUG: pid 2454: exec_recovery: finish recovery
2008-11-11 18:22:11 DEBUG: pid 2454: exec_remote_start: start
pgpool_remote_start
2008-11-11 18:22:16 DEBUG: pid 2454: exec_remote_start: finish
pgpool_remote_start
2008-11-11 18:22:19 LOG: pid 2454: 2 node restarted
2008-11-11 18:22:19 LOG: pid 2454: send_failback_request: fail back
2 th node request from pid 2454
2008-11-11 18:22:19 LOG: pid 2454: recovery done
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler called
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: starting to
select new master node
2008-11-11 18:22:19 LOG: pid 2421: starting fail back. reconnect
host debian-db4(5432)
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2459
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2460
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2461
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2462
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2463
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2464
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2465
2008-11-11 18:22:19 DEBUG: pid 2421: failover_handler: kill 2466
.....
--------------------------------------------------------------------------------------------------------------------------------------------------------------
If I had not done so the "pcp_recovery_node " command would have
failed giving back the BackendError message and the log would show
--------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-11-11 18:31:09 LOG: pid 2454: starting 2nd stage
2008-11-11 18:31:30 DEBUG: pid 2421: starting health checking
2008-11-11 18:32:10 DEBUG: pid 2421: starting health checking
2008-11-11 18:32:43 ERROR: pid 2454: wait_connection_closed: existing
connections did not close in 90 sec.
2008-11-11 18:32:43 ERROR: pid 2454: start_recovery: timeover for
waiting connection closed
--------------------------------------------------------------------------------------------------------------------------------------------------------------
It just seems weird to me that pgpool only kills the client connection
and proceeds with stage 2 if I execute a command within the psql
console that I
had opened before the online recovery started. If I had
client_idle_limit set to 30s the connection would have been killed
without me having to do anything as far
as I can remember.
PGPOOL.CONF:
--------------------------------------------------------------------------------------------------------------------------------------------------------------
listen_addresses = '*'
port = 5432
pcp_port = 9898
socket_dir = '/var/run/postgresql'
pcp_socket_dir = '/var/run/postgresql'
backend_socket_dir = '/var/run/postgresql'
pcp_timeout = 10
num_init_children = 32
max_pool = 1
child_life_time = 150
connection_life_time = 0
child_max_connections = 0
client_idle_limit = 0
authentication_timeout = 60
logdir = '/var/run/postgresql'
replication_mode = true
load_balance_mode = true
replication_stop_on_mismatch = false
replicate_select = false
reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
print_timestamp = true
master_slave_mode = false
connection_cache = true
health_check_timeout = 20
health_check_period = 40
health_check_user = 'postgres'
failover_command = ''
failback_command = ''
insert_lock = false
ignore_leading_white_space = true
log_statement = false
log_connections = false
log_hostname = false
parallel_mode = false
enable_query_cache = false
pgpool2_hostname = 'debian-db1'
backend_hostname0 = 'debian-db2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/8.3/main'
backend_hostname1 = 'debian-db3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/8.3/main'
backend_hostname2 = 'debian-db4'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/8.3/main'
enable_pool_hba = false
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'copy-base-backup'
recovery_2nd_stage_command = 'pgpool_recovery_pitr'
recovery_timeout = 90
client_idle_limit_in_recovery = 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Thanks for all your help on this
-
Marcelo
On Nov 10, 2008, at 6:13 PM, Tatsuo Ishii wrote:
> Marcelo,
>
> Could you apply follwing patches and try again so that we could more
> debug info?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
>> hmm unfortunately it does not seem to be working for me or perhaps I
>> have a setting that you may not have turned on Tiago.
>>
>> Basically I created one connection from my apache (10.1.101.152)
>> going
>> to pgpool and then let it set idle.
>> pgpool has two connections open to the backends 0 and 1 and I'm
>> trying
>> to recover node 2.
>>
>> It pretty much hangs on staging 2. On pgpool start up I do see that
>> the new setting is indeed turned on.
>> What I think it's strange is that on the pgpool LOG I don't even see
>> it trying to kill the clients by checking the
>> client_idle_limit_in_recovery setting
>>
>>
>> Could one of these settings be the problem ?
>>
>> num_init_children = 70
>> max_pool = 1
>> child_life_time = 300
>> connection_life_time = 300
>> child_max_connections = 0
>> client_idle_limit = 0
>> authentication_timeout = 60
>> replication_mode = true
>> load_balance_mode = true
>> replication_stop_on_mismatch = false
>> connection_cache = true
>> health_check_timeout = 20
>> health_check_period = 15
>> health_check_user = 'postgres'
>> recovery_user = 'postgres'
>> recovery_password = ''
>> recovery_1st_stage_command = 'copy-base-backup'
>> recovery_2nd_stage_command = 'pgpool_recovery_pitr'
>> recovery_timeout = 90
>> client_idle_limit_in_recovery = 10
>>
>>
>>
>>
>> LOG:
>> -----
>> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG: pid
>> 30668: pgpool successfully started
>> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG: pid
>> 30668: set 2 th backend down status
>> Nov 10 12:29:59 debian-db1 pgpool: 2008-11-10 12:29:59 LOG: pid
>> 30668: starting degeneration. shutdown host debian-db4(5432)
>> Nov 10 12:30:00 debian-db1 pgpool: 2008-11-10 12:30:00 LOG: pid
>> 30668: failover_handler: set new master node: 0
>> Nov 10 12:30:01 debian-db1 pgpool: 2008-11-10 12:30:01 LOG: pid
>> 30668: failover done. shutdown host debian-db4(5432)
>> Nov 10 12:30:02 debian-db1 pgpool: 2008-11-10 12:30:02 LOG: pid
>> 30821: ProcessFrontendResponse: failed to read kind from frontend.
>> frontend abnormally exited
>> Nov 10 12:34:07 debian-db1 pgpool: 2008-11-10 12:34:07 LOG: pid
>> 30741: starting recovering node 2
>> Nov 10 12:34:08 debian-db1 pgpool: 2008-11-10 12:34:08 LOG: pid
>> 30741: CHECKPOINT in the 1st stage done
>> Nov 10 12:34:08 debian-db1 pgpool: 2008-11-10 12:34:08 LOG: pid
>> 30741: starting recovery command: "SELECT pgpool_recovery('copy-base-
>> backup', 'debian-db4', '/var/lib/postgresql/8.3/main')"
>> Nov 10 12:34:09 debian-db1 pgpool: 2008-11-10 12:34:09 LOG: pid
>> 30741: 1st stage is done
>> Nov 10 12:34:09 debian-db1 pgpool: 2008-11-10 12:34:09 LOG: pid
>> 30741: starting 2nd stage
>> Nov 10 12:34:17 debian-db1 pgpool: 2008-11-10 12:34:17 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:34:32 debian-db1 pgpool: 2008-11-10 12:34:32 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:34:47 debian-db1 pgpool: 2008-11-10 12:34:47 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:02 debian-db1 pgpool: 2008-11-10 12:35:02 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:17 debian-db1 pgpool: 2008-11-10 12:35:17 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:32 debian-db1 pgpool: 2008-11-10 12:35:32 DEBUG: pid
>> 30668: starting health checking
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 ERROR: pid
>> 30741: wait_connection_closed: existing connections did not close in
>> 90 sec.
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 ERROR: pid
>> 30741: start_recovery: timeover for waiting connection closed
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 DEBUG: pid
>> 30741: pcp_child: received PCP packet type of service 'X'
>> Nov 10 12:35:42 debian-db1 pgpool: 2008-11-10 12:35:42 DEBUG: pid
>> 30741: pcp_child: client disconnecting. close connection
>>
>>
>>
>>
>> NETSTAT
>> -------------
>> tcp 0 0 0.0.0.0:5432 0.0.0.0:*
>> LISTEN 30668/pgpool
>> tcp 0 0 10.1.100.213:5432 10.1.101.152:2936
>> ESTABLISHED30821/pgpool:
>> tcp 0 0 10.1.100.213:1947 10.1.100.217:5432
>> ESTABLISHED30821/pgpool:
>> tcp 1 0 127.0.0.1:4652 127.0.0.1:5432
>> CLOSE_WAIT 1973/python
>> tcp 0 0 10.1.100.213:4814 10.1.100.116:5432
>> ESTABLISHED30821/pgpool:
>>
>>
>>
>>
>> Thank you,
>> Marcelo
>>
>>
>> On Nov 8, 2008, at 9:05 AM, Tiago Macedo wrote:
>>
>>> Hi,
>>>
>>> I've tested this and it works perfectly. Exactly what I needed.
>>>
>>> Thank you so much for your hard work,
>>> Tiago Macedo
>>>
>>> On Fri, Nov 7, 2008 at 9:28 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
>>> wrote:
>>> Hi,
>>>
>>> I have implemented new directive "client_ilde_limit_in_recovery" per
>>> discussion. This is usefull for on line recovery. From the doc:
>>>
>>> client_ilde_limit_in_recovery
>>>
>>> Similar to client_idle_limit but only takes effect in
>>> recovery 2nd
>>> stage. Disconnect the connection to a client being idle for
>>> client_idle_limit_in_recovery seconds since the last query
>>> has
>>> been sent. This is usefull for preventing for pgpool
>>> recovery
>>> disturbed by a lazy client or TCP/IP connection between
>>> client and
>>> pgpool is accidentally down. The default value for
>>> client_idle_limit_in_recovery is 0, which means the
>>> functionality is turned
>>> off. You need to reload pgpool.conf if you change
>>> client_idle_limit_in_recovery.
>>>
>>> Note that now client_idle_limit only takes effect *other than* in
>>> the
>>> second stage of recovery. I think this makes more sense. Please
>>> let me
>>> know if you have any questions/comments.
>>>
>>> Thanks,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> _______________________________________________
>>> Pgpool-general mailing list
>>> Pgpool-general at pgfoundry.org
>>> http://pgfoundry.org/mailman/listinfo/pgpool-general
>>>
>>> _______________________________________________
>>> Pgpool-general mailing list
>>> Pgpool-general at pgfoundry.org
>>> http://pgfoundry.org/mailman/listinfo/pgpool-general
>>
> Index: pool_process_query.c
> ===================================================================
> RCS file: /cvsroot/pgpool/pgpool-II/pool_process_query.c,v
> retrieving revision 1.114
> diff -c -r1.114 pool_process_query.c
> *** pool_process_query.c 10 Nov 2008 00:58:24 -0000 1.114
> --- pool_process_query.c 11 Nov 2008 00:12:18 -0000
> ***************
> *** 376,381 ****
> --- 376,386 ----
> (*InRecovery && pool_config->client_idle_limit_in_recovery >
> 0)) && fds == 0)
> {
> frontend_idle_count++;
> +
> + pool_debug("idle count:%d InRecovery:%d client_idle_limit:%d
> client_idle_limit_in_recovery:%d",
> + frontend_idle_count, pool_config->client_idle_limit,
> + pool_config->client_idle_limit_in_recovery);
> +
> if (*InRecovery == 0 && (frontend_idle_count > pool_config-
> >client_idle_limit))
> {
> pool_log("pool_process_query: child connection forced to
> terminate due to client_idle_limit(%d) reached", pool_config-
> >client_idle_limit);
More information about the Pgpool-general
mailing list