[pgpool-general: 6957] Fwd: pcp_attach_node downtime

Tue Apr 7 23:17:38 JST 2020

Hello,

let's try this again:
I'm running pgpool in a Docker container that connects to two backend nodes. Version:

```
docker exec -it pgpool /bin/bash -c 'pgpool version'
pgpool-II version 4.1.1 (karasukiboshi),
  A generic connection pool/replication/load balance server for PostgreSQL

Usage:
```

With the following configuration:

```

listen_addresses = '127.0.0.1'
port = 5431
socket_dir = '/var/run/postgresql'
pcp_port = 9898
pcp_socket_dir = '/var/run/postgresql'
backend_data_directory0 = '/var/lib/postgresql/9.6/main'
backend_flag0 = 'DISALLOW_TO_FAILOVER'
backend_hostname0 = 'some.db.com'
backend_port0 = 5432
backend_weight0 = 0
backend_data_directory0 = '/var/lib/postgresql/9.2/main'
backend_data_directory1 = '/var/lib/postgresql/9.6/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '10.32.1.38'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.2/main'
enable_pool_hba = on
authentication_timeout = 60
ssl = off
num_init_children = 329
max_pool = 1
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 150
log_destination = 'stderr'
print_timestamp = on
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'always'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
debug_level = 0
pid_file_name = '/var/run/postgresql/pgpool.pid'
logdir = '/var/log/postgresql'
connection_cache = off
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = on
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 5
sr_check_user = 'srcheck'
follow_master_command = ''
parallel_mode = off
pgpool2_hostname = ''
system_db_hostname  = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
health_check_period = 5
health_check_timeout = 10
health_check_user = 'healthcheck'
health_check_max_retries = 0
health_check_retry_delay = 1
failover_command = ''
failback_command = ''
failover_on_backend_error = off
search_primary_node_timeout = 10
recovery_user = 'nobody'
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = off
trusted_servers = ''
ping_path = '/bin'
wd_hostname = ''
wd_port = 9000
wd_authkey = ''
delegate_IP = ''
ifconfig_path = '/sbin'
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig eth0:0 down'
arping_cmd = 'arping -U $_IP_$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'postgres'
wd_lifecheck_user = 'postgres'
relcache_query_target = master
relcache_expire = 0
relcache_size = 256
check_temp_table = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''

```
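The configuration above assigns `backend_data_directory0` and `backend_data_directory1` twice each, once with a 9.6 path and once with a 9.2 path. In a file this long, repeated keys are easy to miss, and it is worth confirming which value pgpool actually uses. A minimal sketch for listing repeated keys, assuming simple `key = value` lines (it is not a full pgpool.conf parser; `find_duplicate_keys` is a hypothetical helper):

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches the key of a "key = value" line; leading whitespace is allowed,
# commented-out lines ("# key = ...") do not match because '#' is not a key character.
KEY_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def find_duplicate_keys(conf_text: str) -> dict[str, list[int]]:
    """Return {key: [1-based line numbers]} for every key assigned more than once."""
    seen: dict[str, list[int]] = defaultdict(list)
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        m = KEY_RE.match(line)
        if m:
            seen[m.group(1)].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


sample = """\
backend_data_directory0 = '/var/lib/postgresql/9.6/main'
backend_hostname0 = 'some.db.com'
  backend_data_directory0 = '/var/lib/postgresql/9.2/main'
"""
print(find_duplicate_keys(sample))  # {'backend_data_directory0': [1, 3]}
```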

`SHOW pool_nodes` then reports the following:

```
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | some.db.com | 5432 | up     | 0.000000  | primary | 0          | false             | 0                 |                   |                        | 2020-03-31 22:04:22
 1       | 10.32.1.38  | 5432 | up     | 1.000000  | standby | 0          | true              | 3283947928401     |                   |                        | 2020-03-31 22:04:22
(2 rows)
```
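When a standby drops out, the `status` column is the field to watch before deciding which node id to reattach. A minimal sketch for turning psql's aligned output of `SHOW pool_nodes` into one dict per node, assuming every row fits on a single line and no cell contains `|` (`parse_pool_nodes` and `node_ids_with_status` are hypothetical helpers, not part of pgpool):

```python
from __future__ import annotations

import re

# psql prints a footer such as "(2 rows)" or "(1 row)" after aligned output.
FOOTER_RE = re.compile(r"^\(\d+ rows?\)$")


def parse_pool_nodes(psql_output: str) -> list[dict[str, str]]:
    """Turn psql's aligned SHOW pool_nodes output into one dict per node row."""
    lines = [ln.strip() for ln in psql_output.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows: list[dict[str, str]] = []
    for ln in lines[1:]:
        # Skip the "----+----" separator line and the row-count footer.
        if set(ln) <= set("-+") or FOOTER_RE.match(ln):
            continue
        cells = [c.strip() for c in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def node_ids_with_status(rows: list[dict[str, str]], status: str) -> list[int]:
    """Return the node_id of every row whose status column equals `status`."""
    return [int(r["node_id"]) for r in rows if r["status"] == status]
```

For example, `node_ids_with_status(parse_pool_nodes(text), "down")` yields the ids that are candidates for `pcp_attach_node`. If you control the psql invocation, unaligned output (`psql -A`) is simpler to split than the aligned table.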

Occasionally the standby node goes down for some reason. When I then
issue `pcp_attach_node` to start sending queries to it again, the
whole pgpool service appears to be disrupted:

- clients see their queries disrupted

- no new queries go through

- new connections to pgpool fail

Normal functionality is restored after about 2-5 seconds.
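To put a firmer number on the 2-5 second window, one option is to call a trivial probe in a loop while `pcp_attach_node` runs and compute the longest run of failures. A minimal sketch; `probe` is a hypothetical callable you supply, for example one that opens a fresh connection to pgpool on 127.0.0.1:5431 (the `listen_addresses` and `port` values above), runs `SELECT 1`, and returns True on success:

```python
from __future__ import annotations

import time
from typing import Callable


def sample_probe(probe: Callable[[], bool], duration_s: float,
                 interval_s: float = 0.1) -> list[tuple[float, bool]]:
    """Call probe() every interval_s seconds for duration_s; record (timestamp, ok)."""
    samples: list[tuple[float, bool]] = []
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        t = time.monotonic()
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # a raised error (e.g. connection refused) counts as a failure
        samples.append((t, ok))
        time.sleep(interval_s)
    return samples


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Seconds from the first failure of a run to the next success (or last sample)."""
    longest = 0.0
    run_start = None
    for t, ok in samples:
        if not ok and run_start is None:
            run_start = t  # a failure run begins
        elif ok and run_start is not None:
            longest = max(longest, t - run_start)  # the run ended with a success
            run_start = None
    if run_start is not None:
        # The run never recovered within the sampling window.
        longest = max(longest, samples[-1][0] - run_start)
    return longest
```

Running two probes side by side, one that opens a new connection per call and one that reuses a single connection, would separate the "new connections fail" symptom from the "queries are disrupted" one.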

Is this expected behavior, or is there a more graceful way to
reattach a node in pgpool?

Thank you,

Artem.