View Issue Details

ID: 0000447
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2018-12-04 09:26
Reporter: Parkbyunggyu
Assigned To: t-ishii
Priority: high
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Platform: Linux X86_64
OS: CentOS
OS Version: 7.3
Product Version: 3.6.12
Target Version:
Fixed in Version:
Summary0000447: connection expired problem
Description: I configured the connection limit for pgpool to 2000 users.
Once 760 users are connected, additional users cannot connect to
pgpool-II,
because pgpool says the connection time has expired.


1. I want to know why this is happening. Can anyone explain it?

2. What are the possible reasons for a connection expiring in pgpool?
I would like to know the common causes of connection expiry.



3. Here are my pgpool and DB configurations. If you need any other information, please reply.

PG-pool config
listen_addresses = '*'
port = XXXX
socket_dir = '/tmp'
pcp_listen_addresses = '*'
pcp_port = XXXX
pcp_socket_dir = '/tmp'
listen_backlog_multiplier = 2
serialize_accept = off
backend_hostname0 = 'xx.xx.xxx.xxx'
backend_port0 = XXXX
backend_weight0 = 0.5
backend_data_directory0 = '/data/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'xx.xx.xxx.xxx'
backend_port1 = XXXX
backend_weight1 = 0.5
backend_data_directory1 = '/data/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = off
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 2050
max_pool = 1
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
log_destination = 'stderr'
log_connections = on
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'if_over_threshold'
syslog_facility = 'LOCAL0'
logdir = '/tmp'
syslog_ident = 'pgpool'
pid_file_name = '/var/run/pgpool/pgpool.pid'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = off
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'XXXXXX'
sr_check_password = 'XXXXXX'
sr_check_database = 'postgres'
delay_threshold = 10000000
follow_master_command = ''
health_check_period = 0
health_check_timeout = 20
health_check_user = 'nobody'
health_check_password = ''
health_check_database = ''
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000
failover_command = '/etc/pgpool-II/failover.sh %d %P %H %R'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 300
recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = off
trusted_servers = ''
ping_path = '/bin'
wd_hostname = ''
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = ''
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 2
wd_heartbeat_port = XXXX
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = XXXX
heartbeat_device0 = ''
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
relcache_expire = 0
relcache_size = 256
check_temp_table = on
check_unlogged_table = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''




DB config
listen_addresses ='*'
port = xxxx
max_connections = 2100
shared_buffers = 32GB
work_mem = 16MB
maintenance_work_mem = 1GB
wal_level = replica
checkpoint_timeout = 10min
max_wal_size = 4GB
min_wal_size = 2GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'cp %p /data/archive/%f'
max_wal_senders = 5
random_page_cost = 2.0
effective_cache_size = 96GB
log_destination = 'stderr'
logging_collector = on
log_min_duration_statement = 3s
track_activity_query_size = 65536
pg_stat_statements.track = all
pg_stat_statements.max = 10000
shared_preload_libraries = '$libdir/pg_stat_statements'
Tags: No tags attached.

Activities

t-ishii

2018-11-23 07:26

developer   ~0002278

"pg-pool said the connection time has expired." is vague because there are several places where the word "expire" is used in the source code. Can you share the pgpool log?

Parkbyunggyu

2018-11-27 16:30

reporter   ~0002287

Hi t-ishii, here is the log.

And I have one more question.
To connect 1000 users to the following database at the same time:

database cluster: Master - Slave
database name: postgres (only one database)
database user names: sp_user, dml_user, slct_user

Can I set the parameter values as shown below?
database parameter: max_connections = 1000 (master, slave)
pgpool parameter: num_init_children = 1000, max_pool = 1

pgpool-log.zip (65,037 bytes)

t-ishii

2018-11-27 16:41

developer   ~0002289

I don't see "expire" in any of your log files. What was the exact message when you said "because pg-pool said the connection time has expired."?

Parkbyunggyu

2018-11-27 16:54

reporter   ~0002291

I am so sorry. My previous question was based on a misunderstanding,
and I worded it incorrectly.

What I actually wanted to ask is:
the number of database connections is not increasing as much as I intend,
so I am worried that a parameter setting is the problem.

And the following error appears frequently in the log I attached:
2018-11-21 15:33:18: pid 26517: FATAL: unable to read data from DB node 0
2018-11-21 15:33:18: pid 26517: DETAIL: EOF encountered with backend
2018-11-21 15:33:18: pid 26517: WARNING: write on backend 0 failed with error: "Success"
2018-11-21 15:33:18: pid 26517: DETAIL: while trying to write data from offset: 0 wlen: 5

I wonder whether this error could be preventing pgpool from accepting more users.


Let me restate my questions clearly. (Sorry.)
1. To connect 1000 users to the same database at the same time:

database cluster: Master - Slave
database name: postgres (only one database)
database user name: sp_user, dml_user, slct_user

Can I set the parameter values as shown below?
database parameter: max_connections = 1000 (master, slave)
pgpool parameter: num_init_children = 1000, max_pool = 1
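For reference, each pgpool child process can hold up to max_pool cached backend connections, so each backend must accept roughly num_init_children * max_pool connections, plus headroom for superuser_reserved_connections and pgpool's own health/streaming-replication checks. A minimal sketch of that sanity check, using the values proposed above; the headroom of 5 is an illustrative assumption, not a figure from this report:

```shell
# Sketch: check whether a backend's max_connections covers pgpool's demand.
# num_init_children and max_pool are the values proposed above;
# headroom=5 is an assumed reserve for sr_check / health checks / superuser.
num_init_children=1000
max_pool=1
headroom=5

needed=$(( num_init_children * max_pool + headroom ))
echo "each backend needs max_connections >= $needed"
```

With max_connections = 1000 on each backend this leaves no reserve for maintenance sessions, so setting max_connections slightly above num_init_children * max_pool may be safer.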



2. I am getting the error mentioned above; is there any solution?

t-ishii

2018-11-27 17:17

developer   ~0002292

The settings seem sane.
Can you show me the result of "show pool_nodes"?
Also, I need the PostgreSQL log from the same time the following happened:

2018-11-21 15:33:18: pid 26517: FATAL: unable to read data from DB node 0
2018-11-21 15:33:18: pid 26517: DETAIL: EOF encountered with backend
2018-11-21 15:33:18: pid 26517: WARNING: write on backend 0 failed with error: "Success"
2018-11-21 15:33:18: pid 26517: DETAIL: while trying to write data from offset: 0 wlen: 5

Parkbyunggyu

2018-11-28 11:32

reporter   ~0002293

Hi t-ishii,
here is what you asked for:

 node_id | hostname      | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 12.12.300.141 | 5932 | up     | 0.400000  | primary | 10309485   | false             | 0
 1       | 12.12.300.142 | 5932 | up     | 0.600000  | standby | 16443283   | true              | 0

DB-log.zip (99,721 bytes)

Parkbyunggyu

2018-12-03 14:16

reporter   ~0002298

t-ishii...
Have you been busy?
I will keep waiting....

t-ishii

2018-12-03 15:07

developer   ~0002299

Unfortunately your log did not give a hint for tracking down the problem.
So I tried to reproduce your problem using num_init_children = 1980 and max_connections = 2000, but I ran into the OOM killer. I need to find a more powerful machine.

Parkbyunggyu

2018-12-03 15:16

reporter   ~0002300

OK, I understand.

I understand that you are setting up a test environment to reproduce my situation.

I appreciate your effort to reproduce it.

I will keep waiting.

Issue History

Date Modified Username Field Change
2018-11-22 10:29 Parkbyunggyu New Issue
2018-11-23 07:26 t-ishii Note Added: 0002278
2018-11-27 16:30 Parkbyunggyu File Added: pgpool-log.zip
2018-11-27 16:30 Parkbyunggyu Note Added: 0002287
2018-11-27 16:41 t-ishii Note Added: 0002289
2018-11-27 16:54 Parkbyunggyu Note Added: 0002291
2018-11-27 17:17 t-ishii Note Added: 0002292
2018-11-28 11:32 Parkbyunggyu File Added: DB-log.zip
2018-11-28 11:32 Parkbyunggyu Note Added: 0002293
2018-12-03 14:16 Parkbyunggyu Note Added: 0002298
2018-12-03 15:07 t-ishii Note Added: 0002299
2018-12-03 15:16 Parkbyunggyu Note Added: 0002300
2018-12-04 09:26 t-ishii Assigned To => t-ishii
2018-12-04 09:26 t-ishii Status new => assigned