View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000447 | Pgpool-II | Bug | public | 2018-11-22 10:29 | 2018-12-04 09:26 |
| Reporter | Parkbyunggyu | Assigned To | t-ishii | | |
| Priority | high | Severity | major | Reproducibility | always |
| Status | assigned | Resolution | open | | |
| Platform | Linux X86_64 | OS | CentOS | OS Version | 7.3 |
| Product Version | 3.6.12 | | | | |
| Summary | 0000447: connection expired problem | | | | |
| Description | I configured Pgpool-II to accept 2000 client connections. However, once about 760 users are connected, further users cannot connect because Pgpool-II reports that the connection has expired. 1. I want to know why this is happening. 2. What are the possible reasons a connection expires in Pgpool-II? I would like to know if there are common examples of why a connection expires. 3. My Pgpool-II and PostgreSQL configurations are below; if you need any other information, please reply. |

Pgpool-II configuration:

    listen_addresses = '*'
    port = XXXX
    socket_dir = '/tmp'
    pcp_listen_addresses = '*'
    pcp_port = XXXX
    pcp_socket_dir = '/tmp'
    listen_backlog_multiplier = 2
    serialize_accept = off
    backend_hostname0 = 'xx.xx.xxx.xxx'
    backend_port0 = XXXX
    backend_weight0 = 0.5
    backend_data_directory0 = '/data/data'
    backend_flag0 = 'ALLOW_TO_FAILOVER'
    backend_hostname1 = 'xx.xx.xxx.xxx'
    backend_port1 = XXXX
    backend_weight1 = 0.5
    backend_data_directory1 = '/data/data'
    backend_flag1 = 'ALLOW_TO_FAILOVER'
    enable_pool_hba = off
    pool_passwd = 'pool_passwd'
    authentication_timeout = 60
    ssl = off
    num_init_children = 2050
    max_pool = 1
    child_life_time = 300
    child_max_connections = 0
    connection_life_time = 0
    client_idle_limit = 0
    log_destination = 'stderr'
    log_connections = on
    log_hostname = off
    log_statement = off
    log_per_node_statement = off
    log_standby_delay = 'if_over_threshold'
    syslog_facility = 'LOCAL0'
    logdir = '/tmp'
    syslog_ident = 'pgpool'
    pid_file_name = '/var/run/pgpool/pgpool.pid'
    connection_cache = on
    reset_query_list = 'ABORT; DISCARD ALL'
    replication_mode = off
    replicate_select = off
    insert_lock = off
    lobj_lock_table = ''
    replication_stop_on_mismatch = off
    failover_if_affected_tuples_mismatch = off
    load_balance_mode = on
    ignore_leading_white_space = on
    white_function_list = ''
    black_function_list = 'currval,lastval,nextval,setval'
    database_redirect_preference_list = ''
    app_name_redirect_preference_list = ''
    allow_sql_comments = off
    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    sr_check_period = 10
    sr_check_user = 'XXXXXX'
    sr_check_password = 'XXXXXX'
    sr_check_database = 'postgres'
    delay_threshold = 10000000
    follow_master_command = ''
    health_check_period = 0
    health_check_timeout = 20
    health_check_user = 'nobody'
    health_check_password = ''
    health_check_database = ''
    health_check_max_retries = 0
    health_check_retry_delay = 1
    connect_timeout = 10000
    failover_command = '/etc/pgpool-II/failover.sh %d %P %H %R'
    failback_command = ''
    fail_over_on_backend_error = on
    search_primary_node_timeout = 300
    recovery_user = 'nobody'
    recovery_password = ''
    recovery_1st_stage_command = ''
    recovery_2nd_stage_command = ''
    recovery_timeout = 90
    client_idle_limit_in_recovery = 0
    use_watchdog = off
    trusted_servers = ''
    ping_path = '/bin'
    wd_hostname = ''
    wd_port = 9000
    wd_priority = 1
    wd_authkey = ''
    wd_ipc_socket_dir = '/tmp'
    delegate_IP = ''
    if_cmd_path = '/sbin'
    if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
    if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
    arping_path = '/usr/sbin'
    arping_cmd = 'arping -U $_IP_$ -w 1'
    clear_memqcache_on_escalation = on
    wd_escalation_command = ''
    wd_de_escalation_command = ''
    wd_lifecheck_method = 'heartbeat'
    wd_interval = 2
    wd_heartbeat_port = XXXX
    wd_heartbeat_keepalive = 2
    wd_heartbeat_deadtime = 30
    heartbeat_destination0 = 'host0_ip1'
    heartbeat_destination_port0 = XXXX
    heartbeat_device0 = ''
    wd_life_point = 3
    wd_lifecheck_query = 'SELECT 1'
    wd_lifecheck_dbname = 'template1'
    wd_lifecheck_user = 'nobody'
    wd_lifecheck_password = ''
    relcache_expire = 0
    relcache_size = 256
    check_temp_table = on
    check_unlogged_table = on
    memory_cache_enabled = off
    memqcache_method = 'shmem'
    memqcache_memcached_host = 'localhost'
    memqcache_memcached_port = 11211
    memqcache_total_size = 67108864
    memqcache_max_num_cache = 1000000
    memqcache_expire = 0
    memqcache_auto_cache_invalidation = on
    memqcache_maxcache = 409600
    memqcache_cache_block_size = 1048576
    memqcache_oiddir = '/var/log/pgpool/oiddir'
    white_memqcache_table_list = ''
    black_memqcache_table_list = ''

PostgreSQL configuration:

    listen_addresses = '*'
    port = xxxx
    max_connections = 2100
    shared_buffers = 32GB
    work_mem = 16MB
    maintenance_work_mem = 1GB
    wal_level = replica
    checkpoint_timeout = 10min
    max_wal_size = 4GB
    min_wal_size = 2GB
    checkpoint_completion_target = 0.9
    archive_mode = on
    archive_command = 'cp %p /data/archive/%f'
    max_wal_senders = 5
    random_page_cost = 2.0
    effective_cache_size = 96GB
    log_destination = 'stderr'
    logging_collector = on
    log_min_duration_statement = 3s
    track_activity_query_size = 65536
    pg_stat_statements.track = all
    pg_stat_statements.max = 10000
    shared_preload_libraries = '$libdir/pg_stat_statements'
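Given the posted values, the pool-to-backend sizing can be sanity-checked with a short sketch. It assumes the usual Pgpool-II rule of thumb that up to num_init_children * max_pool backend connections may be opened, and that PostgreSQL's superuser_reserved_connections (default 3, not shown in the posted config) is unavailable to ordinary clients; this is an illustrative check, not an official formula.

```python
# Sanity-check Pgpool-II sizing against PostgreSQL max_connections,
# using the values from the configurations posted above.
# Assumption: superuser_reserved_connections is at its PostgreSQL
# default of 3 (the posted config does not set it).

num_init_children = 2050   # pgpool.conf above
max_pool = 1               # pgpool.conf above
max_connections = 2100     # postgresql.conf above
superuser_reserved = 3     # assumed PostgreSQL default

pool_backend_slots = num_init_children * max_pool
usable_backend_slots = max_connections - superuser_reserved

print(pool_backend_slots, usable_backend_slots)  # 2050 2097
assert pool_backend_slots <= usable_backend_slots
```

Under these assumptions the sizing itself is consistent (2050 needed vs 2097 usable), which suggests the connection failures at about 760 users come from something other than backend connection-slot exhaustion.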
| Tags | No tags attached. | ||||
|
|
"pg-pool said the connection time has expired." is vague because there are several places where the word "expire" is used in the source code. Can you share the pgpool log? |
|
|
Hi t-ishii, here is the log. I also have one more question. I want to connect 1000 users to the following database at the same time: database cluster: master/slave; database name: postgres (only one database); database user names: sp_user, dml_user, slct_user. Can I set the parameter values as shown below? Database parameter: max_connections = 1000 (master and slave). Pgpool-II parameters: num_init_children = 1000, max_pool = 1. |
|
|
I don't see "expire" in any of your log files. What was the exact message when you said "because pg-pool said the connection time has expired."? |
|
|
I am so sorry; I misstated my previous question. What I actually want to know is this: the number of database connections is not increasing as much as I intend, so I am worried that a parameter is the problem. Also, the following errors appear frequently in the log I attached:

2018-11-21 15:33:18: pid 26517: FATAL: unable to read data from DB node 0
2018-11-21 15:33:18: pid 26517: DETAIL: EOF encountered with backend
2018-11-21 15:33:18: pid 26517: WARNING: write on backend 0 failed with error: "Success"
2018-11-21 15:33:18: pid 26517: DETAIL: while trying to write data from offset: 0 wlen: 5

I wonder whether this error could be preventing pgpool from increasing the number of connections. Let me ask again clearly (sorry): 1. To connect 1000 users to the same database at the same time (database cluster: master/slave; database name: postgres, only one database; database user names: sp_user, dml_user, slct_user), can I set the parameter values as shown below? Database parameter: max_connections = 1000 (master and slave). Pgpool-II parameters: num_init_children = 1000, max_pool = 1. 2. Is there any solution for the error mentioned above? |
|
|
The settings seem sane. Can you show me the result of "show pool_nodes"? Also, I need the PostgreSQL log from the same time the following happened:

2018-11-21 15:33:18: pid 26517: FATAL: unable to read data from DB node 0
2018-11-21 15:33:18: pid 26517: DETAIL: EOF encountered with backend
2018-11-21 15:33:18: pid 26517: WARNING: write on backend 0 failed with error: "Success"
2018-11-21 15:33:18: pid 26517: DETAIL: while trying to write data from offset: 0 wlen: 5 |
|
|
Hi t-ishii, here is what you asked for:

    node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
    --------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
    0       | 12.12.300.141 | 5932 | up     | 0.400000  | primary | 10309485   | false             | 0
    1       | 12.12.300.142 | 5932 | up     | 0.600000  | standby | 16443283   | true              | 0
|
|
t-ishii, are you very busy? I will keep waiting. |
|
|
Unfortunately, your log did not give a hint for tracking down the problem, so I tried to reproduce it using num_init_children = 1980 and max_connections = 2000, but I ran into the OOM killer. I need to look for a more powerful machine. |
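For context, an OOM with those numbers is plausible: each of the 1980 pgpool children is a separate process, and every client that actually connects also spawns a PostgreSQL backend. A rough back-of-envelope sketch follows; the per-process RSS figures are illustrative assumptions for this sketch, not measurements from the test machine.

```python
# Rough, illustrative memory estimate for the reproduction attempt above.
# The per-process RSS figures are assumed values, not measured ones.

num_init_children = 1980       # from the reproduction attempt
pgpool_child_rss_mb = 5        # assumed RSS of one pgpool child
backend_rss_mb = 10            # assumed RSS of one PostgreSQL backend
shared_buffers_mb = 32 * 1024  # 32GB shared_buffers, as in the posted config

# Worst case: every child holds a connection, so every child has a backend.
total_mb = (num_init_children * (pgpool_child_rss_mb + backend_rss_mb)
            + shared_buffers_mb)
print(f"rough peak usage: about {total_mb // 1024} GB")  # about 61 GB
```

Even with these conservative per-process figures the estimate lands above the RAM of many test machines, which is consistent with the OOM killer firing.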
|
|
OK, I understand: you are setting up a test environment to reproduce my situation. I appreciate you trying to reproduce it like this. I will keep waiting. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2018-11-22 10:29 | Parkbyunggyu | New Issue | |
| 2018-11-23 07:26 | t-ishii | Note Added: 0002278 | |
| 2018-11-27 16:30 | Parkbyunggyu | File Added: pgpool-log.zip | |
| 2018-11-27 16:30 | Parkbyunggyu | Note Added: 0002287 | |
| 2018-11-27 16:41 | t-ishii | Note Added: 0002289 | |
| 2018-11-27 16:54 | Parkbyunggyu | Note Added: 0002291 | |
| 2018-11-27 17:17 | t-ishii | Note Added: 0002292 | |
| 2018-11-28 11:32 | Parkbyunggyu | File Added: DB-log.zip | |
| 2018-11-28 11:32 | Parkbyunggyu | Note Added: 0002293 | |
| 2018-12-03 14:16 | Parkbyunggyu | Note Added: 0002298 | |
| 2018-12-03 15:07 | t-ishii | Note Added: 0002299 | |
| 2018-12-03 15:16 | Parkbyunggyu | Note Added: 0002300 | |
| 2018-12-04 09:26 | t-ishii | Assigned To | => t-ishii |
| 2018-12-04 09:26 | t-ishii | Status | new => assigned |