View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000261 | Pgpool-II | Bug | public | 2016-11-11 17:54 | 2017-02-23 17:06 |
| Reporter | serk | Assigned To | t-ishii | ||
| Priority | high | Severity | block | Reproducibility | random |
| Status | closed | Resolution | open | ||
| Platform | x86_64 | OS | CentOS | OS Version | 6.8 |
| Product Version | 3.5.4 | ||||
| Summary | 0000261: Random break of connections with error "kind mismatch among backends" | ||||
| Description | In production we are randomly facing trouble with pgpool. The DB connection is broken and we receive an error, which affects stability. It does not depend on load and appears absolutely randomly. FATAL: failed to read kind from backend Detail: kind mismatch among backends. Possible last query was: "BEGIN" kind details are: 0[3] 1[2] Hint: check data consistency among db nodes How should we treat this error? | ||||
| Steps To Reproduce | We have 3 pgpool servers and 2 PostgreSQL backends replicated synchronously. | ||||
| Tags | No tags attached. | ||||
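For context on what the error means mechanically: pgpool-II reads a one-byte message kind from each backend and requires them all to agree before forwarding the reply to the client; when they disagree, it reports the mismatch and aborts the session. A minimal sketch of that comparison (an illustration only, not pgpool's actual C code):

```python
from collections import Counter

def check_kind_agreement(kinds):
    """kinds: dict mapping backend node id -> one-byte message kind.

    Returns None if all backends agree; otherwise a summary string in the
    same "N[kind]" style pgpool uses in its "kind mismatch" DETAIL line.
    """
    # Treat the most common kind as the majority (ties resolve to the
    # first-seen kind, which is fine for an illustration).
    majority, _ = Counter(kinds.values()).most_common(1)[0]
    deviants = {node: k for node, k in kinds.items() if k != majority}
    if not deviants:
        return None
    detail = " ".join(f"{n}[{k}]" for n, k in sorted(kinds.items()))
    return f"kind mismatch among backends. kind details are: {detail}"

# The situation from this report: node 0 sent '2', node 1 sent '3'.
print(check_kind_agreement({0: "2", 1: "3"}))
print(check_kind_agreement({0: "Z", 1: "Z"}))  # agreement -> None
```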
|
|
Also, what does the message "check data consistency among db nodes" mean? Replication of requests is not configured. Please see the configuration file:

```
listen_addresses = '*'
port = 5432
socket_dir = '/var/run/pgpool'
listen_backlog_multiplier = 2
serialize_accept = on
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/pgpool'
backend_hostname0 = 'onedb.node'
backend_port0 = 15432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/9.5/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'twodb.node'
backend_port1 = 15432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.5/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 240
max_pool = 1
child_life_time = 0
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 100
log_destination = 'syslog'
log_line_prefix = ''
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'none'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
debug_level = 5
log_error_verbosity = default
client_min_messages = debug2
log_min_messages = debug5
pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/var/log/pgpool'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = off
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
sr_check_user = 'postgres'
sr_check_password = 'postgres'
sr_check_database = 'postgres'
delay_threshold = 0
follow_master_command = ''
health_check_period = 10
health_check_timeout = 5
health_check_user = 'postgres'
health_check_password = 'postgres'
health_check_database = 'postgres'
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10
recovery_user = 'postgres'
recovery_password = 'postgres'
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = on
trusted_servers = ''
ping_path = '/bin'
wd_hostname = 'server.two'
wd_port = 9000
wd_priority = 10
wd_authkey = ''
wd_ipc_socket_dir = '/var/run/pgpool'
delegate_IP = '192.168.10.10'
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/16 dev eth0'
if_down_cmd = 'ip addr del $_IP_$/16 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''
wd_monitoring_interfaces_list = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 2
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = '192.168.10.1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth0'
other_pgpool_hostname0 = '192.168.10.1'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
heartbeat_destination1 = '192.168.10.3'
heartbeat_destination_port1 = 9694
heartbeat_device1 = 'eth0'
other_pgpool_hostname1 = '192.168.10.3'
other_pgpool_port1 = 5432
other_wd_port1 = 9000
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
relcache_expire = 0
relcache_size = 256
check_temp_table = off
check_unlogged_table = off
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 5000000
memqcache_max_num_cache = 1000000
memqcache_expire = 1
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''
```
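Since there are three pgpool servers here, one thing worth ruling out is configuration drift between them. A small helper for that (a sketch only; the parameter values below are illustrative, not taken from the actual servers) that parses `key = value` lines like the listing above so two configurations can be diffed:

```python
def parse_pgpool_conf(text):
    """Parse simple "key = value" lines; comments and blanks are ignored."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            conf[key.strip()] = value.strip().strip("'")
    return conf

def diff_confs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every differing key."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

# Hypothetical drift between two of the pgpool servers:
one = parse_pgpool_conf("num_init_children = 240\nmax_pool = 1\n")
two = parse_pgpool_conf("num_init_children = 240\nmax_pool = 4\n")
print(diff_confs(one, two))  # -> {'max_pool': ('1', '4')}
```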
|
|
The JBoss application server reports the error:

```
[org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory] (ConnectionValidator) IJ030027: Destroying connection that is not valid, due to the following exception: org.postgresql.jdbc4.Jdbc4Connection@288eb6c: org.postgresql.util.PSQLException: This connection has been closed.
	at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:822)
	at org.postgresql.jdbc3.AbstractJdbc3Connection.createStatement(AbstractJdbc3Connection.java:229)
	at org.postgresql.jdbc2.AbstractJdbc2Connection.createStatement(AbstractJdbc2Connection.java:294)
	at org.jboss.jca.adapters.jdbc.CheckValidConnectionSQL.isValidConnection(CheckValidConnectionSQL.java:71)
	at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.isValidConnection(BaseWrapperManagedConnectionFactory.java:1272)
	at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.getInvalidConnections(BaseWrapperManagedConnectionFactory.java:1085)
	at org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreConcurrentLinkedDequeManagedConnectionPool.validateConnections(SemaphoreConcurrentLinkedDequeManagedConnectionPool.java:1463) [ironjacamar-core-impl-1.3.4.Final.jar:1.3.4.Final]
	at org.jboss.jca.core.connectionmanager.pool.validator.ConnectionValidator$ConnectionValidatorRunner.run(ConnectionValidator.java:277) [ironjacamar-core-impl-1.3.4.Final.jar:1.3.4.Final]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_25]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_25]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_25]
```
|
|
Hi dear community, is it possible to provide an answer to the question? We are facing this trouble in our production environment. It seems very critical to us. |
|
|
Dear Pgpool developers (Mr. t-ishii), if you cannot reproduce the error in your environment, let me offer you the use of our QA environment. We can set up screen sharing and/or grant you remote control, and even set up a build environment, so you will be able to trace the requests. We can reproduce the issue locally with very high probability. I think this would be far more efficient than exchanging messages while waiting for someone else to reproduce the issue. If you are interested in solving the problem, please send your contact details to "resolvepool@gmail.com" and I will contact you from my corporate email. Here is one more error message we observe:

```
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20218]: [47-3] LOCATION: child.c:2123
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20219]: [45-1] LOG: new connection received
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20219]: [45-2] DETAIL: connecting host=10.28.64.197 port=44444
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20219]: [45-3] LOCATION: child.c:2123
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20221]: [46-1] LOG: new connection received
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20221]: [46-2] DETAIL: connecting host=10.28.64.197 port=44445
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20221]: [46-3] LOCATION: child.c:2123
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [14-1] WARNING: packet kind of backend 1 ['3'] does not match with master/majority nodes packet kind ['2']
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [14-2] LOCATION: pool_process_query.c:3571
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-1] FATAL: failed to read kind from backend
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-2] DETAIL: kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[2] 1[3]
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-3] HINT: check data consistency among db nodes
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-4] LOCATION: pool_process_query.c:3676
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [114-1] LOG: child process with pid: 20748 exits with status 512
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [114-2] LOCATION: pgpool_main.c:2243
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [115-1] LOG: fork a new child process with pid: 21068
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [115-2] LOCATION: pgpool_main.c:2329
Nov 21 20:59:15 srv-60b2f6e99b9b pgpool[20222]: [46-1] LOG: new connection received
```
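The bracketed characters in "kind details are: 0[2] 1[3]" are message type bytes from the PostgreSQL frontend/backend (v3) protocol. Decoding them makes the mismatch readable; the mapping below is a small excerpt covering only common backend responses (see the PostgreSQL protocol documentation for the full list):

```python
import re

# A few backend-to-frontend message type bytes from the PostgreSQL v3 protocol.
MESSAGE_KINDS = {
    "1": "ParseComplete",
    "2": "BindComplete",
    "3": "CloseComplete",
    "C": "CommandComplete",
    "E": "ErrorResponse",
    "Z": "ReadyForQuery",
}

def decode_kind_details(detail):
    """Turn a pgpool detail string like '0[2] 1[3]' into node -> message name."""
    pairs = re.findall(r"(\d+)\[(.)\]", detail)
    return {int(node): MESSAGE_KINDS.get(kind, f"unknown({kind!r})")
            for node, kind in pairs}

# For this report: node 0 answered BindComplete while node 1 answered
# CloseComplete, i.e. the backends are at different points in the
# extended-query protocol when pgpool compares their replies.
print(decode_kind_details("0[2] 1[3]"))
```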
|
|
Hi, is it possible that PostgreSQL does not close connections correctly? Or would it make sense to wait and try to read additional data from both backends? Please have a look at the following log information:

```
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50039-1] LOG: DB node id: 0 backend pid: 17294 statement: Parse: update on_screen_notifications set account_id=$1, context=$2, created=$3, details=$4, full_details=$5, initiator_name=$6, initiator_uuid=$7, ip_address=$8, link=$9, link_more=$10, link_more_text=$11, message=$12, modified=$13, news_message_id=$14, package_id=$15, parent_uuid=$16, path=$17, service=$18, status=$19, type=$20, updated=$21, user_id=$22 where uuid=$23
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50039-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50052-1] LOG: DB node id: 0 backend pid: 17294 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50052-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50065-1] LOG: DB node id: 0 backend pid: 17294 statement: D message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50065-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50075-1] LOG: DB node id: 0 backend pid: 17294 statement: Execute: update on_screen_notifications set account_id=$1, context=$2, created=$3, details=$4, full_details=$5, initiator_name=$6, initiator_uuid=$7, ip_address=$8, link=$9, link_more=$10, link_more_text=$11, message=$12, modified=$13, news_message_id=$14, package_id=$15, parent_uuid=$16, path=$17, service=$18, status=$19, type=$20, updated=$21, user_id=$22 where uuid=$23
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50075-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50146-1] LOG: DB node id: 0 backend pid: 17294 statement: Parse: select distinct onscreenno0_.uuid as uuid1_125_0_, newsmessag2_.msg_id as msg_id1_118_1_, onscreenno0_.account_id as account_2_125_0_, onscreenno0_.context as context3_125_0_, onscreenno0_.created as created4_125_0_, onscreenno0_.details as details5_125_0_, onscreenno0_.full_details as full_det6_125_0_, onscreenno0_.initiator_name as initiato7_125_0_, onscreenno0_.initiator_uuid as initiato8_125_0_, onscreenno0_.ip_address as ip_addre9_125_0_, onscreenno0_.link as link10_125_0_, onscreenno0_.link_more as link_mo11_125_0_, onscreenno0_.link_more_text as link_mo12_125_0_, onscreenno0_.message as message13_125_0_, onscreenno0_.modified as modifie14_125_0_, onscreenno0_.news_message_id as news_me22_125_0_, onscreenno0_.package_id as package15_125_0_, onscreenno0_.parent_uuid as parent_23_125_0_, onscreenno0_.path as path16_125_0_,
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50146-2] onscreenno0_.service as service17_125_0_, onscreenno0_.status as status18_125_0_, onscreenno0_.type as type19_125_0_, onscreenno0_.updated as updated20_125_0_, onscreenno0_.user
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50146-3] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50159-1] LOG: DB node id: 0 backend pid: 17294 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50159-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50172-1] LOG: DB node id: 0 backend pid: 17294 statement: D message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50172-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50182-1] LOG: DB node id: 0 backend pid: 17294 statement: Execute: select distinct onscreenno0_.uuid as uuid1_125_0_, newsmessag2_.msg_id as msg_id1_118_1_, onscreenno0_.account_id as account_2_125_0_, onscreenno0_.context as context3_125_0_, onscreenno0_.created as created4_125_0_, onscreenno0_.details as details5_125_0_, onscreenno0_.full_details as full_det6_125_0_, onscreenno0_.initiator_name as initiato7_125_0_, onscreenno0_.initiator_uuid as initiato8_125_0_, onscreenno0_.ip_address as ip_addre9_125_0_, onscreenno0_.link as link10_125_0_, onscreenno0_.link_more as link_mo11_125_0_, onscreenno0_.link_more_text as link_mo12_125_0_, onscreenno0_.message as message13_125_0_, onscreenno0_.modified as modifie14_125_0_, onscreenno0_.news_message_id as news_me22_125_0_, onscreenno0_.package_id as package15_125_0_, onscreenno0_.parent_uuid as parent_23_125_0_, onscreenno0_.path as path16_125_0_,
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50182-2] onscreenno0_.service as service17_125_0_, onscreenno0_.status as status18_125_0_, onscreenno0_.type as type19_125_0_, onscreenno0_.updated as updated20_125_0_, onscreenno0_.us
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50182-3] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50258-1] LOG: DB node id: 0 backend pid: 17294 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50258-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50261-1] LOG: DB node id: 1 backend pid: 26559 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50261-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50275-1] LOG: DB node id: 0 backend pid: 17294 statement: Execute: COMMIT
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50275-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50279-1] LOG: DB node id: 1 backend pid: 26559 statement: Execute: COMMIT
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50279-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50292-1] WARNING: packet kind of backend 1 ['3'] does not match with master/majority nodes packet kind ['2']
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50292-2] LOCATION: pool_process_query.c:3603
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50293-1] FATAL: failed to read kind from backend
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50293-2] DETAIL: kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[2] 1[3]
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50293-3] HINT: check data consistency among db nodes
```
|
|
This seems to be a similar problem to 0000231. The same fix might work for you. Please try the attached patch. |
|
|
No feedback from the original reporter for over a month. Issue closed. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2016-11-11 17:54 | serk | New Issue | |
| 2016-11-11 17:58 | serk | Note Added: 0001168 | |
| 2016-11-11 18:03 | serk | Note Added: 0001169 | |
| 2016-11-20 04:48 | serk | Note Added: 0001177 | |
| 2016-11-22 03:10 | supp_k | Note Added: 0001184 | |
| 2016-11-24 00:33 | supp_k | Note Added: 0001188 | |
| 2016-12-06 07:53 | t-ishii | Assigned To | => t-ishii |
| 2016-12-06 07:53 | t-ishii | Status | new => assigned |
| 2016-12-06 07:54 | t-ishii | Note Added: 0001200 | |
| 2016-12-06 07:54 | t-ishii | File Added: bug231.diff | |
| 2016-12-06 07:54 | t-ishii | Status | assigned => feedback |
| 2017-02-23 17:06 | t-ishii | Note Added: 0001360 | |
| 2017-02-23 17:06 | t-ishii | Status | feedback => closed |