View Issue Details

ID: 0000261
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-02-23 17:06
Reporter: serk
Assigned To: t-ishii
Priority: high
Severity: block
Reproducibility: random
Status: closed
Resolution: open
Platform: x86_64
OS: CentOS
OS Version: 6.8
Product Version: 3.5.4
Summary: 0000261: Random break of connections with error "kind mismatch among backends"
Description: In production we are randomly facing broken connections through pgpool. The DB connection is dropped and we receive the error below, which affects the stability of our service.

It does not depend on the load and appears absolutely randomly.


FATAL: failed to read kind from backend
  Detail: kind mismatch among backends. Possible last query was: "BEGIN" kind details are: 0[3] 1[2]
  Hint: check data consistency among db nodes


How should we treat the error?
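For reference, the numbers in "kind details are: 0[3] 1[2]" are the PostgreSQL frontend/backend protocol message-type bytes that pgpool read from each backend (node id in front, message kind in brackets). A minimal decoding sketch; the mapping follows the PostgreSQL v3 wire protocol, while the helper name and parsing are our own illustration:

```python
# Message-type bytes a backend sends in the extended-query protocol
# (subset relevant to this report; from the PostgreSQL v3 wire protocol).
BACKEND_KINDS = {
    "1": "ParseComplete",
    "2": "BindComplete",
    "3": "CloseComplete",
    "C": "CommandComplete",
    "D": "DataRow",
    "E": "ErrorResponse",
    "T": "RowDescription",
    "Z": "ReadyForQuery",
}

def decode_kind_details(details: str) -> dict:
    """Decode a pgpool 'kind details are: 0[3] 1[2]' fragment
    into {node_id: message_name}."""
    result = {}
    for token in details.split():
        node, kind = token.rstrip("]").split("[")
        result[int(node)] = BACKEND_KINDS.get(kind, "unknown")
    return result

# The error above: node 0 answered CloseComplete while node 1
# answered BindComplete -- the two backends are out of step.
print(decode_kind_details("0[3] 1[2]"))
# → {0: 'CloseComplete', 1: 'BindComplete'}
```

So the error does not necessarily indicate inconsistent data; it means pgpool received different protocol replies from the two nodes for the same client message.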
Steps To Reproduce: We have 3 pgpool servers and 2 PostgreSQL backends replicated synchronously.
Tags: No tags attached.

Activities

serk

2016-11-11 17:58

reporter   ~0001168

Also, what does the message "check data consistency among db nodes" mean? Replication of requests is not configured on our side. Please see the configuration file:


listen_addresses = '*'
port = 5432


socket_dir = '/var/run/pgpool'
listen_backlog_multiplier = 2
serialize_accept = on


pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/pgpool'


backend_hostname0 = 'onedb.node'
backend_port0 = 15432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/9.5/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'twodb.node'
backend_port1 = 15432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.5/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'


enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off




num_init_children = 240
max_pool = 1
child_life_time = 0
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 100




log_destination = 'syslog'
log_line_prefix = ''
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'none'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'

debug_level = 5
log_error_verbosity = default
client_min_messages = debug2
log_min_messages = debug5


pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/var/log/pgpool'


connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'


replication_mode = off
replicate_select = off
insert_lock = off
lobj_lock_table = ''
replication_stop_in_mismatch = off
failover_if_affected_tuples_mismatch = off


load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = on


master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
sr_check_user = 'postgres'
sr_check_password = 'postgres'
sr_check_database = 'postgres'
delay_threshold = 0
follow_master_command = ''


health_check_period = 10
health_check_timeout = 5
health_check_user = 'postgres'
health_check_password = 'postgres'
health_check_database = 'postgres'
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000


failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10


recovery_user = 'postgres'
recovery_password = 'postgres'
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0


use_watchdog = on
trusted_servers = ''
ping_path = '/bin'
wd_hostname = 'server.two'
wd_port = 9000
wd_priority = 10
wd_authkey = ''
wd_ipc_socket_dir = '/var/run/pgpool'

delegate_IP = '192.168.10.10'
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/16 dev eth0'
if_down_cmd = 'ip addr del $_IP_$/16 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0'

clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''

wd_monitoring_interfaces_list = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 2
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30

heartbeat_destination0 = '192.168.10.1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth0'

other_pgpool_hostname0 = '192.168.10.1'
other_pgpool_port0 = 5432
other_wd_port0 = 9000

heartbeat_destination1 = '192.168.10.3'
heartbeat_destination_port1 = 9694
heartbeat_device1 = 'eth0'

other_pgpool_hostname1 = '192.168.10.3'
other_pgpool_port1 = 5432
other_wd_port1 = 9000


wd_life_point = 3

wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''



relcache_expire = 0
relcache_size = 256
check_temp_table = off
check_unlogged_table = off


memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 5000000
memqcache_max_num_cache = 1000000
memqcache_expire = 1
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
lack_memqcache_table_list = ''

serk

2016-11-11 18:03

reporter   ~0001169

JBoss application server states the error:

[org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory] (ConnectionValidator) IJ030027: Destroying connection that is not valid, due to the following exception: org.postgresql.jdbc4.Jdbc4Connection@288eb6c: org.postgresql.util.PSQLException: This connection has been closed.
        at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:822)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.createStatement(AbstractJdbc3Connection.java:229)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.createStatement(AbstractJdbc2Connection.java:294)
        at org.jboss.jca.adapters.jdbc.CheckValidConnectionSQL.isValidConnection(CheckValidConnectionSQL.java:71)
        at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.isValidConnection(BaseWrapperManagedConnectionFactory.java:1272)
        at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.getInvalidConnections(BaseWrapperManagedConnectionFactory.java:1085)
        at org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreConcurrentLinkedDequeManagedConnectionPool.validateConnections(SemaphoreConcurrentLinkedDequeManagedConnectionPool.java:1463) [ironjacamar-core-impl-1.3.4.Final.jar:1.3.4.Final]
        at org.jboss.jca.core.connectionmanager.pool.validator.ConnectionValidator$ConnectionValidatorRunner.run(ConnectionValidator.java:277) [ironjacamar-core-impl-1.3.4.Final.jar:1.3.4.Final]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_25]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_25]

serk

2016-11-20 04:48

reporter   ~0001177

Hi Dear Community,

is it possible for someone to provide an answer to the question? We are facing this problem in our production environment and it is very critical to us.

supp_k

2016-11-22 03:10

reporter   ~0001184

Dear Pgpool Developers, Mr. t-ishii,

If you cannot reproduce the error in your environment, let me propose that you use our QA environment.

We can set up screen sharing and/or grant you remote control, and even set up a build environment, so you will be able to trace the requests. We reproduce the issue locally with very high probability.

I think this would be useful and productive, much more efficient than sending messages and waiting for someone who can reproduce the issue.


If you are interested in solving the problem, please send your contact details to "resolvepool@gmail.com" and I will contact you using my corporate email.


Here is one more error message we observe:

Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20218]: [47-3] LOCATION: child.c:2123
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20219]: [45-1] LOG: new connection received
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20219]: [45-2] DETAIL: connecting host=10.28.64.197 port=44444
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20219]: [45-3] LOCATION: child.c:2123
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20221]: [46-1] LOG: new connection received
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20221]: [46-2] DETAIL: connecting host=10.28.64.197 port=44445
Nov 21 20:58:14 srv-60b2f6e99b9b pgpool[20221]: [46-3] LOCATION: child.c:2123
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [14-1] WARNING: packet kind of backend 1 ['3'] does not match with master/majority nodes packet kind ['2']
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [14-2] LOCATION: pool_process_query.c:3571
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-1] FATAL: failed to read kind from backend
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-2] DETAIL: kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[2] 1[3]
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-3] HINT: check data consistency among db nodes
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20748]: [15-4] LOCATION: pool_process_query.c:3676
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [114-1] LOG: child process with pid: 20748 exits with status 512
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [114-2] LOCATION: pgpool_main.c:2243
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [115-1] LOG: fork a new child process with pid: 21068
Nov 21 20:58:50 srv-60b2f6e99b9b pgpool[20070]: [115-2] LOCATION: pgpool_main.c:2329
Nov 21 20:59:15 srv-60b2f6e99b9b pgpool[20222]: [46-1] LOG: new connection received

supp_k

2016-11-24 00:33

reporter   ~0001188

Hi,

is it possible that PostgreSQL does not close connections correctly? Or would it make sense to wait and try to read additional data from both backends? Please have a look at the following log excerpt:


Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50039-1] LOG: DB node id: 0 backend pid: 17294 statement: Parse: update on_screen_notificat
ions set account_id=$1, context=$2, created=$3, details=$4, full_details=$5, initiator_name=$6, initiator_uuid=$7, ip_address=$8, link=$9, link_more=$10, link_more_text=$11, message=$12
, modified=$13, news_message_id=$14, package_id=$15, parent_uuid=$16, path=$17, service=$18, status=$19, type=$20, updated=$21, user_id=$22 where uuid=$23
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50039-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50052-1] LOG: DB node id: 0 backend pid: 17294 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50052-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50065-1] LOG: DB node id: 0 backend pid: 17294 statement: D message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50065-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50075-1] LOG: DB node id: 0 backend pid: 17294 statement: Execute: update on_screen_notific
ations set account_id=$1, context=$2, created=$3, details=$4, full_details=$5, initiator_name=$6, initiator_uuid=$7, ip_address=$8, link=$9, link_more=$10, link_more_text=$11, message=$
12, modified=$13, news_message_id=$14, package_id=$15, parent_uuid=$16, path=$17, service=$18, status=$19, type=$20, updated=$21, user_id=$22 where uuid=$23
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50075-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50146-1] LOG: DB node id: 0 backend pid: 17294 statement: Parse: select distinct onscreenno
0_.uuid as uuid1_125_0_, newsmessag2_.msg_id as msg_id1_118_1_, onscreenno0_.account_id as account_2_125_0_, onscreenno0_.context as context3_125_0_, onscreenno0_.created as created4_12
5_0_, onscreenno0_.details as details5_125_0_, onscreenno0_.full_details as full_det6_125_0_, onscreenno0_.initiator_name as initiato7_125_0_, onscreenno0_.initiator_uuid as initiato8_1
25_0_, onscreenno0_.ip_address as ip_addre9_125_0_, onscreenno0_.link as link10_125_0_, onscreenno0_.link_more as link_mo11_125_0_, onscreenno0_.link_more_text as link_mo12_125_0_, onsc
reenno0_.message as message13_125_0_, onscreenno0_.modified as modifie14_125_0_, onscreenno0_.news_message_id as news_me22_125_0_, onscreenno0_.package_id as package15_125_0_, onscreenn
o0_.parent_uuid as parent_23_125_0_, onscreenno0_.path as path16_125_0_,
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50146-2] onscreenno0_.service as service17_125_0_, onscreenno0_.status as status18_125_0_,
onscreenno0_.type as type19_125_0_, onscreenno0_.updated as updated20_125_0_, onscreenno0_.user
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50146-3] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50159-1] LOG: DB node id: 0 backend pid: 17294 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50159-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50172-1] LOG: DB node id: 0 backend pid: 17294 statement: D message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50172-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50182-1] LOG: DB node id: 0 backend pid: 17294 statement: Execute: select distinct onscreen
no0_.uuid as uuid1_125_0_, newsmessag2_.msg_id as msg_id1_118_1_, onscreenno0_.account_id as account_2_125_0_, onscreenno0_.context as context3_125_0_, onscreenno0_.created as created4_
125_0_, onscreenno0_.details as details5_125_0_, onscreenno0_.full_details as full_det6_125_0_, onscreenno0_.initiator_name as initiato7_125_0_, onscreenno0_.initiator_uuid as initiato8
_125_0_, onscreenno0_.ip_address as ip_addre9_125_0_, onscreenno0_.link as link10_125_0_, onscreenno0_.link_more as link_mo11_125_0_, onscreenno0_.link_more_text as link_mo12_125_0_, on
screenno0_.message as message13_125_0_, onscreenno0_.modified as modifie14_125_0_, onscreenno0_.news_message_id as news_me22_125_0_, onscreenno0_.package_id as package15_125_0_, onscree
nno0_.parent_uuid as parent_23_125_0_, onscreenno0_.path as path16_125_0_,
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50182-2] onscreenno0_.service as service17_125_0_, onscreenno0_.status as status18_125_0_,
onscreenno0_.type as type19_125_0_, onscreenno0_.updated as updated20_125_0_, onscreenno0_.us
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50182-3] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50258-1] LOG: DB node id: 0 backend pid: 17294 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50258-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50261-1] LOG: DB node id: 1 backend pid: 26559 statement: B message
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50261-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50275-1] LOG: DB node id: 0 backend pid: 17294 statement: Execute: COMMIT
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50275-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50279-1] LOG: DB node id: 1 backend pid: 26559 statement: Execute: COMMIT
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50279-2] LOCATION: pool_proto_modules.c:3084
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50292-1] WARNING: packet kind of backend 1 ['3'] does not match with master/majority nodes packet kind ['2']
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50292-2] LOCATION: pool_process_query.c:3603
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50293-1] FATAL: failed to read kind from backend
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50293-2] DETAIL: kind mismatch among backends. Possible last query was: "COMMIT" kind details are: 0[2] 1[3]
Nov 23 18:26:37 srv-60b2f6e99b9b lesk plesk 10.28.64.197(57642) idle in transaction[17043]: [50293-3] HINT: check data consistency among db nodes
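The failure pattern in the log above (node 1 returning '3'/CloseComplete where node 0 returned '2'/BindComplete after the COMMIT) is caught by pgpool's per-packet consistency check in pool_process_query.c: every live backend must answer the same message-type byte, and any node that disagrees with the master/majority kind triggers the FATAL. A simplified illustration of that majority check (not pgpool's actual code):

```python
from collections import Counter

def check_kind_mismatch(kinds: dict) -> str:
    """Simplified sketch of pgpool's per-packet consistency check:
    `kinds` maps node id -> the one-byte message kind read from that
    node. All backends must agree; otherwise the session is aborted."""
    counts = Counter(kinds.values())
    majority_kind, _ = counts.most_common(1)[0]
    mismatched = {n: k for n, k in kinds.items() if k != majority_kind}
    if mismatched:
        details = " ".join(f"{n}[{k}]" for n, k in sorted(kinds.items()))
        raise RuntimeError(
            "kind mismatch among backends. kind details are: " + details
        )
    return majority_kind

# The session above: node 0 sent BindComplete ('2'),
# node 1 sent CloseComplete ('3').
try:
    check_kind_mismatch({0: "2", 1: "3"})
except RuntimeError as e:
    print(e)
```

Because the check fires on a single out-of-order packet, a timing bug in how pgpool pairs up extended-protocol replies from the two nodes can produce this error even when the data on both backends is perfectly consistent, which matches the random nature of the failures reported here.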

t-ishii

2016-12-06 07:54

developer   ~0001200

This seems to be a similar problem to 0000231. The same fix might work for you. Please try the attached patch.

t-ishii

2016-12-06 07:54

developer  

bug231.diff (4,210 bytes)   

t-ishii

2017-02-23 17:06

developer   ~0001360

No feedback from the original reporter for over 1 month. Issue closed.

Issue History

Date Modified Username Field Change
2016-11-11 17:54 serk New Issue
2016-11-11 17:58 serk Note Added: 0001168
2016-11-11 18:03 serk Note Added: 0001169
2016-11-20 04:48 serk Note Added: 0001177
2016-11-22 03:10 supp_k Note Added: 0001184
2016-11-24 00:33 supp_k Note Added: 0001188
2016-12-06 07:53 t-ishii Assigned To => t-ishii
2016-12-06 07:53 t-ishii Status new => assigned
2016-12-06 07:54 t-ishii Note Added: 0001200
2016-12-06 07:54 t-ishii File Added: bug231.diff
2016-12-06 07:54 t-ishii Status assigned => feedback
2017-02-23 17:06 t-ishii Note Added: 0001360
2017-02-23 17:06 t-ishii Status feedback => closed