[Pgpool-general] SOLVED - read_kind_from_backend: 1 th kind C does not match with master or majority connection kind D

Thu Dec 10 23:30:52 UTC 2009

Hi everyone,
  This is an update to my message of Mon 10/26/2009 which is included below. I worked with Tatsuo on this problem and my hats off to him for figuring it out. 

   The problem happened on my system when it was being stress tested but not when it had lighter loads. This was clue to the cause. In our stress test we were hitting the same narrow range of records in the database over and over as fast as we could. Sometimes the two would temporarily fall out of sync when one node finished a transaction before the other node. If we were unlucky during this very small window a query came in asking to performs a read on some of the information in flight, say a count of records. The results came back with the completed transaction on one DB and the incomplete one on the other. Of course a few nano seconds later the slower DB would catch up but by then it was too late as pgpool detected the mismatch. 

  This problem will rarely, if ever, happen in production because, at least for our system, users will very rarely working on the same record at the same time. This race condition is an acceptable risk in our situation and a failure merely results in a loss of redundancy which can be quickly regained with a pcp_recovery_node command.

Hope this helps someone else avoid spending too much on this problem and again thank you Tatsuo!
--Tim
+++++++++++++++++++ORGINAL MESSAGE++++++++++++++++++++
Hello,
 I am having troubles with a pgpool server in development and can't seem to find the cause after a few months of idle poking. Here is what I see;

We have two postgres 8.3 v. 8.3.7-0ubuntu8.04.1 servers (db01 and db02) running on VMs. They are configured nearly identically minus the different hostnames, IPs, etc. These are attached to one pgpool-II v.2.2.5 server (lb01). Pgpool is configured to replicate between them. Since we use nextval we have the system configured with load balancing off. All of this works fine under light load conditions, a few connections per second perhaps. Under high load however pgpool fairly quickly starts a failover due to  a kind mismatch error. Any attempts to recover the failed DB server fail with a backend error until he load goes back down. The master DB continues to carry traffic so there is no outage, just a loss of redundancy. I have seen this behavior about  a dozen times with the most current versions of pgpool at the time. 

In trying to track this problem I have set up central logging so I can see the pgpool traffic which causes postgres traffic. I have included a short except of this showing the kind mismatch and the traffic just before. All clocks on hosts are in sync so the time stamps can be trusted but order can a bit off.

Below I have my configuration file listed along with the loglines showing the error and DB traffic.
Am I doing something wrong? How do read the "read_kind_from_backend:" error line? I looks like there may be useful information in there but I don't know what to make of it like which is "C" and "D"?

Thanks for your time and help,
--Tim

+++++++++++++++++++BEGIN LOGLINES++++++++++++++++++++++
2009-10-26T15:30:19.082-04:00 db02 user.notice postgres[7583]: [33319-1] LOG:  execute <unnamed>: select code,country from i10_hts where party_serial_id=$1 order by party_serial_id,id
2009-10-26T15:30:19.082-04:00 db02 user.notice postgres[7583]: [33319-2] DETAIL:  parameters: $1 = \\\'983021\\\'
2009-10-26T15:30:19.084-04:00 db01 user.notice postgres[9003]: [33340-1] LOG:  execute <unnamed>: select code,country from i10_hts where party_serial_id=$1 order by party_serial_id,id
2009-10-26T15:30:19.084-04:00 db01 user.notice postgres[9003]: [33340-2] DETAIL:  parameters: $1 = \\\'983022\\\'
2009-10-26T15:30:19.084-04:00 db01 user.notice postgres[9217]: [8379-1] LOG:  execute <unnamed>: select surety_id,bond_holder,bond_activity,bond_type from i where id=$1
2009-10-26T15:30:19.084-04:00 db01 user.notice postgres[9217]: [8379-2] DETAIL:  parameters: $1 = \\\'472\\\'
2009-10-26T15:30:19.084-04:00 db02 user.notice postgres[7583]: [33320-1] LOG:  execute <unnamed>: select code,country from i10_hts where party_serial_id=$1 order by party_serial_id,id
2009-10-26T15:30:19.084-04:00 db02 user.notice postgres[7583]: [33320-2] DETAIL:  parameters: $1 = \\\'983022\\\'
2009-10-26T15:30:19.084-04:00 lb01 user.notice pgpool: 2009-10-26 15:30:19 ERROR: pid 9830: read_kind_from_backend: 1 th kind C does not match with master or majority connection kind D
2009-10-26T15:30:19.084-04:00 db02 user.notice postgres[7775]: [8369-1] LOG:  execute <unnamed>: select surety_id,bond_holder,bond_activity,bond_type from i where id=$1
2009-10-26T15:30:19.084-04:00 db02 user.notice postgres[7775]: [8369-2] DETAIL:  parameters: $1 = \\\'472\\\'
2009-10-26T15:30:19.084-04:00 lb01 user.notice pgpool: 2009-10-26 15:30:19 ERROR: pid 9830: kind mismatch among backends. Possible last query was: \\\"select code,country from i10_hts where party_serial_id=$1 order by party_serial_id,id\\\" kind details are: 0[D] 1[C]
2009-10-26T15:30:19.085-04:00 lb01 user.notice pgpool: 2009-10-26 15:30:19 LOG:   pid 9830: notice_backend_error: 1 fail over request from pid 9830
2009-10-26T15:30:19.153-04:00 lb01 user.notice pgpool: 2009-10-26 15:30:19 LOG:   pid 7366: starting degeneration. shutdown host db02(5432)
2009-10-26T15:30:19.153-04:00 lb01 user.notice pgpool: 2009-10-26 15:30:19 LOG:   pid 7366: failover_handler: set new master node: 0
2009-10-26T15:30:19.194-04:00 lb01 user.notice pgpool: 2009-10-26 15:30:19 LOG:   pid 7366: failover done. shutdown host db02(5432)
+++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++ BEGIN pgpool config++++++++++++++++++++++
listen_addresses = '*'
port = 5433
pcp_port = 9898
socket_dir = '/var/run/pgpool'
pcp_socket_dir = '/var/run/pgpool'
backend_socket_dir = '/var/run/pgpool'
pcp_timeout = 10
num_init_children = 32
max_pool = 4
child_life_time = 300
connection_life_time = 60
child_max_connections = 1000
authentication_timeout = 60
recovery_timeout = 120
client_idle_limit_in_recovery = 10
replication_mode = true
replication_strict = true
replication_timeout = 5000
load_balance_mode = false
replication_stop_on_mismatch = true
replicate_select = true
reset_query_list = 'ABORT; DISCARD ALL'
print_timestamp = true
master_slave_mode = false
connection_cache = true
health_check_timeout = 3
health_check_period = 10
health_check_user = 'postgres'
insert_lock = true
ignore_leading_white_space = true
log_statement = true
log_connections = true
log_hostname = false
parallel_mode = false
enable_query_cache = false
pgpool2_hostname = 'localhost'
backend_hostname0 = 'db01'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/8.3/main'
backend_hostname1 = 'db02'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/8.3/main'
recovery_user = 'postgres'
recovery_password = 'PASSWORD'
recovery_1st_stage_command = 'copy-base-backup'
recovery_2nd_stage_command = 'pgpool_recovery_pitr'
enable_pool_hba = false
+++++++++++++++++++++++++++++++++++++++++