View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000271 | Pgpool-II | Bug | public | 2016-12-16 04:46 | 2017-03-31 17:06 |
| Reporter | supp_k | Assigned To | t-ishii | ||
| Priority | urgent | Severity | major | Reproducibility | random |
| Status | closed | Resolution | open | ||
| OS | CentOS | OS Version | 6.x | ||
| Product Version | 3.5.4 | ||||
| Target Version | 3.6.3 | ||||
| Summary | 0000271: balanced requests after started transaction | ||||
| Description | We see that in transaction pgpool balances requests to standby read-only backend. But this should happen because slave doesn't contain required data until the transaction committed on slave. The bug is related to pgpool v3.6. | ||||
| Additional Information | We analysed the log records and did the conclusion: First 5 lines show the transaction was initiated and for some period pgpool routes requests to the backend #0 only which is master. But then we see the request (2016-12-15T12:46:13.229382+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79083-1]) was routed to standby backend 0000001. At this we moment we face data inconsistency in the frontend because for sure the backend 0000001 doesn't contain the data which was created in backend #0 until it is commited. The SELECT for the data is performed within the same transaction in which it was INSERT-ed. 2016-12-15T12:46:12.007225+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78172-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:12.007229+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78172-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.007240+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78175-1] LOG: DB node id: 1 backend pid: 2946 statement: B message 2016-12-15T12:46:12.007242+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78175-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.007446+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78189-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: BEGIN 2016-12-15T12:46:12.007455+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78189-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.007480+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78194-1] LOG: DB node id: 1 backend pid: 2946 statement: Execute: BEGIN 2016-12-15T12:46:12.007481+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78194-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.008519+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78260-1] LOG: DB node id: 1 backend pid: 2946 statement: Parse: SELECT 1 2016-12-15T12:46:12.008525+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78260-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.008653+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78274-1] LOG: DB node id: 1 backend pid: 2946 statement: B message 2016-12-15T12:46:12.008659+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78274-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.008823+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78287-1] LOG: DB node id: 1 backend pid: 2946 statement: D message 2016-12-15T12:46:12.008825+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78287-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:12.008906+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78297-1] LOG: DB node id: 1 backend pid: 2946 statement: Execute: SELECT 1 2016-12-15T12:46:12.008915+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle[16742]: [78297-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.218025+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78372-1] LOG: DB node id: 1 backend pid: 2946 statement: B message 2016-12-15T12:46:13.218034+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78372-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.218197+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78386-1] LOG: DB node id: 1 backend pid: 2946 statement: Execute: SELECT CASE WHEN sr.sub_limit = -1 THEN 1 ELSE 0 END AS is_inf, sr.sub_limit - sr.curr_usage AS res_avail, sr.rt_instance_id AS rt_instance_id FROM resource_classes rc JOIN resource_types rt ON (rc.class_id = rt.class_id) JOIN subs_resources sr ON (sr.rt_id = rt.rt_id) JOIN subscriptions s ON (sr.sub_id = s.sub_id) WHERE rc.name = $1 AND s.owner_id = $2 AND s.is_active = 1 ORDER BY is_inf DESC, res_avail DESC, sr.rt_instance_id 2016-12-15T12:46:13.218203+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78386-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.221080+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78446-1] LOG: DB node id: 0 backend pid: 17007 statement: Parse: INSERT INTO accounts_serial DEFAULT VALUES 2016-12-15T12:46:13.221083+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78446-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.221220+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78460-1] LOG: DB node id: 0 backend pid: 17007 statement: D message 2016-12-15T12:46:13.221223+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78460-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.222001+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78520-1] LOG: DB node id: 0 backend pid: 17007 statement: Parse: INSERT INTO accounts_serial DEFAULT VALUES 2016-12-15T12:46:13.222006+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78520-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.222146+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78534-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.222150+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78534-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.222272+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78547-1] LOG: DB node id: 0 backend pid: 17007 statement: D message 2016-12-15T12:46:13.222276+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78547-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.222351+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78557-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: INSERT INTO accounts_serial DEFAULT VALUES 2016-12-15T12:46:13.222355+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78557-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.223461+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78629-1] LOG: DB node id: 0 backend pid: 17007 statement: Parse: select currval('accounts_serial_seq') 2016-12-15T12:46:13.223465+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78629-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.223592+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78642-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.223594+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78642-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.223708+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78655-1] LOG: DB node id: 0 backend pid: 17007 statement: D message 2016-12-15T12:46:13.223711+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78655-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.223814+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78665-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: select currval('accounts_serial_seq') 2016-12-15T12:46:13.223822+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78665-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.224918+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78742-1] LOG: DB node id: 0 backend pid: 17007 statement: Parse: DELETE FROM accounts_serial WHERE account_id = $1 2016-12-15T12:46:13.224926+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78742-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.225052+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78755-1] LOG: DB node id: 0 backend pid: 17007 statement: D message 2016-12-15T12:46:13.225056+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78755-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.225842+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78815-1] LOG: DB node id: 0 backend pid: 17007 statement: Parse: DELETE FROM accounts_serial WHERE account_id = $1 2016-12-15T12:46:13.225846+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78815-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.225965+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78829-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.225973+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78829-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.226088+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78842-1] LOG: DB node id: 0 backend pid: 17007 statement: D message 2016-12-15T12:46:13.226092+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78842-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.226179+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78852-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: DELETE FROM accounts_serial WHERE account_id = $1 2016-12-15T12:46:13.226183+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78852-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.227219+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78921-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.227222+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78921-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.227380+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78935-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into addresses (address, address2, address3, city, country, postal_code, state) values ($1, $2, $3, $4, $5, $6, $7) 2016-12-15T12:46:13.227383+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78935-2] #011RETURNING * 2016-12-15T12:46:13.227386+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78935-3] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.228185+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78995-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.228188+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [78995-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.228343+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79009-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contacts (display_name) values ($1) 2016-12-15T12:46:13.228345+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79009-2] #011RETURNING * 2016-12-15T12:46:13.228348+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79009-3] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.229216+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79069-1] LOG: DB node id: 1 backend pid: 2946 statement: B message 2016-12-15T12:46:13.229220+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79069-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.229382+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79083-1] LOG: DB node id: 1 backend pid: 2946 statement: Execute: select account0_.account_id as account_1_3_0_, account0_.address_id as address21_3_0_, account0_.adm_contact_id as adm_con22_3_0_, account0_.aps_uuid as aps_uuid2_3_0_, account0_.ext_system_id as ext_sys23_3_0_, account0_.bill_contact_id as bill_co24_3_0_, account0_.brand_name as brand_na3_3_0_, account0_.ccp_version as ccp_vers4_3_0_, account0_.company_name as company_5_3_0_, account0_.customers_quantity as customer6_3_0_, account0_.l_country_code_default as l_countr7_3_0_, account0_.l_language_code_default as l_langua8_3_0_, account0_.l_variant_default as l_varian9_3_0_, account0_.ext_account_id as ext_acc10_3_0_, account0_.c_time as c_time11_3_0_, account0_.l_country_code as l_count12_3_0_, account0_.l_language_code as l_langu13_3_0_, account0_.l_variant as l_varia14_3_0_, account0_.is_locked as is_lock15_3_0_, account0_.note 2016-12-15T12:46:13.229388+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79083-2] as note16_3_0_, account0_.owner_id as owner_i25_3_0_, account0_.path as path17_3_0_, account0_.is_personal as is_pers18_3_0_, account0_.rt_instance_id as rt_inst19_3_0_, acc 2016-12-15T12:46:13.229390+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79083-3] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.230519+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79133-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.230524+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79133-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.230658+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79147-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.230665+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79147-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.231363+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79199-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.231371+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79199-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.231498+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79213-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.231502+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79213-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.232170+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79265-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.232174+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79265-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.232311+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79279-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.232314+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79279-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.232971+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79331-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.232974+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79331-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.233138+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79345-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.233146+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79345-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.233854+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79397-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.233856+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79397-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.233973+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79411-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.233978+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79411-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.234643+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79463-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.234647+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79463-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.234790+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79477-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.234799+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79477-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.235426+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79529-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.235430+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79529-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.235583+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79543-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.235587+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79543-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.236258+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79595-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.236266+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79595-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.236424+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79609-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.236429+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79609-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.237049+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79661-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.237054+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79661-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.237188+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79675-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.237191+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79675-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.237771+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79727-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.237782+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79727-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.237907+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79741-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.237915+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79741-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.238572+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79793-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.238575+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79793-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.238715+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79807-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.238719+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79807-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.239370+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79859-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.239373+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79859-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.239513+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79873-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into accounts (address_id, adm_contact_id, aps_uuid, ext_system_id, bill_contact_id, brand_name, ccp_version, company_name, customers_quantity, l_country_code_default, l_language_code_default, l_variant_default, ext_account_id, c_time, l_country_code, l_language_code, l_variant, is_locked, note, owner_id, path, is_personal, rt_instance_id, tech_contact_id, account_type, account_id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26) 2016-12-15T12:46:13.239518+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79873-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.240744+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79925-1] LOG: DB node id: 1 backend pid: 2946 statement: B message 2016-12-15T12:46:13.240748+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79925-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.240901+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79939-1] LOG: DB node id: 1 backend pid: 2946 statement: Execute: select user0_.user_id as user_id1_193_0_, identity1_.identity_id as identity1_102_1_, account2_.account_id as account_1_3_2_, user0_.scope_id as scope_i16_193_0_, user0_.aps_uuid as aps_uuid2_193_0_, user0_.contact_id as contact17_193_0_, user0_.date_format as date_for3_193_0_, user0_.deleting as deleting4_193_0_, user0_.dns_path_of_domain_part as dns_path5_193_0_, user0_.domain_id as domain_i6_193_0_, user0_.is_enabled as is_enabl7_193_0_, user0_.auth_identity_id as auth_id18_193_0_, user0_.l_country_code as l_countr8_193_0_, user0_.l_language_code as l_langua9_193_0_, user0_.l_variant as l_varia10_193_0_, user0_.locked_by as locked_11_193_0_, user0_.member_id as member_12_193_0_, user0_.sub_id as sub_id13_193_0_, user0_.timezone as timezon14_193_0_, user0_.type as type15_193_0_, identity1_.system_id as system_i4_102_1_, 2016-12-15T12:46:13.240906+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79939-2] identity1_.login as login2_102_1_, identity1_.login_lowered as login_lo3_102_1_, account2_.address_id as address21_3_2_, account2_.adm_contact_id as adm_con22_3_2_, account2_.aps_u 2016-12-15T12:46:13.240908+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79939-3] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.241819+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79997-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.241827+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79997-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.241969+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80011-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: select localident0_.identity_id as identity1_109_0_, localident0_.pwd_expired as pwd_expi2_109_0_, localident0_.pwd_ctime as pwd_ctim3_109_0_, localident0_.pwd_hash as pwd_hash4_109_0_, localident0_.pwd_type as pwd_type5_109_0_ from local_identities localident0_ where localident0_.identity_id=$1 2016-12-15T12:46:13.241977+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80011-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.242963+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80069-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.242971+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80069-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.243102+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80083-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: select role0_.role_id as role_id1_160_, role0_.account_id as account_6_160_, role0_.description as descript2_160_, role0_.name as name3_160_, role0_.admin as admin4_160_, role0_.role_type as role_typ5_160_ from roles role0_ where role0_.account_id=$1 and role0_.name=$2 and (role0_.role_type=$3 or role0_.role_type=$4) 2016-12-15T12:46:13.243106+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80083-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.243851+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80133-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.243853+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80133-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.244015+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80147-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into roles (account_id, description, name, admin, role_type) values ($1, $2, $3, $4, $5) 2016-12-15T12:46:13.244024+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80147-2] #011RETURNING * 2016-12-15T12:46:13.244028+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80147-3] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.245087+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80207-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.245091+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80207-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.245241+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [80221-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: select privilege0_.privilege_id as privileg1_144_, privilege0_.allow_locked as allow_lo2_144_, privilege0_.area as area3_144_, privilege0_.name as name4_144_, privilege0_1_.group_id as group_id1_143_ from privileges privilege0_ left outer join privilege_groups_content privilege0_1_ on privilege0_.privilege_id=privilege0_1_.privilege_id where privilege0_.name=$1 | ||||
| Tags | No tags attached. | ||||
|
|
One more strange case: if we do pcp_detach_node for the SLAVE backend and then pcp_attach_node then the problem of "incorrect balancing" disappeares. |
|
|
Please pay attention to the following information. In the following piece of code the requests are coming with interval less then ONE millisecond. It is possible that the 1st request is not complete yet. Does it make sense? Probably pgpool has not created the "flag" that balancing is prohibited after this moment? 2016-12-15T12:46:13.238715+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79807-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into contact_properties (value, contact_id, name) values ($1, $2, $3) 2016-12-15T12:46:13.238719+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79807-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.239370+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79859-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.239373+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79859-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.239513+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79873-1] LOG: DB node id: 0 backend pid: 17007 statement: Execute: insert into accounts (address_id, adm_contact_id, aps_uuid, ext_system_id, bill_contact_id, brand_name, ccp_version, company_name, customers_quantity, l_country_code_default, l_language_code_default, l_variant_default, ext_account_id, c_time, l_country_code, l_language_code, l_variant, is_locked, note, owner_id, path, is_personal, rt_instance_id, tech_contact_id, account_type, account_id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26) 2016-12-15T12:46:13.239518+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79873-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.240744+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79925-1] LOG: DB node id: 1 backend pid: 2946 statement: B message 2016-12-15T12:46:13.240748+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79925-2] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.240901+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79939-1] LOG: DB node id: 1 backend pid: 2946 statement: Execute: select user0_.user_id as user_id1_193_0_, identity1_.identity_id as identity1_102_1_, account2_.account_id as account_1_3_2_, user0_.scope_id as scope_i16_193_0_, user0_.aps_uuid as aps_uuid2_193_0_, user0_.contact_id as contact17_193_0_, user0_.date_format as date_for3_193_0_, user0_.deleting as deleting4_193_0_, user0_.dns_path_of_domain_part as dns_path5_193_0_, user0_.domain_id as domain_i6_193_0_, user0_.is_enabled as is_enabl7_193_0_, user0_.auth_identity_id as auth_id18_193_0_, user0_.l_country_code as l_countr8_193_0_, user0_.l_language_code as l_langua9_193_0_, user0_.l_variant as l_varia10_193_0_, user0_.locked_by as locked_11_193_0_, user0_.member_id as member_12_193_0_, user0_.sub_id as sub_id13_193_0_, user0_.timezone as timezon14_193_0_, user0_.type as type15_193_0_, identity1_.system_id as system_i4_102_1_, 2016-12-15T12:46:13.240906+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79939-2] identity1_.login as login2_102_1_, identity1_.login_lowered as login_lo3_102_1_, account2_.address_id as address21_3_2_, account2_.adm_contact_id as adm_con22_3_2_, account2_.aps_u 2016-12-15T12:46:13.240908+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79939-3] LOCATION: pool_proto_modules.c:3085 2016-12-15T12:46:13.241819+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79997-1] LOG: DB node id: 0 backend pid: 17007 statement: B message 2016-12-15T12:46:13.241827+03:00 srv-60b2f6e99b9b lesk plesk 192.168.102.240(59893) idle in transaction[16742]: [79997-2] LOCATION: pool_proto_modules.c:3085 |
|
|
Hi Pgpool developers! Please consider the logic that reproduces the error. In approximately 30% of cases the attached class (TestPgpool.java) reproduces the issue. The target testing DB has 2 tables used for testing (tables.ddl). We did several modifications in pgpool code that show the cases (as we could detect it) when Pgpool does balancing instead of forwarding the SELECT request to MASTER backend after DDL statement was processed. In several runnings one will see messages: "USE BACKEND: {1} THOUGH THE WRITE TRANSACTION IS ACTIVE!!!!!!!!" which say that previously the UPDATE statement was executed but the next SELECT method is forwarded to SLAVE – which should not happen. We enforced balancing ratio in pgpool by proportion MASTER/SLAVE as 0/1000. Thus the majority of requests will be forwarded to SLAVE. The attached configuration assumes the 2 backends are synchronous and backend #0 is master Notes: The attached DIFF file also contains patch for the issue 0000231. |
|
|
It seems there's no evidence that the SELECT in question was performed in an explicit transaction in your log. |
|
|
Please have a look at the attached log file. The issue is reproduced when time lag between requests is very small. This is why we create attached java multithread java test. Regarding the attached log: 1) at 2016-12-20T12:02:38.200806 there was a write statement 2) Later (2016-12-20T12:02:38.202535) one can see that the new SELECT request was balanced to the node 1 which is read-only slave. |
|
|
Multuthread java test: import java.sql.*; import java.util.*; import java.util.concurrent.CyclicBarrier; import java.util.concurrent.locks.*; public class TestPgpoolLoad { private static final int TEST_THREADS_NUM = 40; public static void main(String[] args) throws Exception { try (Connection conn = DriverManager.getConnection("jdbc:postgresql://10.28.78.103:5432/postgres", "postgres", "password")) { Lock lock = new ReentrantLock(); conn.setAutoCommit(false); CyclicBarrier cb = new CyclicBarrier(TEST_THREADS_NUM); try (PreparedStatement stmtSelect = conn.prepareStatement("select * from read_table where id <= ?"); PreparedStatement stmtWrite = conn.prepareStatement("update write_table set id = 1 where id = ?")) { List<Thread> thrs = new ArrayList<>(); for (int i = 0; i < TEST_THREADS_NUM; i++) { boolean isRead = i < TEST_THREADS_NUM - 1; boolean isSleep = i < (TEST_THREADS_NUM / 2); thrs.add(new TestReader(isSleep, cb, isRead, isRead ? stmtSelect : stmtWrite, lock)); } for (Thread thr : thrs) { thr.start(); } Thread.sleep(1000); for (Thread thr : thrs) { thr.join(); } System.out.println("ok"); } catch (Exception ex) { ex.printStackTrace(); throw ex; } conn.commit(); } } public static class TestReader extends Thread { private final CyclicBarrier cb; private final boolean readThread; private final PreparedStatement stmt; private final Lock lock; private final boolean isSleep; private final boolean isRead; TestReader(boolean isSleep, CyclicBarrier cb, boolean readThread, PreparedStatement readStmt, Lock lock) { this.isSleep = isSleep; this.cb = cb; this.isRead = readThread; this.readThread = readThread; this.stmt = readStmt; this.lock = lock; } @Override public void run() { try { cb.await(); if (!isRead) { Thread.sleep(1); } lock.lock(); stmt.setInt(1, 1); stmt.execute(); } catch (Exception ex) { ex.printStackTrace(); } finally { lock.unlock(); } } } } |
|
|
Also there is the transaction start mentioned in the log: 2016-12-20T12:02:38.032665+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [208-1] 2016-12-20 12:02:38: pid 30813:LOG: DB node id: 0 backend pid: 31091 statement: Parse: BEGIN 2016-12-20T12:02:38.032667+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [208-2] 2016-12-20 12:02:38: pid 30813:LOCATION: pool_proto_modules.c:3089 2016-12-20T12:02:38.032800+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [211-1] 2016-12-20 12:02:38: pid 30813:LOG: DB node id: 1 backend pid: 13293 statement: Parse: BEGIN 2016-12-20T12:02:38.032806+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [211-2] 2016-12-20 12:02:38: pid 30813:LOCATION: pool_proto_modules.c:3089 2016-12-20T12:02:38.033030+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [224-1] 2016-12-20 12:02:38: pid 30813:LOG: DB node id: 0 backend pid: 31091 statement: B message 2016-12-20T12:02:38.033032+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [224-2] 2016-12-20 12:02:38: pid 30813:LOCATION: pool_proto_modules.c:3089 2016-12-20T12:02:38.033178+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [227-1] 2016-12-20 12:02:38: pid 30813:LOG: DB node id: 1 backend pid: 13293 statement: B message 2016-12-20T12:02:38.033179+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [227-2] 2016-12-20 12:02:38: pid 30813:LOCATION: pool_proto_modules.c:3089 2016-12-20T12:02:38.033435+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [239-1] 2016-12-20 12:02:38: pid 30813:LOG: DB node id: 0 backend pid: 31091 statement: Execute: BEGIN 2016-12-20T12:02:38.033437+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [239-2] 2016-12-20 12:02:38: pid 30813:LOCATION: pool_proto_modules.c:3089 2016-12-20T12:02:38.033577+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [244-1] 2016-12-20 12:02:38: pid 30813:LOG: DB node id: 1 backend pid: 13293 statement: Execute: BEGIN 2016-12-20T12:02:38.033578+03:00 srv-60b2f6e99b9b lesk plesk 10.192.36.35(43669) idle[30813]: [244-2] 2016-12-20 12:02:38: pid 30813:LOCATION: pool_proto_modules.c:3089 |
|
|
But the process id is different (30813 vs. 16742). So I can not confirm that the SELECT and DML are in a same transaction. |
|
|
It seems you missed the freshest attached log file. Please have a look at the freshest "test7.log". Also the pgpool settings are: backend_hostname0 = 'a.db.node' backend_port0 = 15432 backend_weight0 = 0 backend_data_directory0 = '/var/lib/pgsql/9.5/data' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_hostname1 = 'b.db.node' backend_port1 = 15432 backend_weight1 = 1000 backend_data_directory1 = '/var/lib/pgsql/9.5/data' backend_flag1 = 'ALLOW_TO_FAILOVER' enable_pool_hba = on pool_passwd = 'pool_passwd' authentication_timeout = 60 ssl = off num_init_children = 240 max_pool = 1 child_life_time = 0 child_max_connections = 1 connection_life_time = 0 client_idle_limit = 0 log_destination = 'syslog' log_line_prefix = '%t: pid %p:' log_connections = on log_hostname = on log_statement = on log_per_node_statement = on log_standby_delay = 'none' syslog_facility = 'LOCAL0' syslog_ident = 'pgpool' debug_level = 5 log_error_verbosity = verbose client_min_messages = warning log_min_messages = debug1 |
|
|
I have updated the test scenario. Please look at the source code below. There are N-1 threads that read Value which was previously inserted by the Single thread. The N threads start simultaneously in front of the barrier. There is a semaphore (ReentrantLock) that prevents logic from parallel queries but we still have very high concurrency mode and minimal time period between receiving response and sending of the next query. After the INSERT threads completes insert operation it sets valatile flag (writePerformed) and all SELECT threads start trying to fetch the value. If the value was not returned then it definely means the SELECT was balanced to the node which doesn't contain the value inserted so far. Try to launch the test several times and you will see the exception in stdout. In our environment we reproduce the issue in 50%. If you wish I can create fully functional test case: debug it in the local QA environment and provide you so that you will be able to use it in your CI solution. package com; import java.sql.*; import java.util.*; import java.util.concurrent.CyclicBarrier; import java.util.concurrent.locks.*; public class TestPgpool { private static final int TEST_THREADS_NUM = 40; static volatile boolean writePerformed = false; // variable will be set into TRUE when the write thread did INSERT operation public static void main(String[] args) throws Exception { Random random = new Random(System.currentTimeMillis()); try (Connection conn = DriverManager.getConnection("jdbc:postgresql://10.28.78.103/plesk", "plesk", "jjHCF7sGHq")) { Lock lock = new ReentrantLock(); conn.setAutoCommit(false); // here we define explicit transaction CyclicBarrier cb = new CyclicBarrier(TEST_THREADS_NUM); final int valueTest = random.nextInt(); System.out.println("test value:" + valueTest); try (PreparedStatement stmtSelect = conn.prepareStatement("select * from write_table where id = ?"); PreparedStatement stmtWrite = conn.prepareStatement("insert into write_table(id) values(?)")) { List<Thread> thrs = new ArrayList<>(); for (int i = 0; i < TEST_THREADS_NUM; i++) { boolean isRead = i > 0; //boolean isSleep = i < (TEST_THREADS_NUM / 2); thrs.add(new TestReader(valueTest, cb, isRead, isRead ? stmtSelect : stmtWrite, lock)); } for (Thread thr : thrs) { thr.start(); } Thread.sleep(1000); for (Thread thr : thrs) { thr.join(); } } catch (Exception ex) { ex.printStackTrace(); throw ex; } conn.commit(); } } public static class TestReader extends Thread { private final CyclicBarrier cb; private final PreparedStatement stmt; private final Lock lock; private final boolean isRead; private final int valueTest; TestReader(int valueTest, CyclicBarrier cb, boolean readThread, PreparedStatement stmt, Lock lock) { this.valueTest = valueTest; this.cb = cb; this.isRead = readThread; this.stmt = stmt; this.lock = lock; } @Override public void run() { try { cb.await(); lock.lock(); stmt.setInt(1, valueTest); if (isRead) { try (ResultSet rs = stmt.executeQuery()) { if (writePerformed) { if (!rs.next()) { throw new RuntimeException("The request was balanced. The value was inserted but the the following select returned nothing"); } else { int val = rs.getInt(1); System.out.println("recevied value:" + val); } } else { // write was not performed so one should not expect the value is in the database } } } else { stmt.execute(); writePerformed = true; // set mark the write performed. since this moment the READ threads will be trying to read the value. } } catch (Exception ex) { ex.printStackTrace(); throw new RuntimeException(ex); } finally { lock.unlock(); } } } } |
|
|
I have took a look at test7.log and confirmed the issue. Looking into the source code now... |
|
|
Is it possible to provide a Java debug trace when the issue occurs? Pgpool-II decides the transaction is writing when a write query is sent and backend returns the "ready for query" message to Pgpool-II. My guess is, client sends parse message for a select *before* Pgpool-II receives the "ready for query message". If it actually happens, the select can be sent to standby because the writing flag is not set yet. |
|
|
Yes, please have a look at the attached test10.zip file. It contains pgpool log and the corresponding java/jdbc log. |
|
|
'Probably it makes sense to set the flag on the receiving of a WRITE query. In this case the pgpool's state machine is free of the problem and even the possibility (in this particular case only). |
|
|
It's not that simple. By looking at the jdbc log, I found that: 1) "prepare select" is was sent to node 1 2) write query was sent to node 0 3) bind/execute request for 1) was coming. They were sent to node 1 (oops!) If we simply redirect 3) to node 0, it will fail because there's no prepared statement for the select on node 0. So the only solution is sending prepare select to node 0 before 3). Actually the code for the logic was in 3.4. I seem to forgot to port it to 3.5 or later:-) |
|
|
Still I don't understand the real cause of the problem because I'm not an expert in Pgpool's state machine and Postgres protocol. But from my point of view it seems the Parse message should be sent to all backends if load balancing is activated. This is True becase if for instance there are no DML requests Pgpool still has this right to balance requests. Right now from the logs I see the Parse is forwarded to the node 1 (from the test10.log) and later after the INSERT is complete Pgpool still forwards BIND/EXECUTE for the Selects to the node 1..... this is what I see. But I see you already have the solution which was forgotten in the past ))) It seems the lost code should be restored ))) |
|
|
> But from my point of view it seems the Parse message should be sent to all backends if load balancing is activated. Actually we did this long time ago and people blamed us because that way was ineffective and slow. |
|
|
Ok, probably there are some reasons exist. But we still need to fix the problem. |
|
|
Would it be possible to restore the lost code? We still facing problems because of the issue ((( |
|
|
Hi Guys, recently under not very high load we faced exactly the same issue. It resulted in data loses and integrity problems. The program logic was inserting data but than occasionally it failed to fetch the same data within the same transaction. First moment we decided that postgres violates the ACID concept, but later revealed problem in pgpool's behaviour ((( I see there is the fix for the problem exists. Can you please apply it please ASAP? We look forward to obtain the fixed build. |
|
|
> Would it be possible to restore the lost code? Of course. But it would not be straight forward because of the message handling logic difference between Pgpool-II 3.5 and pre 3.5. Also I need a tool to send arbitrary message stream for testing/debugging. I'm working on developing it... |
|
|
Do you have any estimations when you will be able to provide the patch for testing? We look forward to obtain it. |
|
|
From December 29 to January 3rd my office was closed because of national holidays. I would like to finish the work by the end of this month. |
|
|
Happy new year and Merry Christmas Mr. T-Ishii! We look forward to obtain the fixed version. Lots of our activities are delayed because of this problem. If possible we'd like to participate in testing as soon as you are ready to release the fix. Probably if we participate you will be able to accomplish with the fix faster than the end of the month with a better quality. |
|
|
Yes, is it possible to provide the fix earlier than the end of the month? Recently we faced several more issues of this type. Every time it takes much efforts to rollback the business logic and restore data ((( We are also ready to participate in the testing. PLEASE if possible provide the fix faster! |
|
|
Dear Tatsui Ishii, do you have any information about estimated timelines when you will be able to provide the fix? We are experiencing a lot of troubles because of the issue ((( Please is possible provide the fix or the information wehn it will be available. Thank you! |
|
|
...I'm sorry for the mistake in your name!! |
|
|
Any updates on the issue? From the message thread I see there is a fundamental problem in pgpool((( The attached by supp_k test shows that the problem easily reproduced when the time lag between response from pgpool and the next request is minimal. Also the PARSE message is being sent to one node only. Just wonder how do you guys are hoing to fix the trouble if you dont want to send the parse request to all backends? |
|
|
We don't need to send PARSE message to all backends. If we know that a write query was sent in the transaction, then succeeding SELECT is sent to the primary node. If we already send a parse message for the SELECT to other than the primary, then we will send the parse message to the primary. I'm working on the implementation in this direction now. |
|
|
Hi T-Ishii, are there any updates about this problem? We are ready to start testing it. |
|
|
Guys, any updates on the issue? Can you please provide the fix? |
|
|
Still working on it. However I'm pretty busy now and I'm not sure when the work will finish. |
|
|
I'm getting a slight progress. With over 800 lines diff, now in certain case the issue seems to be fixed. Here is the case described by pgproto (https://github.com/tatsuo-ishii/pgproto). However I am experiencing a regression with our regression test suit, and definitely I need to work on it. # Test data for bug271. # In an explicit transaction, SELECT is parsed, DML issued, same SELECT/bind issued. # In this case Pgpool-II should parse resend to primary node. # # Create test table 'Q' "DROP TABLE IF EXISTS pgproto_test1" 'Y' 'Q' "CREATE TABLE pgproto_test1(i INT)" 'Y' # Start a transaction 'P' "S1" "BEGIN" 0 'B' "" "S1" 0 0 0 'E' "" 0 'C' 'S' "S1" # Issue SELECT 'P' "S2" "SELECT 1" 0 #'B' "" "S1" 0 0 0 #'E' "" 0 #'C' 'S' "S1" # Issue INSERT 'P' "S1" "INSERT INTO pgproto_test1 VALUES(1)" 0 'B' "" "S1" 0 0 0 'E' "" 0 'C' 'S' "S1" #'S' #'Y' # Issue SELECT. This should be sent to primary node. #'P' "S1" "SELECT 1" 0 'B' "" "S2" 0 0 0 'E' "" 0 'C' 'S' "S2" # Issue COMMIT 'P' "S1" "COMMIT" 0 'B' "" "S1" 0 0 0 'E' "" 0 'C' 'S' "S1" 'S' 'Y' 'X' |
|
|
I finally managed to create the initial cut of the patch I was talking about. The patch is relatively large (1652 lines of diff) and I would appreciate if you guys test it out. The patch is attached as "bug271.diff". It was originally created against master branch, also can be applied to 3.6 and 3.5 stable trees as well. |
|
|
|
|
|
The discussion has been continued at pgpool-hackers mailing list (pgpool-hackers: 2043]. |
|
|
The issue was reproduced by us in Pgpool 3.6.1 several times in production environment. Seems the bug is very problem since it is not closed for several months. This fact is very painful ((( Are there any estimations for the fix? |
|
|
See the mailing archive discussions. I am working with Sergey, and he kindly have been testing my patches. I cannot say the estimated date for the fix but I believe we are getting closer to the end. |
|
|
Now that fixes are incorporated into the stable branch and discussions/activities have been moved to the mailing list, I'm going to close this item. (I asked Sergey and he agreed on it). |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2016-12-16 04:46 | supp_k | New Issue | |
| 2016-12-16 04:59 | supp_k | Note Added: 0001225 | |
| 2016-12-17 02:33 | supp_k | Note Added: 0001226 | |
| 2016-12-19 06:10 | serk | File Added: issues_271_244.zip | |
| 2016-12-19 06:10 | serk | Note Added: 0001231 | |
| 2016-12-20 09:35 | t-ishii | Assigned To | => t-ishii |
| 2016-12-20 09:35 | t-ishii | Status | new => assigned |
| 2016-12-20 17:47 | t-ishii | Note Added: 0001242 | |
| 2016-12-20 17:47 | t-ishii | Status | assigned => feedback |
| 2016-12-20 18:07 | supp_k | File Added: test7.log | |
| 2016-12-20 18:07 | supp_k | Note Added: 0001243 | |
| 2016-12-20 18:07 | supp_k | Status | feedback => assigned |
| 2016-12-20 18:08 | supp_k | Note Added: 0001244 | |
| 2016-12-20 18:11 | supp_k | Note Added: 0001245 | |
| 2016-12-20 18:15 | t-ishii | Note Added: 0001247 | |
| 2016-12-20 18:24 | supp_k | Note Added: 0001248 | |
| 2016-12-21 05:42 | supp_k | Note Added: 0001250 | |
| 2016-12-21 13:39 | t-ishii | Note Added: 0001254 | |
| 2016-12-21 14:24 | t-ishii | Note Added: 0001255 | |
| 2016-12-21 14:25 | t-ishii | Status | assigned => feedback |
| 2016-12-21 15:18 | supp_k | File Added: test10.zip | |
| 2016-12-21 15:18 | supp_k | Note Added: 0001256 | |
| 2016-12-21 15:18 | supp_k | Status | feedback => assigned |
| 2016-12-21 18:11 | supp_k | Note Added: 0001257 | |
| 2016-12-21 21:35 | t-ishii | Note Added: 0001258 | |
| 2016-12-21 21:54 | supp_k | Note Added: 0001259 | |
| 2016-12-21 23:56 | t-ishii | Note Added: 0001260 | |
| 2016-12-22 00:04 | supp_k | Note Added: 0001261 | |
| 2016-12-22 18:24 | supp_k | Note Added: 0001262 | |
| 2016-12-23 20:57 | elepuser | Note Added: 0001263 | |
| 2016-12-26 15:06 | t-ishii | Note Added: 0001264 | |
| 2016-12-28 20:01 | supp_k | Note Added: 0001271 | |
| 2017-01-04 18:06 | t-ishii | Note Added: 0001277 | |
| 2017-01-04 19:17 | supp_k | Note Added: 0001278 | |
| 2017-01-10 05:15 | elepuser | Note Added: 0001286 | |
| 2017-01-25 19:21 | supp_k | Note Added: 0001295 | |
| 2017-01-25 19:22 | supp_k | Note Added: 0001296 | |
| 2017-01-29 07:35 | elepuser | Note Added: 0001310 | |
| 2017-01-30 18:05 | t-ishii | Note Added: 0001312 | |
| 2017-02-06 16:40 | supp_k | Note Added: 0001329 | |
| 2017-02-07 17:48 | elepuser | Note Added: 0001336 | |
| 2017-02-08 10:27 | t-ishii | Note Added: 0001337 | |
| 2017-02-14 16:12 | t-ishii | Note Added: 0001341 | |
| 2017-02-23 15:20 | t-ishii | Note Added: 0001353 | |
| 2017-02-23 15:20 | t-ishii | File Added: bug271.diff | |
| 2017-02-23 15:21 | t-ishii | Status | assigned => feedback |
| 2017-02-27 18:10 | t-ishii | Note Added: 0001366 | |
| 2017-03-17 07:37 | enruquekl | Note Added: 0001378 | |
| 2017-03-17 14:22 | t-ishii | Note Added: 0001379 | |
| 2017-03-31 17:05 | t-ishii | Note Added: 0001402 | |
| 2017-03-31 17:06 | t-ishii | Status | feedback => closed |
| 2017-03-31 17:06 | t-ishii | Target Version | => 3.6.3 |