View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000018 | Pgpool-II | Bug | public | 2012-07-11 04:35 | 2012-12-18 16:39 |
| Reporter | kvc | Assigned To | nagata | ||
| Priority | normal | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | fixed | ||
| Platform | x86-64 | OS | RHEL | OS Version | 6.1 |
| Summary | 0000018: pgpool-II deadlock on extended queries | ||||
| Description | version is pgpool-II 3.1.3 Using epgsql (Erlang client implementation of PG's wire protocol) issuing an extended query of the form: "insert into spool (username, xml) values ($1, $2)" I'm finding that the query hangs and then times out. I captured a network trace of the same query done directly and via pgpool-II to compare protocol states and can clearly see that pgpool ends up issuing 2 flush statements ('H') in a row to the backend and then hangs on a read of the 2nd flush since the backend has already responded to the 1st flush and has nothing more to return. Results of tshark decoding the two scenarios along with pgpool log is provided below under "Additional Information" | ||||
| Steps To Reproduce | If you have Erlang/OTP (1) download and install epgsql from https://github.com/wg/epgsql (2) with EBIN set to wherever you installed epgsql BEAM files: $ erl -pa $EBIN Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false] Eshell V5.8.5 (abort with ^G) 1> {ok, C} = pgsql:connect("localhost", "ejabberd", "password", [{database, "ejabberd"}, {port, 15432}]). {ok,<0.34.0>} 2> pgsql:squery(C, "delete from spool"). {ok,0} 3> pgsql:equery(C, "insert into spool (username, xml) values ($1, $2)", ["somebody", "foobar"]). =ERROR REPORT==== 7-Jul-2012::19:39:00 === ** State machine <0.34.0> terminating ** Last event in was timeout ** When State == timeout ** Data == {state,undefined,<0.35.0>,5000, [{<<"application_name">>,<<>>}, {<<"client_encoding">>,<<"UTF8">>}, {<<"DateStyle">>,<<"ISO, MDY">>}, {<<"integer_datetimes">>,<<"on">>}, {<<"IntervalStyle">>,<<"postgres">>}, {<<"is_superuser">>,<<"off">>}, {<<"server_encoding">>,<<"UTF8">>}, {<<"server_version">>,<<"9.1.2">>}, {<<"session_authorization">>,<<"ejabberd">>}, {<<"standard_conforming_strings">>,<<"off">>}, {<<"TimeZone">>,<<"UTC">>}], undefined, {<0.32.0>,#Ref<0.0.0.42>}, undefined, {2342,822611433}, {statement,[],[],[text,text]}, 73} ** Reason for termination = ** timeout =ERROR REPORT==== 7-Jul-2012::19:39:00 === ** Generic server <0.35.0> terminating ** Last message in was {'EXIT',<0.34.0>,timeout} ** When Server state == {state,<0.34.0>,gen_tcp,#Port<0.603>,<<>>} ** Reason for termination == ** timeout ** exception exit: timeout 4> q(). ok | ||||
| Additional Information | tshark trace: -------------------------------------- NO pgpool (works): 11 16.969056 127.0.0.1 -> 127.0.0.1 PGSQL >P ***** EPGSQL REQUESTS A PARSE AND DESCRIBE ***** 12 16.969121 127.0.0.1 -> 127.0.0.1 PGSQL >D 13 16.969140 127.0.0.1 -> 127.0.0.1 PGSQL >H 14 16.969212 127.0.0.1 -> 127.0.0.1 TCP postgresql > 36045 [ACK] Seq=334 Ack=153 Win=32768 Len=0 TSV=1472324032 TSER=1472324032 15 16.973402 127.0.0.1 -> 127.0.0.1 PGSQL <1/t/n 16 16.973438 127.0.0.1 -> 127.0.0.1 TCP 36045 > postgresql [ACK] Seq=153 Ack=359 Win=33920 Len=0 TSV=1472324037 TSER=1472324037 17 16.974735 127.0.0.1 -> 127.0.0.1 PGSQL >B ***** EPGSQL FOLLOWS UP WITH BIND/EXECUTE/COMPLETE/SYNC 18 16.974769 127.0.0.1 -> 127.0.0.1 PGSQL >E 19 16.974784 127.0.0.1 -> 127.0.0.1 PGSQL >C 20 16.974797 127.0.0.1 -> 127.0.0.1 PGSQL >S 21 16.974828 127.0.0.1 -> 127.0.0.1 TCP postgresql > 36045 [ACK] Seq=359 Ack=213 Win=32768 Len=0 TSV=1472324038 TSER=1472324037 22 16.978919 127.0.0.1 -> 127.0.0.1 PGSQL <2/C/3/Z WITH pgpool (hangs): 67 77.791216 127.0.0.1 -> 127.0.0.1 PGSQL >P ***** EPGSQL AS ABOVE, BUT THIS TIME SENDS TO PGPOOL ***** 68 77.791255 127.0.0.1 -> 127.0.0.1 PGSQL >D 69 77.791277 127.0.0.1 -> 127.0.0.1 PGSQL >H 70 77.791370 127.0.0.1 -> 127.0.0.1 TCP 15432 > 53695 [ACK] Seq=334 Ack=153 Win=32768 Len=0 TSV=1472384854 TSER=1472384854 71 77.792786 127.0.0.1 -> 127.0.0.1 PGSQL >P/H ***** PGPOOL INSERTS A FLUSH ***** 72 77.794615 127.0.0.1 -> 127.0.0.1 PGSQL <1 73 77.794662 127.0.0.1 -> 127.0.0.1 TCP 35315 > postgresql [ACK] Seq=146 Ack=339 Win=33920 Len=0 TSV=1472384858 TSER=1472384858 74 77.794835 127.0.0.1 -> 127.0.0.1 PGSQL <1 75 77.794962 127.0.0.1 -> 127.0.0.1 PGSQL >D/H ***** PGPOOL INSERTS A FLUSH ***** 76 77.795037 127.0.0.1 -> 127.0.0.1 PGSQL <t/n 77 77.795162 127.0.0.1 -> 127.0.0.1 PGSQL <t 78 77.795200 127.0.0.1 -> 127.0.0.1 PGSQL <n 79 77.795244 127.0.0.1 -> 127.0.0.1 PGSQL >H ***** HANGS RIGHT HERE BECAUSE BACKEND NOW HAS NOTHING TO FLUSH ***** 80 77.795325 127.0.0.1 -> 127.0.0.1 TCP 53695 > 15432 [ACK] Seq=153 Ack=359 Win=32832 Len=0 TSV=1472384859 TSER=1472384858 81 77.795402 127.0.0.1 -> 127.0.0.1 PGSQL >B 82 77.795418 127.0.0.1 -> 127.0.0.1 PGSQL >E 83 77.795432 127.0.0.1 -> 127.0.0.1 PGSQL >C 84 77.795446 127.0.0.1 -> 127.0.0.1 PGSQL >S pgpool.log: -------------------------------------- 2012-07-07 05:55:53 DEBUG: pid 24178: ProcessFrontendResponse: kind from frontend P(50) 2012-07-07 05:55:53 DEBUG: pid 24178: pool_unset_doing_extended_query_message: done 2012-07-07 05:55:53 DEBUG: pid 24178: pool_set_doing_extended_query_message: done 2012-07-07 05:55:53 DEBUG: pid 24178: pool_set_query_in_progress: done 2012-07-07 05:55:53 DEBUG: pid 24178: Parse: statement name <> 2012-07-07 05:55:53 DEBUG: pid 24178: pool_set_query_in_progress: done 2012-07-07 05:55:53 LOG: pid 24178: DB node id: 0 backend pid: 24231 statement: Parse: insert into spool (username, xml) values ($1, $2) 2012-07-07 05:55:53 DEBUG: pid 24178: wait_for_query_response: waiting for backend 0 completing the query 2012-07-07 05:55:53 DEBUG: pid 24178: read_kind_from_backend: read kind from 0 th backend 1 NUM_BACKENDS: 1 2012-07-07 05:55:53 DEBUG: pid 24178: ProcessBackendResponse: kind from backend: 1 2012-07-07 05:55:53 DEBUG: pid 24178: pool_set_command_success: done 2012-07-07 05:55:53 DEBUG: pid 24178: pool_unset_query_in_progress: done 2012-07-07 05:55:53 DEBUG: pid 24178: ProcessFrontendResponse: kind from frontend D(44) 2012-07-07 05:55:53 DEBUG: pid 24178: pool_unset_doing_extended_query_message: done 2012-07-07 05:55:53 DEBUG: pid 24178: pool_set_doing_extended_query_message: done 2012-07-07 05:55:53 DEBUG: pid 24178: pool_set_query_in_progress: done 2012-07-07 05:55:53 DEBUG: pid 24178: Describe: waiting for master completing the query 2012-07-07 05:55:53 LOG: pid 24178: DB node id: 0 backend pid: 24231 statement: D message 2012-07-07 05:55:53 DEBUG: pid 24178: wait_for_query_response: waiting for backend 0 completing the query 2012-07-07 05:55:53 DEBUG: pid 24178: read_kind_from_backend: read kind from 0 th backend t NUM_BACKENDS: 1 2012-07-07 05:55:53 DEBUG: pid 24178: ProcessBackendResponse: kind from backend: t 2012-07-07 05:55:53 DEBUG: pid 24178: read_kind_from_backend: read kind from 0 th backend n NUM_BACKENDS: 1 2012-07-07 05:55:53 DEBUG: pid 24178: ProcessBackendResponse: kind from backend: n 2012-07-07 05:55:53 DEBUG: pid 24178: pool_unset_query_in_progress: done 2012-07-07 05:55:53 DEBUG: pid 24178: ProcessFrontendResponse: kind from frontend H(48) 2012-07-07 05:55:53 DEBUG: pid 24178: pool_unset_doing_extended_query_message: done 2012-07-07 05:55:58 DEBUG: pid 23924: I am 23924 accept fd 6 2012-07-07 05:55:58 LOG: pid 23924: connection received: host=127.0.0.1 port=36089 2012-07-07 05:55:58 DEBUG: pid 23924: Protocol Major: 1234 Minor: 5678 database: user: 2012-07-07 05:55:58 DEBUG: pid 23924: Cancel request received | ||||
| Tags | No tags attached. | ||||
|
|
This hang is fixed in V3.1.4 by flowing commit. http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=8ffb7f63664ff9c7d4682ac58842abf187f0892d This fixes the bug that pgpool waits for primary node which does not have pending data. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2012-07-11 04:35 | kvc | New Issue | |
| 2012-11-26 15:30 | anzai | Assigned To | => nagata |
| 2012-11-26 15:30 | anzai | Status | new => assigned |
| 2012-12-18 16:38 | nagata | Note Added: 0000198 | |
| 2012-12-18 16:39 | nagata | Status | assigned => resolved |
| 2012-12-18 16:39 | nagata | Resolution | open => fixed |