View Issue Details

IDProjectCategoryView StatusLast Update
0000313Pgpool-IIBugpublic2017-06-09 11:04
ReporternagataAssigned Tonagata 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
Product Version3.5.8 
Target Version3.5.9Fixed in Version 
Summary0000313: hang with extended protocol
DescriptionA hang occurs with streaming-replication mode and extended
protocol in a certain condition. The following is an example.

- pgpool.conf is configured so that all read queries are sent to the standby.
- First, issue a writing query in a transaction block
- After commiting the transaction, issue a select query.
- When processing the query, send Describe (statement) message just after Parse.

In practice, such Describe message is sent by JDBC when executing PreparedStatement
with a parameter of timestamp type more than prepare_threshold times. In addition,
queris can be issued outside of a transaction block when autocommit is enabled.

For details, see Steps To Reproduce.

[Analysis]

When a wiring query is issued in a transaction block, writing_transaction flag
is set, but after commiting the transaction this flag is not unset. The priblem
is that parse_before_bind() is always called when writing_transaction is set.

Alhough Parse of the select and the next Describe are sent to the standby
due to loadbalance, another Parse and the next Bind, Execute are sent to the primary
due to parse_before_bind. (In the following log, node 0 is primary, 1 is standby.)

 LOG: DB node id: 0 backend pid: 7079 statement: Parse: BEGIN
 LOG: DB node id: 1 backend pid: 7080 statement: Parse: BEGIN
 LOG: DB node id: 0 backend pid: 7079 statement: B message
 LOG: DB node id: 1 backend pid: 7080 statement: B message
 LOG: DB node id: 0 backend pid: 7079 statement: Execute: BEGIN
 LOG: DB node id: 1 backend pid: 7080 statement: Execute: BEGIN
 LOG: DB node id: 0 backend pid: 7079 statement: Parse: INSERT INTO test_tbl VALUES(1)
 LOG: DB node id: 0 backend pid: 7079 statement: B message
 LOG: DB node id: 0 backend pid: 7079 statement: Execute: INSERT INTO test_tbl VALUES(1)
 LOG: DB node id: 0 backend pid: 7079 statement: Parse: COMMIT
 LOG: DB node id: 1 backend pid: 7080 statement: Parse: COMMIT
 LOG: DB node id: 0 backend pid: 7079 statement: B message
 LOG: DB node id: 1 backend pid: 7080 statement: B message
 LOG: DB node id: 0 backend pid: 7079 statement: Execute: COMMIT
 LOG: DB node id: 1 backend pid: 7080 statement: Execute: COMMIT
 LOG: DB node id: 1 backend pid: 7080 statement: Parse: SELECT * FROM test_tbl
 LOG: DB node id: 1 backend pid: 7080 statement: D message
 LOG: DB node id: 0 backend pid: 7079 statement: C message
 LOG: DB node id: 0 backend pid: 7079 statement: Parse: SELECT * FROM test_tbl
 LOG: DB node id: 0 backend pid: 7079 statement: B message

In this situation, Pgpool-II waits for the responces of Describe from the
standby forever because any messages after Describe is not sent to the
standby and this backend desn't respond anyting.
(When I send Flush after Describe, the standby returns the response and
 this hang doesn't occuer.)

 (gdb) bt
 #0 0x00007f4bd6b93c33 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
 0000001 0x00000000004329e9 in pool_check_fd (cp=0x1b27f70) at protocol/pool_process_query.c:647
 0000002 0x0000000000472381 in pool_read (cp=0x1b27f70, buf=0x7ffc3e42632f, len=1) at utils/pool_stream.c:162
 0000003 0x0000000000438e6b in read_kind_from_one_backend (frontend=0x1b1d970, backend=0x1b1c910, kind=0x7ffc3e42668f "1", node=1)
     at protocol/pool_process_query.c:3074
 0000004 0x00000000004394be in read_kind_from_backend (frontend=0x1b1d970, backend=0x1b1c910, decided_kind=0x7ffc3e4268cf "")
     at protocol/pool_process_query.c:3209
 0000005 0x00000000004476f3 in ProcessBackendResponse (frontend=0x1b1d970, backend=0x1b1c910, state=0x7ffc3e42695c, num_fields=0x7ffc3e42696e)
     at protocol/pool_proto_modules.c:2538
 0000006 0x0000000000431a8e in pool_process_query (frontend=0x1b1d970, backend=0x1b1c910, reset_request=0) at protocol/pool_process_query.c:212
 0000007 0x000000000042c208 in do_child (fds=0x1b17ae0) at protocol/child.c:377
 0000008 0x0000000000406541 in fork_a_child (fds=0x1b17ae0, id=3) at main/pgpool_main.c:589
 0000009 0x0000000000405c44 in PgpoolMain (discard_status=1 '\001', clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:357
 0000010 0x0000000000404695 in main (argc=9, argv=0x7ffc3e42b3f8) at main/main.c:300

[patch]

The attached is a simple patch to fix it. In this fix,
parse_before_bind() is call only if we are in a transaction
block. Due to this, we can send Bind, Execute to the right
backend.
Steps To ReproduceUse pgproto
--------------------------------------------

'Q' "DROP TABLE IF EXISTS test_tbl"
'Q' "CREATE TABLE test_tbl(i int)"
'Q' "INSERT INTO test_tbl VALUES(1)"
'Y'
'Y'
'Y'

'P' "" "BEGIN" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'

'P' "" "INSERT INTO hoge VALUES(1)" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'

'P' "" "COMMIT" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'

'P' "S_1" "SELECT * FROM hoge" 0
'D' 'S' "S_1"
'B' "C_1" "S_1" 0 0 0
'E' "C_1" 0
'S'
'Y'

'X'
---------------------------------------------
TagsNo tags attached.

Activities

nagata

2017-06-06 20:27

developer  

fix_query_hang.patch (640 bytes)
diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index 93f309b..f85bda9 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -1326,7 +1326,8 @@ POOL_STATUS Bind(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *backend,
 	 * other than primary node. In this case, we send a parse message to the
 	 * primary node.
 	 */
-	if (pool_config->load_balance_mode && pool_is_writing_transaction())
+	if (pool_config->load_balance_mode && pool_is_writing_transaction() &&
+		TSTATE(backend, MASTER_SLAVE ? PRIMARY_NODE_ID : REAL_MASTER_NODE_ID) == 'T')
 	{
 		if (!STREAM)
 		{
fix_query_hang.patch (640 bytes)

t-ishii

2017-06-07 14:23

developer   ~0001535

The fix looks good to me. Feel free to commit it.

nagata

2017-06-09 11:04

developer   ~0001538

fixed

Issue History

Date Modified Username Field Change
2017-06-06 20:23 nagata New Issue
2017-06-06 20:27 nagata File Added: fix_query_hang.patch
2017-06-07 14:23 t-ishii Note Added: 0001535
2017-06-07 14:23 t-ishii Assigned To => nagata
2017-06-07 14:23 t-ishii Status new => assigned
2017-06-07 14:23 t-ishii Target Version => 3.5.9
2017-06-09 11:04 nagata Status assigned => resolved
2017-06-09 11:04 nagata Resolution open => fixed
2017-06-09 11:04 nagata Note Added: 0001538