View Issue Details

ID: 0000432
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2018-10-17 09:43
Reporter: nagata
Assigned To: t-ishii
Priority: normal
Severity: minor
Reproducibility: sometimes
Status: resolved
Resolution: open
Product Version: 3.7.5
Target Version: 3.7.6
Fixed in Version: (none)
Summary: 0000432: Received resultset tuples, but no field structure for them exception in JDBC
Description: One of our clients reported that the following exception occurred with JDBC (42.2.1.jre6).

 java.lang.IllegalStateException: Received resultset tuples, but no field structure for them


Analysis of the JDBC driver shows that this happens because it received the responses for SELECT statements while it expected the response for BEGIN. In fact, no response for BEGIN was returned to the client at all, as the following JDBC trace log shows.


 FE=> CloseStatement(S_6)
 FE=> CloseStatement(S_5)
 FE=> Parse(stmt=null,query="BEGIN",oids={})
 FE=> Bind(stmt=null,portal=null)
 FE=> Execute(portal=null,limit=0)
 FE=> Parse(stmt=null,query="SELECT ... (snip)...)
 FE=> Bind
 FE=> Describe
 FE=> Execute
 FE=> Sync
 <=BE CloseComplete
 <=BE ParseComplete
 <=BE BindComplete
 <=BE RowDescription
 <=BE DataRow
 <=BE DataRow
 ...(snip)...
 <=BE CommandStatus(SELECT 352)
*** Exception!! ***

The first two CloseStatement messages were for DELETE and INSERT statements issued before the SELECT. The CloseComplete was a response to one of them; the other expected CloseComplete was never returned. All the remaining messages from the backend were for the SELECT statement. There were no responses for BEGIN.

This can be reproduced only sometimes, not always. I confirmed it using the HEAD of the master branch.

The client created a Java program that reproduces the issue, and I am asking them whether we can share it. If we can, I'll upload it.

The basic procedure of the program is as follows:

1. Set autocommit to off.
2. Issue DELETE and INSERT statements in a thread.
3. Execute a garbage collection.
4. Issue SELECT statements in another thread.
5. Repeat steps 1-4 several times.



To resolve this problem, I analysed a debug log of Pgpool-II. Attached is the debug log of a child process including both the normal case (l.1-5621) and the exception case (l.5622-11041).

Through this analysis, I came to believe that this is a bug in pool_push_pending_data(). I'll explain below.


While handling SELECT statements, Pgpool-II calls do_query() to check the transaction isolation level and so on. At this point, pool_push_pending_data() is called to preserve all the response messages for the messages that have already been sent to the backend. However, it can fail to preserve some messages, because it stops reading data from the backend when it receives a CloseComplete message.

For example, in the normal case five messages are pushed and preserved (l.4549-4580 in the attached log), but in the exception case only one message (CloseComplete) is pushed (l.10105-10110), and the next CloseComplete is discarded (l.10111).

In the exception case, the actual responses for BEGIN are then read and discarded in do_query() (l.10135-10154).


To fix this, pool_push_pending_data() has to receive all the required messages from the backend. To detect whether we have received all of them, I think we can use the length of the pending messages list (session_context->pending_messages). Of course, this alone is not sufficient, because the number of sent messages and the number of their responses are not always the same. However, if we assume the latter is at least as large as the former, we should wait until we have received at least that many messages.


The attached patch fixes the problem using the above approach. After applying it, I no longer see the exception.

What do you think about this idea? Are the assumption and approach right?
Steps To Reproduce: The client created a Java program that reproduces the issue, and I am asking them whether we can share it. If we can, I'll upload it.

The basic procedure of the program is as follows:

1. Set autocommit to off.
2. Issue DELETE and INSERT statements in a thread.
3. Execute a garbage collection.
4. Issue SELECT statements in another thread.
5. Repeat steps 1-4 several times.
Tags: No tags attached.

Activities

nagata

2018-10-11 18:10

developer  

pgpool_debug.log (1,011,529 bytes)
pool_push_pending_data.patch (1,370 bytes)
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 8b4d5b2d..d2f98558 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -4790,6 +4790,9 @@ bool pool_push_pending_data(POOL_CONNECTION *backend)
 	bool pending_data_existed = false;
 	static char random_statement[] = "pgpool_non_existent";
 
+	int num_pending_messages;
+	int num_pushed_messages;
+
 	if (!pool_get_session_context(true) || !pool_is_doing_extended_query_message())
 		return false;
 
@@ -4842,6 +4845,9 @@ bool pool_push_pending_data(POOL_CONNECTION *backend)
 				(errmsg("pool_push_pending_data: send flush message to %d", con->db_node_id)));
 	}
 
+	num_pending_messages = list_length(session_context->pending_messages);
+	num_pushed_messages = 0;
+
 	for(;;)
 	{
 		int len;
@@ -4868,7 +4874,7 @@ bool pool_push_pending_data(POOL_CONNECTION *backend)
 
 		if (!pool_ssl_pending(backend) && pool_read_buffer_is_empty(backend) && kind != 'E')
 		{
-			if (kind != '3' || pending_data_existed)
+			if (kind != '3' || pending_data_existed || num_pushed_messages < num_pending_messages)
 				pool_set_timeout(-1);
 			else
 				pool_set_timeout(0);
@@ -4904,6 +4910,7 @@ bool pool_push_pending_data(POOL_CONNECTION *backend)
 			pool_set_ignore_till_sync();
 			break;
 		}
+		num_pushed_messages++;
 	}
 	return data_pushed;
 }

t-ishii

2018-10-11 19:25

developer   ~0002193

Unfortunately, the proposed patch will not work, since the pending message list does not necessarily include all the messages from the frontend. For example, 'H' (Flush).

Also, I tried to apply your patch to the master branch, but it did not apply cleanly. So I modified the patch and tested it against src/test/extended-query-test. It failed with bug370-sql-error-followed-by-select.data and others. I have attached the patch I used.

pool_push_pending_data.diff (1,358 bytes)
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 2d2fb78f..be370903 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -4948,6 +4948,9 @@ pool_push_pending_data(POOL_CONNECTION * backend)
 	bool		pending_data_existed = false;
 	static char random_statement[] = "pgpool_non_existent";
 
+	int num_pending_messages;
+	int num_pushed_messages;
+
 	if (!pool_get_session_context(true) || !pool_is_doing_extended_query_message())
 		return false;
 
@@ -5000,6 +5003,9 @@ pool_push_pending_data(POOL_CONNECTION * backend)
 				(errmsg("pool_push_pending_data: send flush message to %d", con->db_node_id)));
 	}
 
+	num_pending_messages = list_length(session_context->pending_messages);
+	num_pushed_messages = 0;
+
 	for (;;)
 	{
 		int			len;
@@ -5026,7 +5032,7 @@ pool_push_pending_data(POOL_CONNECTION * backend)
 
 		if (!pool_ssl_pending(backend) && pool_read_buffer_is_empty(backend) && kind != 'E')
 		{
-			if (kind != '3' || pending_data_existed)
+			if (kind != '3' || pending_data_existed || num_pushed_messages < num_pending_messages)
 				pool_set_timeout(-1);
 			else
 				pool_set_timeout(0);
@@ -5062,6 +5068,7 @@ pool_push_pending_data(POOL_CONNECTION * backend)
 			pool_set_ignore_till_sync();
 			break;
 		}
+		num_pushed_messages++;
 	}
 	return data_pushed;
 }

t-ishii

2018-10-12 11:59

developer   ~0002196

Thinking about this more, I realized your idea may be viable. If the pending list length is longer than expected, Pgpool-II will definitely hang. However, this is unlikely to happen, since a frontend message never creates two or more pending message entries.
Conversely, it could happen that the pending list is shorter than expected, since not all frontend messages create a pending message entry. But in this case it should be harmless, or at least not lead to a hang: if the list is too short, Pgpool-II gives up reading more messages from the backend, which is the same as the current implementation.

BTW, the reason your patch failed is that simply counting the list length is not correct. We want to know the number of pending messages for the primary node, but the pending message list can also include messages for the load balance node. This is easy to fix.

t-ishii

2018-10-12 14:20

developer   ~0002197

So here's the revised patch along those lines. I have confirmed that all regression tests and extended-query tests pass with it on the master branch.
(I have already pushed this.)

Please test with your program and data.

pool_push_pending_data-v2.diff (136 bytes)
[master c037da00] Fix occasional less data returned to frontend.
 3 files changed, 38 insertions(+), 1 deletion(-)

nagata

2018-10-16 15:52

developer   ~0002202

Thank you for fixing this.
I confirmed that the exception no longer occurs with the current code.

Issue History

Date Modified Username Field Change
2018-10-11 18:09 nagata New Issue
2018-10-11 18:10 nagata File Added: pgpool_debug.log
2018-10-11 18:10 nagata File Added: pool_push_pending_data.patch
2018-10-11 19:25 t-ishii File Added: pool_push_pending_data.diff
2018-10-11 19:25 t-ishii Note Added: 0002193
2018-10-12 11:59 t-ishii Note Added: 0002196
2018-10-12 12:51 t-ishii Assigned To => t-ishii
2018-10-12 12:51 t-ishii Status new => assigned
2018-10-12 12:51 t-ishii Product Version => 3.7.6
2018-10-12 14:20 t-ishii File Added: pool_push_pending_data-v2.diff
2018-10-12 14:20 t-ishii Note Added: 0002197
2018-10-12 14:20 t-ishii Status assigned => feedback
2018-10-16 15:52 nagata Note Added: 0002202
2018-10-16 15:52 nagata Status feedback => assigned
2018-10-17 09:43 t-ishii Status assigned => resolved
2018-10-17 09:43 t-ishii Product Version 3.7.6 => 3.7.5
2018-10-17 09:43 t-ishii Target Version => 3.7.6