View Issue Details

IDProjectCategoryView StatusLast Update
0000370Pgpool-IIBugpublic2018-02-16 06:04
ReporteraviAssigned Tot-ishii 
PriorityhighSeverityblockReproducibilityalways
Status resolvedResolutionopen 
Product Version3.6.7 
Target Version3.6.8Fixed in Version3.6.8 
Summary0000370: Pgpool get stuck with db exception followed by select
DescriptionWe tested pgpool version 3.6.7 with pgproto and found that a db exception followed by select cause pgpool to get stuck

[root@h2-nms trunk]# ./run_tests.sh
iteration 0
FE=> Query(query="SET statement_timeout = '4s'")
<= BE CommandComplete(SET)
<= BE ReadyForQuery(I)
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Parse(stmt="S2", query="SELECT * from pg_tables")
FE=> Bind(stmt="S2", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S2")
FE=> Sync


===================================
Steps To ReproduceTo reproduce the problem please:


create a sh script with the following loop

for ((n=0; n<50000; n++))
 do
(
   echo "iteration $n "
   pgproto -f bug.data -p 9999 -h your_IP -u postgres -d postgres;
) done



========== This is the bug data file to be used with pgproto ===============
'Q' "SET statement_timeout = '4s'"

# Receive response from backend
'Y'

'P' "S1" "SELECT 1/0" 0
'B' "" "S1" 0 0 0
'E' "" 0
'C' 'S' "S1"

'P' "S2" "SELECT * from pg_tables" 0
'B' "" "S2" 0 0 0
'E' "" 0
'C' 'S' "S2"

# Issue Sync message
'S'

# Receive response from backend
'Y'

# Send terminate message
'X'


TagsNo tags attached.

Activities

t-ishii

2017-12-11 07:36

developer   ~0001854

The problem is reproduced here. It only happens when the erroneous query was sent to primary and the next query requires a catalogue cache look up.

At the point when the erroneous was sent, backend is already in error state. The next query requires a catalogue cache look up and issues a SELECT. But since backend will ignore everything until a sync message received, backend will not respond, and Pgpool-II gets stuck.

Will think how to deal with this.

t-ishii

2017-12-11 13:38

developer  

bug370.patch (2,577 bytes)
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 02f10ea..11d3dc3 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -1903,6 +1903,24 @@ void do_query(POOL_CONNECTION *backend, char *query, POOL_SELECT_RESULT **result
 		if (SL_MODE && pool_pending_message_exists())
 		{
 			data_pushed = pool_push_pending_data(backend);
+			/*
+			 * ignore_till_sync flag may have been set in pool_push_pending_data().
+			 * So check it.
+			 */
+			if (pool_is_ignore_till_sync())
+			{
+				if (data_pushed)
+				{
+					int poplen;
+
+					pool_pop(backend, &poplen);
+					ereport(DEBUG1,
+							(errmsg("do_query: popped data len:%d because ignore till sync was set", poplen)));
+				}
+				ereport(DEBUG1,
+						(errmsg("do_query: no query issued because ignore till sync was set")));
+				return;
+			}
 		}
 
 		if (pname_len == 0)
@@ -4846,7 +4864,7 @@ bool pool_push_pending_data(POOL_CONNECTION *backend)
 			pool_read(backend, buf, len);
 		}
 
-		if (!pool_ssl_pending(backend) && pool_read_buffer_is_empty(backend))
+		if (!pool_ssl_pending(backend) && pool_read_buffer_is_empty(backend) && kind != 'E')
 		{
 			if (kind != '3' || pending_data_existed)
 				pool_set_timeout(-1);
@@ -4876,6 +4894,14 @@ bool pool_push_pending_data(POOL_CONNECTION *backend)
 			pfree(buf);
 		}
 		data_pushed = true;
+		if (kind == 'E')
+		{
+			/* Found error response */
+			ereport(DEBUG1,
+					(errmsg("pool_push_pending_data: ERROR response found")));
+			pool_set_ignore_till_sync();
+			break;
+		}
 	}
 	return data_pushed;
 }
diff --git a/src/utils/pool_relcache.c b/src/utils/pool_relcache.c
index 9872a53..cb2ff51 100644
--- a/src/utils/pool_relcache.c
+++ b/src/utils/pool_relcache.c
@@ -5,7 +5,7 @@
  * pgpool: a language independent connection pool server for PostgreSQL
  * written by Tatsuo Ishii
  *
- * Copyright (c) 2003-2012	PgPool Global Development Group
+ * Copyright (c) 2003-2017	PgPool Global Development Group
  *
  * Permission to use, copy, modify, and distribute this software and
  * its documentation for any purpose and without fee is hereby
@@ -207,7 +207,7 @@ void *pool_search_relcache(POOL_RELCACHE *relcache, POOL_CONNECTION_POOL *backen
 	/* Register cache */
 	result = (*relcache->register_func)(res);
 
-	if (!relcache->no_cache_if_zero || result)
+	if (!pool_is_ignore_till_sync() && (!relcache->no_cache_if_zero || result))
 	{
 		strlcpy(relcache->cache[index].dbname, dbname, MAX_ITEM_LENGTH);
 		strlcpy(relcache->cache[index].relname, table, MAX_ITEM_LENGTH);
bug370.patch (2,577 bytes)

t-ishii

2017-12-11 13:38

developer   ~0001855

Ok, here is a patch trying to fix this issue.

avi

2017-12-11 22:50

reporter   ~0001856

Hi

I applied the patch and tested it. This issue seems to be resolved. Thanks you!

Could you please let us know when the next minor release will be out. This problem is critical and is causing the whole system to freeze.

In addition, when I applied the patch I applied it on sources from pgpool-II-3.6.7.tar.gz and "patch -p1 < patch.file" command was successful. However the command returned that the line numbers were adjusted. Should the patch be applied on pgpool-II-3.6.7.tar.gz or on more current sources from your svn?

Thanks

t-ishii

2017-12-11 23:42

developer   ~0001857

For the release dates, we have not decided yet. I will discuss with other developers.

The patch was created for master branch head. As long as the patch applies to 3.6 tree, it should be ok.

BTW, we are using git, rather than svn:-)

t-ishii

2017-12-12 16:38

developer   ~0001859

I found another bug with Pgpool-II 3.6.7. It can be reproduced by following pgproto data. This is basically just another SELECT added to your example. The fix is already in the git repository.
You can download a snapshot tar ball from the repository. (this is close to the next 3.6.8) Since I have discovered two critical bugs, I think we need to make minor releases as soon as possible.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=snapshot;h=911d6e78429598b399577b9473d5ab3148ecd795;sf=tgz

'Q' "SET statement_timeout = '4s'"

# Receive response from backend
'Y'

'P' "S1" "SELECT 1/0" 0
'B' "" "S1" 0 0 0
'E' "" 0
'C' 'S' "S1"

'P' "S2" "SELECT * from pg_tables" 0
'B' "" "S2" 0 0 0
'E' "" 0
'C' 'S' "S2"

# Issue Sync message
'S'

# Receive response from backend
'Y'

'P' "" "SELECT * from pg_tables" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
# Send terminate message
'X'

t-ishii

2017-12-12 16:49

developer   ~0001860

It seems the bug starts from 3.6.6.

avi

2017-12-12 18:13

reporter   ~0001861

Thanks. The bug I reported two days ago started at 3.6.7. I tested it on 3.6.6 and it worked well.

I'm happy that a new 3.6.8 release will be out ASAP since we cannot use 3.6.6 because of a fail-over problem and not 3.6.7 because of the freeze problem.

t-ishii

2017-12-13 15:18

developer   ~0001862

May I close this issue?

avi

2017-12-14 18:25

reporter   ~0001868

Hi
I would like to test it on my setup before we close this issue.

I took the 3.6.7 rpms and replace the sources tar.gz file with the one you sent me. I ran the rpmbuild command and got the following. Can you help please...
  
bash-4.1$ rpmbuild -ba pgpool.spec --define="pgpool_version 3.6.7" --define="pg_version 96" --define="pghome /usr/pgsql-9.6" --define="dist rhel6"
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.F6r2o2
+ umask 022
+ cd /var/lib/pgsql/rpmbuild/BUILD
+ LANG=C
+ export LANG
+ unset DISPLAY
+ cd /var/lib/pgsql/rpmbuild/BUILD
+ rm -rf pgpool-II-3.6.7
+ /usr/bin/gzip -dc /var/lib/pgsql/rpmbuild/SOURCES/pgpool-II-3.6.7.tar.gz
+ /bin/tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd pgpool-II-3.6.7
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ echo 'Patch 0000001 (pgpool-II-head.patch):'
Patch 0000001 (pgpool-II-head.patch):
+ /bin/cat /var/lib/pgsql/rpmbuild/SOURCES/pgpool-II-head.patch
+ /usr/bin/patch -p1 --fuzz=0
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff -crN pgpool-II-3.6.7/doc/src/sgml/html/LEGALNOTICE.html head/pgpool-II-3.6.7/doc/src/sgml/html/LEGALNOTICE.html
|*** pgpool-II-3.6.7/doc/src/sgml/html/LEGALNOTICE.html Wed Nov 1 01:02:24 2017
|--- head/pgpool-II-3.6.7/doc/src/sgml/html/LEGALNOTICE.html Wed Nov 1 03:02:23 2017
--------------------------

t-ishii

2017-12-22 16:34

developer   ~0001878

You are asking the same question on the mailing list. So I assume you do not need help on Mantis regarding RPM.

t-ishii

2018-01-12 14:37

developer   ~0001891

May I close this issue?

avi

2018-02-16 00:20

reporter   ~0001929

Yes, I tested it with 3.6.8 and 3.6.9 and it was fixed.

Thank you!

t-ishii

2018-02-16 06:03

developer   ~0001930

Thank you for the feedbback!. I mark this issue as "resolved".

Issue History

Date Modified Username Field Change
2017-12-10 22:13 avi New Issue
2017-12-11 07:36 t-ishii Note Added: 0001854
2017-12-11 07:37 t-ishii Assigned To => t-ishii
2017-12-11 07:37 t-ishii Status new => assigned
2017-12-11 07:37 t-ishii Steps to Reproduce Updated View Revisions
2017-12-11 13:38 t-ishii File Added: bug370.patch
2017-12-11 13:38 t-ishii Note Added: 0001855
2017-12-11 13:38 t-ishii Status assigned => feedback
2017-12-11 22:50 avi Note Added: 0001856
2017-12-11 22:50 avi Status feedback => assigned
2017-12-11 23:42 t-ishii Note Added: 0001857
2017-12-11 23:42 t-ishii Target Version => 3.6.8
2017-12-12 16:38 t-ishii Note Added: 0001859
2017-12-12 16:49 t-ishii Note Added: 0001860
2017-12-12 18:13 avi Note Added: 0001861
2017-12-13 15:18 t-ishii Note Added: 0001862
2017-12-13 15:18 t-ishii Status assigned => feedback
2017-12-14 18:25 avi Note Added: 0001868
2017-12-14 18:25 avi Status feedback => assigned
2017-12-22 16:34 t-ishii Note Added: 0001878
2017-12-22 17:40 t-ishii Status assigned => feedback
2018-01-12 14:37 t-ishii Note Added: 0001891
2018-02-16 00:20 avi Note Added: 0001929
2018-02-16 00:20 avi Status feedback => assigned
2018-02-16 06:03 t-ishii Note Added: 0001930
2018-02-16 06:04 t-ishii Status assigned => resolved
2018-02-16 06:04 t-ishii Fixed in Version => 3.6.8