View Issue Details

IDProjectCategoryView StatusLast Update
0000318Pgpool-IIBugpublic2017-08-29 09:28
Reporteravi Assigned Topengbo  
PriorityurgentSeverityblockReproducibilityalways
Status closedResolutionopen 
Product Version3.6.4 
Summary0000318: Exception in plpgsql function is not passed to application
DescriptionHi

If exception occurs inside plpgsql function the application fails to get the exception from pgpool and it keeps waiting forever. Please note that in postgres, the connection is marked as "idle in transaction (aborted)".

This bug is not reproduced with version 3.6.1. Only version 3.6.3 and 3.6.4 has it. In addition if the slave machine is down and we work only with master, the bug is not reproduced. If we bypass pgpool and connect directly to port 5432 it works.

P.S. I saw differences in sniffer captures between good case (error is reported immediately) vs “bad” case (error is not reported) – It seems there is missing one packet in “bad” case (packets 0000045-46 in “pgpool_3_6_1.pcap” vs packet#46 in “pgpool_3_6_4.pcap”).




Versions:
pgpool: 3.6.4
PostgreSQL: “PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit”
JDBC PG Driver: 42.1.1 or 9.4.1212.jre7 (in case of NMS + Hibernate4 + Spring4 transaction manager)
Java8


Steps To ReproducePlease use the attached unit test
TagsNo tags attached.

Activities

avi

2017-07-03 17:35

reporter  

drrtuy

2017-07-04 04:48

reporter   ~0001566

Greetings,
Can you collect gdb backtraces of the pgpool process and postgres backend?

t-ishii

2017-07-04 10:35

developer   ~0001567

I cannot reproduce the problem by using pgproto.

pgproto -f bug318.data -p 11000 -d test
FE=> Parse(stmt="S1", query="BEGIN")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Parse(stmt="S1", query="SELECT count(*) as dummy from bar()")
FE=> Bind(stmt="S1", portal="")
FE=> Describe(stmt="S1")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Parse(stmt="S1", query="COMMIT")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE CommandComplete(BEGIN)
<= BE CloseComplete
<= BE ParseComplete
<= BE BindComplete
<= BE ParameterDescription
<= BE RowDescription
<= BE NoticeResponse(S NOTICE V NOTICE C 00000 M AlexL, will perfrom DibByZero W PL/pgSQL function bar() line 2 at RAISE F pl_exec.c L 3165 R exec_stmt_raise )
<= BE ErrorResponse(S ERROR V ERROR C 22012 M division by zero W SQL statement "select 1/0"
PL/pgSQL function bar() line 3 at EXECUTE F int.c L 719 R int4div )
<= BE ReadyForQuery(E)
FE=> Terminate
t-ishii@localhost: pgpool --version
pgpool-II version 3.6.4 (subaruboshi)

avi

2017-07-04 16:39

reporter   ~0001570

Hi

If I execute the same "SELECT count(*) as dummy from bar()" command from psql -h vip_ip -p 9999 it works. However, when I use the sample application with the attached unit test it get stuck 20 -30 percent of the times (it also fails when I execute a simple "select 1/0"). I assume pgproto is more similar to psql than to the sample unit-test therefore I think it will be more beneficial to simulate if using the sample unit-test. If you want to connect to us it is also possible

t-ishii

2017-07-04 16:50

developer   ~0001571

pgproto (https://github.com/tatsuo-ishii/pgproto) can execute either simple queries (like psql) and extended queries (like Java). The test case is for an extended query thus essentially it looks same from Pgpool-II's point of view.

avi

2017-07-04 20:47

reporter   ~0001573

Hi
Can you please send me the bug318.data file that you used. Is it being written manually or it can be auto generated in some way?

In addition why did you use port 11000 and not 9999

Thanks

avi

2017-07-04 22:24

reporter   ~0001575

Hi

The following is the bug318.data file I used to get it stuck (1 in about 10 attempts got stuck). (If this is not the proper commands sequence please send me the file you would like me to use.)

'P' "S1" "SELECT 1/0" 0
'B' "" "S1" 0 0 0
'E' "" 0
'C' 'S' "S1"
# Issue Sync message
'S'
# Receive response from backend
'Y'
# Send terminate message
'X'



The following is the data I got when it completed

[root@h2-nms trunk]# pgproto -f bug318.data -p 9999 -h172.18.255.40 -u postgres -d test_db
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE ErrorResponse(S ERROR V ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ReadyForQuery(I)
FE=> Terminate
[root@h2-nms trunk]# pgproto -f bug318.data -p 9999 -h172.18.255.40 -u postgres -d test_db
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE ErrorResponse(S ERROR V ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ReadyForQuery(I)
FE=> Terminate
[root@h2-nms trunk]# pgproto -f bug318.data -p 9999 -h172.18.255.40 -u postgres -d test_db
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE ErrorResponse(S ERROR V ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ReadyForQuery(I)
FE=> Terminate
[root@h2-nms trunk]# pgproto -f bug318.data -p 9999 -h172.18.255.40 -u postgres -d test_db
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE ErrorResponse(S ERROR V ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ReadyForQuery(I)
FE=> Terminate



The following is the data I got when it got stuck !!!!

[root@h2-nms trunk]# pgproto -f bug318.data -p 9999 -h172.18.255.40 -u postgres -d test_db
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE (S)
<= BE (S)
<= BE (S)
<= BE (S)
<= BE (S)
<= BE (S)
<= BE (S)
<= BE (S)
<= BE (S)

pengbo

2017-07-05 15:42

developer   ~0001576

I will try to reproduce the bug.

avi

2017-07-06 15:22

reporter   ~0001577

Hi

Just to give a little more info

If I execute pg_terminate_backend on the idle in transaction (aborted) connection I then get the stuck "division by zero" error message + some EOF message

<= BE (S)
<= BE ErrorResponse(S ERROR V ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ErrorResponse(S FATAL C XX000 M connection to postmaster on DB node 0 was lost due to pg_terminate_backend D pg_terminate_backend was called on the backend F pool_process_query.c L 4650 )
read_char: EOF detected[root@h2-nms trunk]#


If I wait for very long time without initiating pg_terminate_backend I eventually get the following (I do not know how long since it was overnight)
<= BE (S)
<= BE (S)
read_char: EOF detected[root@h2-nms trunk]#

pengbo

2017-07-07 16:53

developer   ~0001579

I used pgpool_setup to execute the following command more than 20 times, but I couldn't reproduce the bug.

[pengbo@localhost pgpoolsetup]$ pgproto -f bug318.data -p 11000 -u pengbo -d postgres
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE ErrorResponse(S ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ReadyForQuery(I)
FE=> Terminate
[pengbo@localhost pgpoolsetup]$ pgproto -f bug318.data -p 11000 -u pengbo -d postgres
FE=> Parse(stmt="S1", query="SELECT 1/0")
FE=> Bind(stmt="S1", portal="")
FE=> Execute(portal="")
FE=> Close(stmt="S1")
FE=> Sync
<= BE ParseComplete
<= BE ErrorResponse(S ERROR C 22012 M division by zero F int.c L 719 R int4div )
<= BE ReadyForQuery(I)
FE=> Terminate
pgpool.conf (35,129 bytes)   
pgpool.conf (35,129 bytes)   

avi

2017-07-09 15:59

reporter   ~0001580

The following is a short shell script to run pgproto in a loop. It fails all the time on all our setups. Please note that it fails on setup running pgpool 3.6.4 with replication master+slave and load balancing. If you have setup with only master (slave is down) you will not see the bug. Please run the sh script and see if it manage to finish the loop without error.

#!/bin/bash

File="./run_test.log"

for ((n=0; n<10000; n++))
do
 (
   echo "iteration $n "
   pgproto -f bug318.data -p 9999 -h172.18.255.40 -u postgres -d test_db > $File;
 ) 2>&1 | awk '{ print strftime("%Y-%m-%d %H:%M:%S:"), $0; fflush(); }' | tee -a "$File"

 if grep -q Terminate "$File"; then
   echo "ok"
 else
   echo "Bug"
   exit
 fi
done

---------------------------------------------------------------------------------------------------------------

The following is a bug318.data used by the batch script (which calls pgproto)

'P' "S1" "SELECT 1/0" 0
'B' "" "S1" 0 0 0
'E' "" 0
'C' 'S' "S1"

# Issue Sync message
'S'

# Receive response from backend
'Y'

# Send terminate message
'X'

ansel

2017-07-25 22:57

reporter   ~0001603

We do experience the same issue (ERRORs hanging a frontend connection
instead of being reported) when Pgpool is balancing over two streaming
slaves with their master. I can somewhat reliably reproduce it by
running simple error-throwing statements via a JDBC-using command line
tool called "sqlline" like below.

Running git-bisect yielded commit 7355ebb4d as the first failing one.

,----[ sqlline 3.6.5 or master ]
| $ sqlline -n postgres -p "" -u 'jdbc:postgresql://localhost:19999/postgres?loggerLevel=TRACE&loggerFile=/home/andreas/jdbc.log&prepareThreshold=1'
| Connecting to jdbc:postgresql://localhost:19999/postgres?loggerLevel=TRACE&loggerFile=/home/andreas/jdbc.log&prepareThreshold=1
| Connected to: PostgreSQL (version 9.6.3)
| Driver: PostgreSQL Native Driver (version PostgreSQL 9.4.1212)
| Autocommit status: true
| Transaction isolation: TRANSACTION_REPEATABLE_READ
| sqlline version 1.0.2 by Marc Prud'hommeaux
| 0: jdbc:postgresql://localhost:19999/postgres> create table t(c int primary key);
| No rows affected (0,034 seconds)
| 0: jdbc:postgresql://localhost:19999/postgres> insert into t values(1);
| 1 row affected (0,034 seconds)
| 0: jdbc:postgresql://localhost:19999/postgres> insert into t values(1);
| -- pgpool connection hangs here instead of reporting the unique violation
`----

Errors reported correctly in 3.6.1.

,----[ 3.6.1 ]
| 0: jdbc:postgresql://localhost:19999/postgres> insert into t values(1);
| Error: ERROR: duplicate key value violates unique constraint "t_pkey"
| Detail: Key (c)=(1) already exists. (state=23505,code=0)
| org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "t_pkey"
`----

t-ishii

2017-07-26 07:50

developer   ~0001605

ansel,

Thank you for the analysis. I also noticed the same thing and created a patch.

If you could try with attached patch and see if the problem is solved, that would be very helpful for me. If the problem is not solved, please run pgpool with debug option (-d) and share the pgpool log file.
pgpool-hung.diff (9,560 bytes)   
pgpool-hung.diff (9,560 bytes)   

t-ishii

2017-07-26 09:34

developer   ~0001606

With the patch, I successfully runs sqlline with your example.

t-ishii@localhost: JAVA_CLASSPATH=/usr/local/pgsql/share//postgresql-9.2-1003.jdbc4.jar sqlline -n t-ishii -p "" -u 'jdbc:postgresql://localhost:11000/test?&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&prepareThreshold=1'
[warning] /usr/bin/sqlline: Unable to locate postgresql-jdbc4 in /usr/share/java
[warning] /usr/bin/sqlline: Unable to locate mysql-connector-java in /usr/share/java
[warning] /usr/bin/sqlline: Unable to locate jtds in /usr/share/java
Connecting to jdbc:postgresql://localhost:11000/test?&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&prepareThreshold=1
Connected to: PostgreSQL (version 9.6.3)
Driver: PostgreSQL Native Driver (version PostgreSQL 9.2 JDBC4 (build 1003))
Autocommit status: true
Transaction isolation: TRANSACTION_REPEATABLE_READ
sqlline version 1.0.2 by Marc Prud'hommeaux
0: jdbc:postgresql://localhost:11000/test> drop table t;
drop table t;
No rows affected (0.004 seconds)
0: jdbc:postgresql://localhost:11000/test> create table t(c int primary key);
create table t(c int primary key);
No rows affected (0.006 seconds)
0: jdbc:postgresql://localhost:11000/test> insert into t values(1);
insert into t values(1);
1 row affected (0.004 seconds)
0: jdbc:postgresql://localhost:11000/test> insert into t values(1);
insert into t values(1);
Error: ERROR: duplicate key value violates unique constraint "t_pkey"
  Detail: Key (c)=(1) already exists. (state=23505,code=0)
0: jdbc:postgresql://localhost:11000/test>

ansel

2017-07-26 15:11

reporter   ~0001607

Thanks for the quick response. I can confirm that the patch fixes the issue for me when applied on top of master.

pgpool generates the following LOG message now in this situation:

    2017-07-26 08:05:26: pid 21769: LOG: Process backend response: sync pending message found after receiving error response

I'm attaching the debug output between two of those messages.
sync-pending-message.txt (22,441 bytes)   
sync-pending-message.txt (22,441 bytes)   

t-ishii

2017-07-26 18:13

developer   ~0001608

Ansel,

Thank for you for testing the patch. I will commit the patch to master, 3.6 and 3.5 stable trees.
> pgpool generates the following LOG message now in this situation:
This is normal and I think we should degrade the message to DEBUG1 or so.

> I'm attaching the debug output between two of those messages.
Thank you for the log. I will look into it so that I can confirm my patch works as expected in your environment.

Issue History

Date Modified Username Field Change
2017-07-03 17:35 avi New Issue
2017-07-03 17:35 avi File Added: idle_in_transaction_aborted.zip
2017-07-04 04:48 drrtuy Note Added: 0001566
2017-07-04 10:35 t-ishii Note Added: 0001567
2017-07-04 16:39 avi Note Added: 0001570
2017-07-04 16:50 t-ishii Note Added: 0001571
2017-07-04 20:47 avi Note Added: 0001573
2017-07-04 22:24 avi Note Added: 0001575
2017-07-05 15:39 pengbo Assigned To => pengbo
2017-07-05 15:39 pengbo Status new => assigned
2017-07-05 15:42 pengbo Note Added: 0001576
2017-07-06 15:22 avi Note Added: 0001577
2017-07-07 16:53 pengbo File Added: pgpool.conf
2017-07-07 16:53 pengbo Note Added: 0001579
2017-07-09 15:59 avi Note Added: 0001580
2017-07-25 22:57 ansel Note Added: 0001603
2017-07-26 07:50 t-ishii File Added: pgpool-hung.diff
2017-07-26 07:50 t-ishii Note Added: 0001605
2017-07-26 09:34 t-ishii Note Added: 0001606
2017-07-26 15:11 ansel File Added: sync-pending-message.txt
2017-07-26 15:11 ansel Note Added: 0001607
2017-07-26 18:13 t-ishii Note Added: 0001608
2017-08-29 09:28 pengbo Status assigned => closed