View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000282 | Pgpool-II | Bug | public | 2017-01-30 22:55 | 2017-03-26 16:21 |
| Reporter | jakubs | Assigned To | t-ishii | ||
| Priority | high | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Product Version | 3.6.1 | ||||
| Summary | 0000282: Pgpool hangs on query and doesn't send it to backend | ||||
| Description | Hi. I have a certain piece of code in my application. It sends two queries to pgpool working in load balancer setting. I had two backends (master and slave in streaming replication mode), but the problem doesn't seem to concern that, because I am able to reproduce it only on working master. The problem wasn't reproducing itself always in the previous version (3.6.0) and I postponed it's resolution. I wanted to resolve it today, so I installed newest version of pgpool. And now I am able reproduce this problem every time. My setup is: OS: CentOS 7 3.10.0-327.36.3.el7.x86_64 DB: PostgreSQL 9.6 pgPool-II: version 3.6.1 The problem seems to occur every time I try when it is on the same db connection (though I'm not sure about it, but I made setup which opens one connection and it reproduces everytime. However when I made performance tests with hundreds of clients and dozens of connections it didn't occur so often). I will attach some logs: 1. Log from pgpool with debug mode on and sequence of queries which lead to hangup 2. Network traffick from the same time from between web server (tomcat) and pgpool 3. Network traffick from the same time from between pgpool and PostgreSQL Network traffic files are readable by Wireshark. You can see there that tomcat queries pgpool for some objects, pgpool receives it, logs making a query of form: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE c.oid = to_regclass('"pt_trace_types"') AND c.relnamespace = n.oid AND n.nspname = 'pg_catalog' but it doesn't actually send it to backend (no such query in second network trafic log). And this is it, tomcat waits for query to return but pgpool doesn't make it. I sam a similar bug no. 167, maybe it is linked somehow. Please let me know if you need any more logs or information. | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
|
|
|
|
|
|
But according the pgpool log you attached, Pgpool-II actually sent the query to PostgreSQL: LOG: DB node id: 0 backend pid: 3692 statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE c.oid = to_regclass('"pt_trace_types"') AND c.relnamespace = n.oid AND n.nspname = 'pg_catalog' (This is the very last line of the log file). |
|
|
I know that log says it sent the query to backend but there is no such query on the wire as you can see in hangPgPool2Postgres.pcap file (or in the database itself - last query on the session is the one done before) so I suspected that pgpool logged the query but didn't actually send it. It is definitely problem with pgpool because if I connect my application directly to master database there is no single hung query (and I test it with multiple users so there is loads of queries being sent), everything works as it should. |
|
|
I haven't heard this kind of case (Pgpool-II sends a query, but it doesn't reach to PostgreSQL) since Pgpool was born. So it's hard to believe to be honest. |
|
|
Ok. I will try to find the cause myself and let know here what was the problem. |
|
|
One way to do this, is attaching strace and see if Pgpool-II process actually sends packet to OS. You can see this by watching write() system call. To make the operation easier, you could set num_init_children to 1, and start Pgpool-II and the issue ps command. 21320 ? S 0:00 pgpool: wait for connection request This is the process which handles incoming connections from clients and you want to attach strace to it. Hope this helps. |
|
|
Ok, so I managed finally to find time to debug pgpool code and find out what exactly is happening. As you could see in my attachment hangPgPool2Postgres.pcap, right before end of it is the sequence of messages between pgpool and Postgres: <Z >C >H <3 >H (Postgres end is on the right side). As you can see, pgpool issues H (Flush) after CloseComplete (3) received from Postgres. At this stage (right before H is sent to Postgres) the flag in session_context called is_pending_response is set after calling Close in ProcessFrontendResponse in pool_proto_modules.c:2422, and not unset after calling CloseComplete in ProcessBackendResponse in pool_proto_modules.c:2579. When pgpool gets next query from frontend (Parse) it tries to determine if table is catalog table. It is also first query to this table so there is no cached result and pgpool must send query SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE c.oid = to_regclass('"pt_trace_types"') AND c.relnamespace = n.oid AND n.nspname = 'pg_catalog' to Postgres to determine it. Eventually it ends up in function pool_search_relcache in pool_relcache.c in the line 165 where function do_query is called. In do_query (pool_process_query.c) in line 2012 there is an "if" checking if flag is_pending_response is set, and it is because it was never unset from the previous operation (Close and CloseCompletion). And here there is H sent to Postgres by pgpool and there is waiting for response from Postgres which never happens because there is nothing pending - the last message was CloseCompletion so there can be nothing waiting to be flushed). Finally there is a question to you how should it be handled. I simply added call of function pool_unset_pending_response after line 2582 in pool_proto_modules.c and it does the trick for me, bu I'm new to Postgres protocol and pgpool so I think you should evaluate if it is really a bug or if there is something wrong with some other component of the test setup. |
|
|
I have created a patch which could solve the problem. Can you please try it out? http://www.pgpool.net/mantisbt/file_download.php?file_id=295&type=bug |
|
|
I tries to apply patch to source version 3.6.1 but I get an error aplying junk 0000014 at 1593 in file src/protocol/pool_proto_modules.c. The rej file is as follows: --- src/protocol/pool_proto_modules.c +++ src/protocol/pool_proto_modules.c @@ -1593,12 +1638,12 @@ POOL_STATUS Close(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *backend, pool_unset_pending_response(); /* - * Remove send message + * Remove sent message */ ereport(DEBUG1, (errmsg("Close: removing sent message %c %s", *contents, contents+1))); - - pool_remove_sent_message(*contents == 'S'?'P':'B', contents+1); + pool_set_sent_message_state(msg); +// pool_remove_sent_message(*contents == 'S'?'P':'B', contents+1); } return POOL_CONTINUE; Is this the right version? |
|
|
The patch can be applied to 3.6 stable head. Can you please upgrade to 3.6 stable head and apply the patch? Or apply attached patch (which is diff between 3.6.1 and 3.6 stable head as of today) before applying the patch. |
|
|
|
|
|
It seems that this patch works. I have another problem but I will make separate bug when I will know what is going on. |
|
|
Close this issue. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2017-01-30 22:55 | jakubs | New Issue | |
| 2017-01-30 22:55 | jakubs | File Added: pgpoolLog.txt | |
| 2017-01-30 22:56 | jakubs | File Added: hangPgPool2Postgres.pcap | |
| 2017-01-30 22:56 | jakubs | File Added: hangTomcat2PgPool.pcap | |
| 2017-02-03 14:56 | t-ishii | Note Added: 0001322 | |
| 2017-02-03 17:28 | jakubs | Note Added: 0001323 | |
| 2017-02-03 17:33 | t-ishii | Note Added: 0001324 | |
| 2017-02-03 17:43 | jakubs | Note Added: 0001325 | |
| 2017-02-03 17:50 | t-ishii | Note Added: 0001326 | |
| 2017-02-06 08:37 | t-ishii | Assigned To | => t-ishii |
| 2017-02-06 08:37 | t-ishii | Status | new => assigned |
| 2017-02-06 08:37 | t-ishii | Status | assigned => feedback |
| 2017-02-16 21:48 | jakubs | Note Added: 0001348 | |
| 2017-02-16 21:48 | jakubs | Status | feedback => assigned |
| 2017-02-23 17:03 | t-ishii | Note Added: 0001359 | |
| 2017-02-23 17:03 | t-ishii | Status | assigned => feedback |
| 2017-02-23 21:38 | jakubs | Note Added: 0001362 | |
| 2017-02-23 21:38 | jakubs | Status | feedback => assigned |
| 2017-02-24 07:49 | t-ishii | Note Added: 0001364 | |
| 2017-02-24 07:49 | t-ishii | File Added: V3_6_1-V3_6_STABLE.diff | |
| 2017-02-24 07:50 | t-ishii | Status | assigned => feedback |
| 2017-02-28 17:49 | jakubs | Note Added: 0001369 | |
| 2017-02-28 17:49 | jakubs | Status | feedback => assigned |
| 2017-03-26 16:21 | t-ishii | Note Added: 0001398 | |
| 2017-03-26 16:21 | t-ishii | Status | assigned => resolved |