View Issue Details

IDProjectCategoryView StatusLast Update
0000145Pgpool-IIBugpublic2015-10-16 18:29
ReporterueharaAssigned Tot-ishii 
PriorityhighSeveritymajorReproducibilityalways
Status resolvedResolutionopen 
Product Version 
Target VersionFixed in Version 
Summary0000145: Didn't sent reset_query to Standby node.
DescriptionConfiguration information
----------------
 - pgpool-II 3.4.2 or 3.4.3 or git(master) Problems in these version
 - postgreSQL 9.4.1
 - JDBC Driver 9.4-1201 jdbc41

pgpool-II :192.168.1.3
PostgreSQL(Pri):192.168.1.1
PostgreSQL(Std):192.168.1.2


$ psql -h 192.168.1.3 -p 9999 postgres -c "show pool_nodes";
 node_id | hostname | port | status | lb_weight | role
---------+-------------+------+--------+-----------+---------
 0 | 192.168.1.1 | 5432 | 2 | 0.000000 | primary
 1 | 192.168.1.2 | 5432 | 2 | 1.000000 | standby
(2 rows)

-----------------

I found problem case.
This problem occured at using application name load balancing and using JDBC Driver connection.

pgpool-II make two connections to each DB servers(Pri and Std).
But this case, pgpool-II can't process initialize stabdby session.

child process has continued remaining, so finally client can't connect pgpool-II.


pgpool.log at the time of problems.
(used sample-java1)
------------
2015-09-09 15:34:44: pid 13770: DEBUG: session context: setting query in progress. DONE
2015-09-09 15:34:44: pid 13770: DEBUG: decide where to send the queries
2015-09-09 15:34:44: pid 13770: DETAIL: destination = 3 for query= " DISCARD ALL"
2015-09-09 15:34:44: pid 13770: LOG: DB node id: 0 backend pid: 13651 statement: DISCARD ALL
2015-09-09 15:34:44: pid 13770: DEBUG: waiting for query response
2015-09-09 15:34:44: pid 13770: DETAIL: waiting for backend:0 to complete the query
2015-09-09 15:34:44: pid 13770: DEBUG: detect error: kind: S
2015-09-09 15:34:44: pid 13770: DEBUG: detect error: kind: S
2015-09-09 15:34:44: pid 13770: DEBUG: detect error: kind: S
2015-09-09 15:34:44: pid 13770: DEBUG: detect error: kind: S
2015-09-09 15:34:44: pid 13770: DEBUG: reading backend data packet kind
2015-09-09 15:34:44: pid 13770: DETAIL: backend:0 kind:'S'
-------------


I'm hoping to move as follows.
(used sample-java2)
------
2015-09-09 15:35:46: pid 13802: DEBUG: session context: setting query in progress. DONE
2015-09-09 15:35:46: pid 13802: DEBUG: decide where to send the queries
2015-09-09 15:35:46: pid 13802: DETAIL: destination = 3 for query= " DISCARD ALL"
2015-09-09 15:35:46: pid 13802: LOG: DB node id: 0 backend pid: 13657 statement: DISCARD ALL
2015-09-09 15:35:46: pid 13802: DEBUG: waiting for query response
2015-09-09 15:35:46: pid 13802: DETAIL: waiting for backend:0 to complete the query
2015-09-09 15:35:46: pid 13802: DEBUG: detect error: kind: S
2015-09-09 15:35:46: pid 13802: DEBUG: detect error: kind: S
2015-09-09 15:35:46: pid 13802: DEBUG: detect error: kind: S
2015-09-09 15:35:46: pid 13802: DEBUG: detect error: kind: S
2015-09-09 15:35:46: pid 13802: LOG: DB node id: 1 backend pid: 17338 statement: DISCARD ALL
2015-09-09 15:35:46: pid 13802: DEBUG: waiting for query response
2015-09-09 15:35:46: pid 13802: DETAIL: waiting for backend:1 to complete the query
2015-09-09 15:35:46: pid 13802: DEBUG: reading backend data packet kind
2015-09-09 15:35:46: pid 13802: DETAIL: backend:0 kind:'S'
------


result of command "ps -ef| pool" at the time of problems.
------
$ ps -ef | grep pool
uehara 9405 4811 0 12:41 pts/4 00:00:00 vim /home/uehara/local/pl342/etc/pgpool.conf
uehara 13037 1251 0 21:06 pts/3 00:00:00 tail -F /home/uehara/local/pl343/log/pgpool.log
uehara 13095 2675 0 21:09 pts/2 00:00:00 pgpool -f /home/uehara/local/pl343/etc/pgpool.conf -n
uehara 13097 13095 0 21:09 pts/2 00:00:00 pgpool: wait for connection request
uehara 13098 13095 0 21:09 pts/2 00:00:00 pgpool: wait for connection request
uehara 13099 13095 0 21:09 pts/2 00:00:00 pgpool: PCP: wait for connection request
uehara 13100 13095 0 21:09 pts/2 00:00:00 pgpool: worker process
uehara 13116 13095 0 21:10 pts/2 00:00:00 pgpool: kazu testdb 192.168.1.3(35301) DISCARD (*)
uehara 13131 2675 0 21:10 pts/2 00:00:00 grep pool

(*) This child process is in use. This process continue remaining.
-------


I analyzed using gdb.
I thought that session_context( query_context->where_to_send[1] ) set illegal value.
Maybe, this value was set at the time of connect DB.


src/protocol/pool_proto_modules.c
626 /* Send query */
627 for (i=0;i<NUM_BACKENDS;i++)
628 {
629 if (!VALID_BACKEND(i))
630 continue;
631 else if (send_type < 0 && i == node_id)
632 continue;
633 else if (send_type > 0 && i != node_id)
634 continue;
...
670
671 per_node_statement_log(backend, i, string);
672

node_id=0 went to line632.
But node_id=1 went to line634. In brief, node_id=1 could't execute line671.
So, pgpool-II couldn't sent reset query to node_id=1.



I analyzed VALID_BACKEND(i).

src/protocol/pool_proto_modules.c
249 bool pool_is_node_to_be_sent_in_current_query(int node_id)
250 {
...
260 if (pool_is_query_in_progress() && sc->query_context)
261 {
262 return pool_is_node_to_be_sent(sc->query_context, node_id);
263 }
264 return true;
265 }


(at node_id=1)
I was confirmed that pool_is_node_to_be_sent returns false.

below, I show gdb output.


pool_is_node_to_be_sent_in_current_query (node_id=1) at context/pool_query_context.c:253
253 if (RAW_MODE)
(gdb) bt
#0 pool_is_node_to_be_sent_in_current_query (node_id=1) at context/pool_query_context.c:253
0000001 0x0000000000458b7a in pool_send_and_wait (query_context=0x108abf0, send_type=-1, node_id=0) at context/pool_query_context.c:629
0000002 0x0000000000445298 in SimpleQuery (frontend=0x0, backend=0x1075c00, len=13, contents=0x1046858 " DISCARD ALL") at protocol/pool_proto_modules.c:570
0000003 0x0000000000438926 in reset_backend (backend=0x1075c00, qcnt=1) at protocol/pool_process_query.c:1403
0000004 0x0000000000434ea3 in pool_process_query (frontend=0x1076be0, backend=0x1075c00, reset_request=1) at protocol/pool_process_query.c:140
0000005 0x000000000042f97b in backend_cleanup (frontend=0x790748, backend=0x1075c00) at protocol/child.c:440
0000006 0x000000000042f36e in do_child (fds=0x10712a0) at protocol/child.c:236
0000007 0x00000000004063eb in fork_a_child (fds=0x10712a0, id=1) at main/pgpool_main.c:678
0000008 0x0000000000405406 in PgpoolMain (discard_status=0 '\000', clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:279
0000009 0x000000000040413a in main (argc=4, argv=0x7fff0f4aad28) at main/main.c:319
(gdb) list
248 */
249 bool pool_is_node_to_be_sent_in_current_query(int node_id)
250 {
251 POOL_SESSION_CONTEXT *sc;
252
253 if (RAW_MODE)
254 return node_id == REAL_MASTER_NODE_ID;
255
256 sc = pool_get_session_context(true);
257 if (!sc)
(gdb) print *sc
$1 = {process_context = 0x796b40, frontend = 0x1076be0, backend = 0x1075c00, in_progress = 1 '\001', doing_extended_query_message = 1 '\001',
  command_success = 1 '\001', writing_transaction = 0 '\000', failed_transaction = 0 '\000', skip_reading_from_backends = 0 '\000', ignore_till_sync = 0 '\000',
  transaction_isolation = POOL_UNKNOWN, query_context = 0x108abf0, memory_context = 0x1082f30, uncompleted_message = 0x0, message_list = {capacity = 8, size = 2,
    sent_messages = 0x10837b0}, load_balance_node_id = 0, mismatch_ntuples = 0 '\000', ntuples = {0, -1, 0 <repeats 126 times>}, reset_context = 1 '\001',
  query_cache_array = 0x0, num_selects = 0}
(gdb)
256 sc = pool_get_session_context(true);
(gdb)
257 if (!sc)
(gdb) print *sc
$2 = {process_context = 0x796b40, frontend = 0x1076be0, backend = 0x1075c00, in_progress = 1 '\001', doing_extended_query_message = 1 '\001',
  command_success = 1 '\001', writing_transaction = 0 '\000', failed_transaction = 0 '\000', skip_reading_from_backends = 0 '\000', ignore_till_sync = 0 '\000',
  transaction_isolation = POOL_UNKNOWN, query_context = 0x108abf0, memory_context = 0x1082f30, uncompleted_message = 0x0, message_list = {capacity = 8, size = 2,
    sent_messages = 0x10837b0}, load_balance_node_id = 0, mismatch_ntuples = 0 '\000', ntuples = {0, -1, 0 <repeats 126 times>}, reset_context = 1 '\001',
  query
(gdb) n
260 if (pool_is_query_in_progress() && sc->query_context)
(gdb) n
262 return pool_is_node_to_be_sent(sc->query_context, node_id);
(gdb) print *sc->query_context
$3 = {original_query = 0x108b858 " DISCARD ALL", rewritten_query = 0x0, original_length = 13, rewritten_length = -1, parse_tree = 0x108b810,
  rewritten_parse_tree = 0x0, where_to_send = "\001", '\000' <repeats 126 times>, virtual_master_node_id = 0, query_state = {POOL_UNPARSED <repeats 128 times>},
  is_cache_safe = 0 '\000', temp_cache = 0x0, is_multi_statement = 0 '\000', dboid = 0, query_w_hex = 0x0, is_parse_error = 0 '\000', num_original_params = -1,
  memory_context = 0x1083480}
(gdb) s
pool_is_node_to_be_sent (query_context=0x108abf0, node_id=1) at context/pool_query_context.c:235
235 CHECK_QUERY_CONTEXT_IS_VALID;
(gdb) list
230 /*
231 * Return if the DB node is needed to send query
232 */
233 bool pool_is_node_to_be_sent(POOL_QUERY_CONTEXT *query_context, int node_id)
234 {
235 CHECK_QUERY_CONTEXT_IS_VALID;
236
237 if (node_id < 0 || node_id >= MAX_NUM_BACKENDS)
238 ereport(ERROR,
239 (errmsg("checking if db node is needed to be sent, invalid node id:%d",node_id),
(gdb) print node_id
$4 = 1
(gdb) n
242 return query_context->where_to_send[node_id];
(gdb) print query_context->where_to_send[1]
$5 = 0 '\000'
(gdb) print query_context->where_to_send[0]
$6 = 1 '\001'
(gdb)


I think the value of query_context->where_to_send[1] is that it illegally.
The value is I think that should be "1".
Steps To ReproduceStep 1 ---------------------------

Start pgpool-II and PostgreSQL.
Please check the following state.

$ psql -h 192.168.1.3 -p 9999 postgres -c "show pool_nodes";
 node_id | hostname | port | status | lb_weight | role
---------+-------------+------+--------+-----------+---------
 0 | 192.168.1.1 | 5432 | 2 | 0.000000 | primary
 1 | 192.168.1.2 | 5432 | 2 | 1.000000 | standby
(2 rows)

Step2 ---------------------------
Excecute java.

$ java begin_select_sleep


Step3 ---------------------------
Session time-out occurs by client_idle_limit.
Become following state at the time of problems.
 * chiled process continue in use.
   ex. pgpool: kazu testdb 192.168.1.3(35025) DISCARD

 * Client console executing java don't return operation.
   $ java begin_select_sleep
   DBTEST01 START



I attach the configureation file and sample java file for reproduction.
For more information please check the attached files.

pgpool.conf(summary)
-------
    app_name_redirect_preference_list = 'pri_app:primary,stb_app:standby'
    client_idle_limit = 10
-------


sample-java1(summary)
-------
    String url = "jdbc:postgresql://192.168.1.3:9999/testdb?user=kazu&ApplicationName=pri_app&assumeMinServerVersion=9.0";
    conn = DriverManager.getConnection(url);
    stmt = conn.createStatement();
    rs = stmt.executeQuery("SELECT 1");
    Thread.sleep(12000);
--------


sample-java2(summary)
--------
    String url = "jdbc:postgresql://192.168.1.3:9999/testdb?user=kazu&ApplicationName=stb_app&assumeMinServerVersion=9.0";
    conn = DriverManager.getConnection(url);
    stmt = conn.createStatement();
    rs = stmt.executeQuery("SELECT 1");
    Thread.sleep(12000);
--------
TagsNo tags attached.

Activities

uehara

2015-09-09 16:54

reporter  

sample_file.zip (8,357 bytes)

t-ishii

2015-09-17 15:52

developer   ~0000568

Thank you for the report. I'm looking into the problem now...

t-ishii

2015-09-17 17:19

developer   ~0000569

Last edited: 2015-09-17 17:20

View 2 revisions

"DISCARD is sent to only master node" is normal in your case. Pgpool sends any SQL command including DISCARD to only master node because the master node is only used due to app name redirecting setting in your pgpool.conf.
See pool_unset_node_to_be_sent() which is called when "destination = 3" for more details.

So the remaining issue is pgpool child does not complete "DISCARD ALL" command.
I will look into this (I was unable to reproduce the problem by using psql so far).

uehara

2015-10-01 18:12

reporter   ~0000575

Can you please tell me?

I think that reset_query should be sent both node.
This behavior has nothing to do with the contents of app_name_redirect_preference_list.
Is it correct?

"destination = 3" is POOL_BOTH. But reset_query is sent master node only.
So, pgpool child does not complete.

t-ishii

2015-10-02 20:14

developer   ~0000576

Last edited: 2015-10-02 20:20

View 3 revisions

If pgpool-II has not send any query to a standby because pgpool-II chooses the primary node as the "load balancing node" (load_balance_node = 0 in the session context), it's not necessary to send DISCARD to the standby node. Because pgpool-II sends nothing to the standby node.

uehara

2015-10-05 10:57

reporter   ~0000577

thank you for reply.

I could understand reason that reset-query is sent to primary node.

t-ishii

2015-10-16 18:28

developer   ~0000592

Problem found and fix committed.

http://www.pgpool.net/pipermail/pgpool-committers/2015-October/002744.html

Thank you for the report!

Issue History

Date Modified Username Field Change
2015-09-09 16:54 uehara New Issue
2015-09-09 16:54 uehara File Added: sample_file.zip
2015-09-17 11:54 t-ishii Assigned To => t-ishii
2015-09-17 11:54 t-ishii Status new => assigned
2015-09-17 15:52 t-ishii Note Added: 0000568
2015-09-17 17:19 t-ishii Note Added: 0000569
2015-09-17 17:20 t-ishii Note Edited: 0000569 View Revisions
2015-10-01 18:12 uehara Note Added: 0000575
2015-10-02 20:14 t-ishii Note Added: 0000576
2015-10-02 20:20 t-ishii Note Edited: 0000576 View Revisions
2015-10-02 20:20 t-ishii Note Edited: 0000576 View Revisions
2015-10-05 10:57 uehara Note Added: 0000577
2015-10-16 18:28 t-ishii Note Added: 0000592
2015-10-16 18:29 t-ishii Status assigned => resolved