[pgpool-committers: 4153] pgpool: Fix Pgpool-II hung up bug or other errors in error case in exte

Thu Jul 27 13:56:50 JST 2017

Fix Pgpool-II hang and other errors in the error case of extended query in replication mode.

In extended query in streaming replication mode, the responses that
are expected from the backend are tracked as pending messages. However,
if the backend returns an error response, the sequence of messages
returned from the backend is not what we expect. The code already
handled part of this mismatch, but the handling turned out to be
insufficient.

The hang basically occurs when an error response is received from the
backend before the frontend sends the 'S' (sync) message. Once the
backend detects an error while processing an extended query, it does
not return any further response until it receives a sync message. Of
course at this point the frontend is expected to send a sync message
to Pgpool-II, but the message may not have reached Pgpool-II's socket
yet. So it is possible that Pgpool-II does not notice that the sync
message is coming and does not forward it to the backend. As a result,
nothing progresses and Pgpool-II is stuck. To fix the problem, the
following modifications are made in this commit:

- When an error response is received from the backend, after
  forwarding the error response to the frontend, remove all pending
  messages and backend message buffer data except the POOL_SYNC
  pending message and ready for query (previously we removed all
  messages including ready for query, which is apparently wrong). If
  the sync message has not been received yet, call
  ProcessFrontendResponse() to read data from the frontend. This
  ensures that Pgpool-II no longer expects the messages the backend
  would return in the normal case, and that it receives the sync
  message from the frontend.

- When the 'S' (sync) message is received from the frontend, forward
  it to the backends and wait until the "ready for query" message is
  received from the backends. This ensures that Pgpool-II receives the
  ready for query message and reaches the proper sync point.

- It is still possible that, after the ready for query message is
  received, different messages arrive from each backend. If the
  numbers of messages are the same, a "kind mismatch" error will
  occur. If the numbers of messages differ, Pgpool-II may get stuck,
  because read_kind_from_backend() waits until a message arrives from
  the backend. To fix this, if either the load balance node or the
  primary node returns 'Z' (ready for query), try to skip messages on
  the other node. This is done in read_kind_from_backend(). See the
  comments around line 3391 of pool_process_query.c for more details.

Other fixes in this commit:

- Do not send the intended error query to the backend in
  ErrorResponse3() in streaming replication mode. It is not necessary
  in that mode.

- Fix pool_virtual_master_db_node_id() to return the
  virtual_master_node_id only when a query is in progress and a query
  context exists. Previously the in-progress state was not checked, so
  the function could return a bogus node id.
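
The guard described in the last item might look roughly like the
following sketch. The struct layouts, the function name
virtual_master_node_id_sketch() and the fallback to a
real_master_node_id field are assumptions made for illustration; the
commit message only states that both the in-progress state and the
query context are now checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a query context. */
typedef struct {
    int virtual_master_node_id;
} query_context;

/* Hypothetical stand-in for a session context. */
typedef struct {
    bool           in_progress;         /* is a query in progress? */
    query_context *query_ctx;           /* may be NULL */
    int            real_master_node_id; /* assumed fallback value */
} session_context;

/*
 * Return the virtual master node id only when a query is in progress
 * AND a query context exists; otherwise fall back to the (assumed)
 * real master node id instead of a possibly stale value.
 */
int virtual_master_node_id_sketch(const session_context *s)
{
    assert(s != NULL);
    if (s->in_progress && s->query_ctx != NULL)
        return s->query_ctx->virtual_master_node_id;
    return s->real_master_node_id;
}
```

With only the NULL check, a query context left over from a finished
query could still supply its node id; checking the in-progress flag as
well avoids that in this sketch.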

Branch
------
master

Details
-------
https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=8640abfc41ff06b1e6d31315239292f4d3d4191d

Modified Files
--------------
src/context/pool_query_context.c  |   2 +-
src/protocol/pool_process_query.c |  19 +++-
src/protocol/pool_proto_modules.c | 178 +++++++++++++++++++++++++++++++-------
3 files changed, 166 insertions(+), 33 deletions(-)