[pgpool-committers: 4153] pgpool: Fix Pgpool-II hung up bug or other errors in error case in exte
ishii at postgresql.org
Thu Jul 27 13:56:50 JST 2017
Fix Pgpool-II hang and other errors when an error occurs during an extended query in replication mode.
In extended query protocol in streaming replication mode, the responses
expected from the backend are managed by the pending messages. However,
if the backend returns an error response, the sequence of messages
returned from the backend is not what we expect. The mismatch was
partially handled in the code, but it turned out that this was not
sufficient.
The hang basically occurs when an error response is received from the
backend before the frontend sends an 'S' (sync) message. If the backend
detects an error while processing an extended query, it does not return
any further response until it receives a sync message. At this point
the frontend is of course expected to send a sync message to Pgpool-II,
but that message may not have reached Pgpool-II's socket yet. So
Pgpool-II may not notice that the sync message is coming, and does not
forward it to the backend. As a result, nothing progresses and
Pgpool-II is stuck. To fix the problem, the following modifications are
made in this commit:
- When an error response is received from the backend, forward it to
the frontend, then remove all pending messages and backend message
buffer data except the POOL_SYNC pending message and the ready for
query message (previously all messages were removed, including ready
for query, which is clearly wrong). If the sync message has not been
received yet, call ProcessFrontendResponse() to read data from the
frontend. This ensures that Pgpool-II no longer expects the backend
messages of the normal (non-error) case, and that it does receive the
sync message from the frontend.
- When an 'S' (sync) message is received from the frontend, forward it
to the backends and wait until a "ready for query" message is received
from them. This ensures that Pgpool-II receives the ready for query
message and reaches the proper sync point.
- Even after the ready for query message is received, different
messages may still arrive from each backend. If the numbers of
messages are the same, a "kind mismatch" error occurs. If the numbers
differ, Pgpool-II may get stuck, because read_kind_from_backend() waits
until a message arrives from the backend. To fix this, if either the
load balance node or the primary node returns 'Z' (ready for query),
try to skip messages on the other node. This is done in
read_kind_from_backend(); see the comments around line 3391 of
pool_process_query.c for more details.
Other fixes in this commit:
- In ErrorResponse3(), do not send the intentional error query to the
backend in streaming replication mode. This is not necessary in
streaming replication mode.
- Fix pool_virtual_master_db_node_id() to return virtual_master_node_id
only when a query is in progress and a query context exists.
Previously, the in-progress state was not checked, so the function
could return a bogus node id.
src/context/pool_query_context.c | 2 +-
src/protocol/pool_process_query.c | 19 +++-
src/protocol/pool_proto_modules.c | 178 +++++++++++++++++++++++++++++++-------
3 files changed, 166 insertions(+), 33 deletions(-)