[pgpool-hackers: 4309] Re: Occasional 005.jdbc test failure

Thu Apr 13 16:41:38 JST 2023

>>> We occasionaly see 005.jdbc test failure. Typical error is something
>>> like this:
>>> 
>>> 2023-02-22 08:51:47.704: PostgreSQL JDBC Driver pid 12420: LOG:  DB node id: 0 backend pid: 12488 statement: Parse: COMMIT
>>> 2023-02-22 08:51:47.705: PostgreSQL JDBC Driver pid 12420: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 0 backend pid: 12488 statement: "COMMIT" message: "prepared statement "S_1" already exists"
>>> 2023-02-22 08:51:47.705: PostgreSQL JDBC Driver pid 12420: LOG:  Parse: Error or notice message from backend: : DB node id: 0 backend pid: 12488 statement: "COMMIT" message: "prepared statement "S_1" already exists"
>>> 
>>> I found possible cause of the error two seconds ago:
>>> 
>>> 2023-02-22 08:51:45.241: PostgreSQL JDBC Driver pid 12420: LOG:  Sync message from frontend.
>>> 2023-02-22 08:51:45.241: PostgreSQL JDBC Driver pid 12420: LOG:  Terminate message from frontend.
>>> 2023-02-22 08:51:45.241: PostgreSQL JDBC Driver pid 12420: LOG:  DB node id: 0 backend pid: 12488 statement: DISCARD ALL
>>> 2023-02-22 08:51:45.242: PostgreSQL JDBC Driver pid 12420: LOG:  pool_send_and_wait: Error or notice message from backend: : DB node id: 0 backend pid: 12488 statement: "DISCARD ALL" message: "DISCARD ALL cannot be executed within a pipeline"
>>> 
>>> "DISCARD ALL" was generated by pgpool (reset_query_list) to discard
>>> some objects including prepared statements created in the
>>> session. Since DISCARD ALL failed, the prepared statement S_1 was not
>>> removed. Thus the next session failed because S_1 already existed.
>>> 
>>> Question is why "DISCARD ALL cannot be executed within a pipeline"
>>> error?
>>> 
>>> So I consult PostgreSQL document:
>>> https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-FLOW-PIPELINING
>>> 
>>> "However, there are a few DDL commands (such as CREATE DATABASE) that
>>> cannot be executed inside a transaction block. If one of these is
>>> executed in a pipeline, it will fail unless it is the first command in
>>> the pipeline."
>>> 
>>> "DISCARD ALL" is on the same boat as CREATE DATABASE (i.e. cannot be
>>> executed inside a transaction block). It seems we have to wait for
>>> pipeline being closed before issuing DISCARD ALL. How? Our PostgreSQL
>>> doc says:
>>> 
>>> "When using this method, completion of the pipeline must be determined
>>> by counting ReadyForQuery messages and waiting for that to reach the
>>> number of Syncs sent."
>>> 
>>> This means that we should not issue DISCARD ALL before receiving at
>>> least one ReadyForQuery from backend.
>>> 
>>> Waiting for ReadyForQuery is not a big deal. But pgpool cannot
>>> unconditionaly wait for ReadyForQuery. What if ReadyForQuery was
>>> already sent? I am going to study this issue.
>> 
>> After studying the issue I came to a conclusion that there's no
>> reliable way to wait or not wait for ReadyForQuery in a reliable
>> way. So attached is a patch trying to attack the issue in completely
>> different way: instead of reading ReadyForQuery, discard the
>> connection pool if pipleline or any other error occurred.  For this
>> purpose a global variable "reset_query_error" is introduced. In the
>> reset query loop, SimpleQuery() is called, then it calls
>> pool_send_and_wait(), and it calls
>> per_node_error_log(). per_node_error_log() checks whether backend
>> returns error response. If so, reset_query_error" is set to true in
>> per_node_error_log(). If it is set to true, backend_cleanup() discards
>> the connection pool. This is a little bit ugly but I cannot think of
>> better implementation at this point.
>> 
>> Comments/suggestions are welcome.
> 
> I have pushed slightly modified version of the patch. For now I pushed
> only to master branch. If it works, I will back patch to all supported
> branches.

Since the commit (pushed to master branch at 2023/4/6: 12:43 JST):
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=91f04da7e8c4eed3d86b4449a6dc9e67c9ad598a

I haven't seen any 005.jdbc failure for 5 days on master branch
according to the build-farm reports:

https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002678.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002679.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002680.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002681.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002682.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002683.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002684.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002685.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002686.html
https://www.pgpool.net/pipermail/pgpool-buildfarm/2023-April/002687.html

So I decided to push to all supported branches.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp