[pgpool-hackers: 4294] Deadlock Pgpool-II

Tatsuo Ishii ishii at sraoss.co.jp
Wed Mar 22 17:59:50 JST 2023


It has been reported that a deadlock can occur in the Pgpool-II shared
relation cache introduced in Pgpool-II 4.1. The preconditions to
trigger the deadlock are relatively complicated:

- enable_shared_relcache = on
- There is a user defined function which acquires a table lock
- The function is executed in extended query protocol
- The function is executed in multiple sessions concurrently

There may be other cases which trigger the deadlock, but currently
pg_statsinfo is the only known module which satisfies the conditions.

Here is the scenario of the deadlock.

1. A client in session A sends parse/bind/execute messages for the
function to pgpool. In pg_statsinfo, the actual function is
create_snapshot_partition().

2. pgpool in session A forwards the messages to postgres.

3. postgres in session A executes the execute message and a table lock
is acquired.

4. A client in session B sends parse/bind/execute messages for the
function to pgpool.

5. pgpool in session B forwards the messages to postgres.

6. postgres in session B executes the bind message and waits for a
table lock since the table is already locked.

7. pgpool in session B forwards the execute message. It acquires the
semaphore to search the shared relation cache to check the volatility
property, and sends a flush message by calling do_query() to read
responses from postgres. Since postgres in #6 is waiting for the lock,
pgpool is blocked waiting for a reply from postgres.

8. pgpool in session A forwards the execute message. Then it tries to
acquire the semaphore to search the shared relation cache, but the
semaphore was already acquired by session B in #7, so session A is
blocked.

9. Session A and session B are waiting for each other (deadlock!)
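
The lock ordering in the scenario above can be replayed outside of
pgpool. The following is only a rough model, not pgpool code: two
Python threads stand in for the two sessions, one threading.Lock stands
in for the PostgreSQL table lock and another for the relcache
semaphore. The function name, the events and the timeout are all made
up for illustration. Timeouts are used so that the script reports the
deadlock instead of hanging.

```python
import threading


def simulate_deadlock(timeout=0.5):
    """Replay steps 3, 7 and 8 with two threads; report which sessions got stuck."""
    table_lock = threading.Lock()    # stands in for the PostgreSQL table lock
    relcache_sem = threading.Lock()  # stands in for pgpool's relcache semaphore
    a_holds_table = threading.Event()
    b_holds_sem = threading.Event()
    verdict = threading.Barrier(2)   # keep locks held until both sides have a result
    stuck = {}

    def session_a():
        table_lock.acquire()                  # step 3: the function takes the table lock
        a_holds_table.set()
        b_holds_sem.wait()
        # step 8: pgpool needs the semaphore before it can go on to sync
        got = relcache_sem.acquire(timeout=timeout)
        stuck["A"] = not got
        verdict.wait()
        if got:
            relcache_sem.release()
        table_lock.release()

    def session_b():
        a_holds_table.wait()
        relcache_sem.acquire()                # step 7: semaphore taken for the search
        b_holds_sem.set()
        # step 7 (cont.): do_query() waits on a backend that waits for the table lock
        got = table_lock.acquire(timeout=timeout)
        stuck["B"] = not got
        verdict.wait()
        if got:
            table_lock.release()
        relcache_sem.release()

    threads = [threading.Thread(target=session_a),
               threading.Thread(target=session_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stuck


if __name__ == "__main__":
    print(simulate_deadlock())
```

Each session holds one of the two locks while waiting for the other
one, so both acquire calls time out. In the real system there is no
timeout and both sessions wait forever.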

Attached is a patch that tries to solve the issue.

It modifies pool_search_relcache() to temporarily release the semaphore
before calling do_query(), giving session A in #8 a chance to proceed
so that it receives the sync message and forwards it to postgres.
postgres will then finish executing the function, which ends the
transaction and releases the table lock. As a result, session B in #6
will go forward.
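
To illustrate the idea (this is not the attached patch, whose contents
are not reproduced here), the sketch below replays the same schedule
with threads, but session B releases the stand-in semaphore before its
blocking call and takes it again afterwards. All names are
hypothetical; in particular, re-acquiring the semaphore after the query
is my reading of "temporarily release".

```python
import threading


def simulate_with_patch(timeout=2.0):
    """Same schedule as the deadlock scenario, but B drops the semaphore early."""
    table_lock = threading.Lock()    # stands in for the PostgreSQL table lock
    relcache_sem = threading.Lock()  # stands in for pgpool's relcache semaphore
    a_holds_table = threading.Event()
    b_searching = threading.Event()
    finished = []

    def session_a():
        table_lock.acquire()                  # step 3: the function takes the table lock
        a_holds_table.set()
        b_searching.wait()
        try:
            # step 8: the semaphore is free again, so this returns promptly
            if relcache_sem.acquire(timeout=timeout):
                relcache_sem.release()
                finished.append("A")
        finally:
            table_lock.release()              # sync forwarded, transaction ends

    def session_b():
        a_holds_table.wait()
        relcache_sem.acquire()                # step 7: start the relcache search
        b_searching.set()
        relcache_sem.release()                # the change: release before do_query()
        # do_query(): the backend is waiting for the table lock (step 6)
        if table_lock.acquire(timeout=timeout):
            table_lock.release()
            with relcache_sem:                # take it back to finish the cache work
                finished.append("B")

    threads = [threading.Thread(target=session_a),
               threading.Thread(target=session_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)


if __name__ == "__main__":
    print(simulate_with_patch())
```

Because session B no longer holds the semaphore while it waits on the
backend, session A gets through step 8, its transaction ends, and
session B's wait then completes as well.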

Note that Pgpool-II 4.4 does not use a semaphore but uses file locking
to implement shared locking, which does not block in #8, and thus will
not go into deadlock. In fact, we do not see the deadlock with
Pgpool-II 4.4.
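
For readers unfamiliar with shared file locks, here is a small sketch
of why two searchers do not block each other. It assumes flock()-style
advisory locks on a Unix system; which locking call Pgpool-II 4.4
actually uses is not stated above, so take this only as a
demonstration of shared versus exclusive semantics. The function name
and the temporary file are made up for the example.

```python
import fcntl
import os
import tempfile


def shared_locks_coexist():
    """Return (two shared locks held at once, exclusive lock refused meanwhile)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # Three independent opens, as three separate sessions would have.
    first = open(path, "r+")
    second = open(path, "r+")
    third = open(path, "r+")
    try:
        fcntl.flock(first, fcntl.LOCK_SH)
        # A second shared lock is granted even though the first is still held.
        fcntl.flock(second, fcntl.LOCK_SH | fcntl.LOCK_NB)
        both_readers = True
        try:
            # An exclusive lock conflicts with the shared locks already held.
            fcntl.flock(third, fcntl.LOCK_EX | fcntl.LOCK_NB)
            writer_refused = False
        except BlockingIOError:
            writer_refused = True
        return both_readers, writer_refused
    finally:
        for f in (first, second, third):
            f.close()                         # closing also drops any flock() lock
        os.unlink(path)


if __name__ == "__main__":
    print(shared_locks_coexist())
```

With a plain semaphore, the second searcher would have waited for the
first one. With shared locks, only a writer needing exclusive access
has to wait.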

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: relcache.patch
Type: text/x-patch
Size: 1490 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20230322/3cc75d69/attachment.bin>

