[pgpool-hackers: 4334] Possible dead lock with shared relation cache.

Tatsuo Ishii ishii at sraoss.co.jp
Fri May 12 20:47:47 JST 2023


We have found possible dead lock with shared relation cache and fixed
it in 4.3, 4.2 and 4.1 stable releases.

[From the commit message]

When a user defined function obtains table locking, call to the
function using extended query protocol could lead to dead lock. Here's
a scenario.

(1) In session A client sends parse, bind, execute request for the
    function to pgpol.

(2) Pgpool in session A forwards the request to PostgreSQL.

(3) PostgreSQL in session A performs execute, resulting in a table lock.

(4) In session B client sends parse, bind, execute request for the
    function to pgpool.

(5) Pgpool in session B forwards the request to PostgreSQL.

(6) PostgreSQL in session B performs bind but the table was already
    locked by PostgreSQL in session A, and it waits for release of the
    lock.

(7) After pgpool in session B forwards the execute to PostgreSQL, it
    acquires semaphore to search shared relation cache to check the
    volatility of the function. Then it calls do_query and sends flush
    message to PostgreSQL to obtain the response from PostgreSQL up to
    this point. But since PostgreSQL is waiting for table lock in #6,
    pgpool has to wait for messages beyond bind complete.

(8) After pgpool in session A forwards the execute to PostgreSQL, it
    tries to acquire semaphore to search shared relation cache to check
    the volatility of the function but the semaphore was already
    acquired by pgpool in session B, it waits for the release of the
    semaphore.

(9) Session A and session B wait for each other, resulting in a dead
    lock.

To fix this, modify pool_search_relcache() so that it releases
semaphore before calling do_query() (but acquires semaphore after
do_query()). By this, session A in #8 above can get semaphore, and go
forward. It receives sync message from client and forwards it to
PostgreSQL. Upon receiving sync, the user defined function will finish
the execution and releases the table lock. This makes PostgreSQL in
session B move forward because now it can obtain the table lock.

Back patch to 4.3, 4.2 and 4.1. On these versions semaphore based
shared release cache is used.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list