[pgpool-general: 9429] Re: Clarification on query results cache visibility

Sat Apr 26 14:36:28 JST 2025

Thank you Tatsuo

On 26/4/25 07:16, Tatsuo Ishii wrote:
> Hi,
>
> Thank you for the detailed report.
>
>> Dear pgpool people
>>
>> Yesterday during some tests which involve :  java app (wildfly) -->
>> pgpool-II 4.6 --> pgsql 17.4 we came to a situation that a new
>> connection could see stale or invalid cache data, meaning that the
>> actual DB contents (5432) were not depicted on certain pgpool
>> connections (9999).
>>
>> The problem exists seemingly only for inherited tables that share the
>> same name, and belong to different schemas.
>>
>> The tables are :
>>
>> amantzio@[local]/dynacom=# \d+ public.useroptions
>>                                                 Table
>> "public.useroptions"
>>   Column  |          Type          | Collation | Nullable | Default |
>> Storage  | Compression | Stats target | Description
>> ----------+------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
>>
>> username | character varying(200) |           | not null |         |
>> extended |             |              |
>> app      | text                   |           | not null |         |
>> extended |             |              |
>> detail   | text                   |           | not null |         |
>> extended |             |              |
>> urlext   | text                   |           | not null |         |
>> extended |             |              |
>> Indexes:
>>     "useroptions_pkey" PRIMARY KEY, btree (username, app, detail)
>> Child tables: bdelosnav.useroptions,
>>               bdynacom.useroptions,
>>               bdynagas.useroptions,
>>               prominencemaritime.useroptions
>> Access method: heap
>>
>> amantzio@[local]/dynacom=#
>>
>> In pgpool we have : memory_cache_enabled = on . All other cache
>> related params are at default.
>>
>> In the specific tests we use access to bdynacom.useroptions, but
>> accessing without fully qualified via search_path. But access of those
>> rows from within public.useroptions also exhibit the problem(s).
>>
>> We have on the actual DB directly :
>>
>> achill at smadevnu:~/workspace/gatewaynet % psql
>> psql (17.4)
>> Type "help" for help.
>>
>> amantzio@[local]/dynacom=# SELECT tableoid::regclass, app, urlext FROM
>> useroptions WHERE username = 'Ioannis Mazarakis' AND app IN
>> ('performreport') AND detail = '';
>>        tableoid       |      app      |   urlext
>> ----------------------+---------------+------------
>> bdynacom.useroptions | performreport | ?zz=foobar
>> (1 row)
>>
>> However querying against pgpool I see a different version of the data:
>>
>> achill at smadevnu:~/workspace/gatewaynet % psql -p 9999
>> psql (17.4)
>> Type "help" for help.
>>
>> amantzio@[local]/dynacom=# SELECT app, urlext FROM useroptions WHERE
>> username = 'Ioannis Mazarakis' AND app IN ('performreport') AND detail
>> = '';
>>       app      |        urlext
>> ---------------+----------------------
>> performreport | ?group=yes&groupno=7
>> (1 row)
>>
>> amantzio@[local]/dynacom=#
>>
>> repeating with including tableoid::regclass or just /*FORCE QUERY
>> CACHE*/ to beat the cache :
>>
>> /*FORCE QUERY CACHE*/ SELECT app, urlext FROM useroptions WHERE
>> username = 'Ioannis Mazarakis' AND app IN ('performreport') AND detail
>> = '';
>>       app      |   urlext
>> ---------------+------------
>> performreport | ?zz=foobar
>> (1 row)
>>
>> Using a another new user does not demo the problem :
>>
>> achill at smadevnu:~/workspace/gatewaynet % psql -p 9999 -U stsoukalas
>> psql (17.4)
>> Type "help" for help.
>>
>> stsoukalas@[local]/dynacom=# SELECT app, urlext FROM useroptions WHERE
>> username = 'Ioannis Mazarakis' AND app IN ('performreport') AND detail
>> = '';
>>       app      |   urlext
>> ---------------+------------
>> performreport | ?zz=foobar
>> (1 row)
>>
>> now logging with another user :
>>
>> achill at smadevnu:~ % psql -p 9999 -U imazarakis dynacom
>> psql (17.4)
>> Type "help" for help.
>>
>> imazarakis@[local]/dynacom=> SELECT app, urlext FROM useroptions WHERE
>> username = 'Ioannis Mazarakis' AND app IN ('performreport') AND detail
>> = '';
>>       app      |   urlext
>> ---------------+------------
>> performreport | ?zz=foobar
>> (1 row)
>>
>> imazarakis@[local]/dynacom=> /*FORCE QUERY CACHE*/  SELECT app, urlext
>> FROM useroptions WHERE username = 'Ioannis Mazarakis' AND app IN
>> ('performreport') AND detail = '';
>>       app      |    urlext
>> ---------------+---------------
>> performreport | ?zz=achillbar
>> (1 row)
>>
>> imazarakis@[local]/dynacom=> update useroptions set urlext=urlext
>> WHERE username = 'Ioannis Mazarakis' AND app IN ('performreport') AND
>> detail = '';
>> UPDATE 1
>> imazarakis@[local]/dynacom=> SELECT app, urlext FROM useroptions WHERE
>> username = 'Ioannis Mazarakis' AND app IN ('performreport') AND detail
>> = '';
>>       app      |    urlext
>> ---------------+---------------
>> performreport | ?zz=achillbar
>> (1 row)
>>
>> As we said, non inherited tables do not seem to suffer from the same
>> issue.
> I think the issue is not related to the table inheritance, but schema
> search path. Since you use the same table name in the queries, you
> should have changed the schema search path so that you can access the
> appropriate table. I can reproduce similar issue just changing search
> path.
>
> create schema s1;
> CREATE SCHEMA
> create schema s2;
> CREATE SCHEMA
> create table s1.t1(i int);
> CREATE TABLE
> create table s2.t1(i int);
> CREATE TABLE
> insert into s1.t1 values(1);
> INSERT 0 1
> insert into s2.t1 values(2);
> INSERT 0 1
> set search_path to s1,public;
> SET
> select * from t1;
>   i
> ---
>   1
> (1 row)
>
> set search_path to s2,public;
> SET
> select * from t1;
>   i
> ---
>   1
> (1 row)
>
> The last select should have returned 2 since it accessed s2.t1, but it
> returned 1 because of the cached row produced by the previous "select
> * from t1" which accessed s1.t1.
in the scenarios that we did and found the problem, the search path was 
constant. The table (effectively bdynacom.useroptions) was accessed 
without qualification (just useroptions). But the insert and the 
selects. This is 100% certain.
>
>> Now granted , either :
>>
>> a) disabling the memory cache  or
>>
>> b) placing a specified comment HINT in front of the query  or
>>
>> c) specifying the affected tables in cache_unsafe_memqcache_table_list
>> or
>>
>> d)or using BEGIN ; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ  ,
>> or above
>>
>> as you explain here :
>> https://pgsqlpgpool.blogspot.com/2021/04/visibility-with-query-cache.html
> Sorry but I don't understand why "BEGIN ; SET TRANSACTION ISOLATION
> LEVEL REPEATABLE READ  ," is useful for your problem.
>
>> (btw, why non inherited tables work so nice with the cache ??? without
>> the issue above?)
> I don't think inheritance is relevant. See above.
>
>> If the blog is still valid, why not state this explicitly inside the
>> docs?
> I think the blog is still valid but it's not connected to your
> problem.
>
>> And if the blog is no longer relevant , please include such a
>> statement.
>>
>> So the above could help alleviate the problem ... however:
>>
>> a) disabling the cache ... is not ideal , I mean why not have such a
>> powerful feature ?
>>
>> b) placing comments in 1000s of queries is not an option
>>
>> c) we would not like to restrict tables based on a bug
>>
>> d) same as b) rewriting all our transactions is not an option , also
>> we would not like to change the default default_transaction_isolation.
>>
>> So can you please explain what is the state of affairs regarding
>> pgpool query cache and inherited tables? or pgpool caching and
>> visibility ?
> As stated above, the query cache does not take account of schema
> search path. I think that's source of the problem. Probably in
> addition to user name, database and bind parameters (if any), the
> schema search path should be added to the data when pgpool calculates
> a MD5 hash (which is the cache key) on it. However this requires
> pgpool to obtain the search path using "show search_path", which adds
> an additional overhead while fetching query cache. Any idea?

regarding the MD5 calculation, instead of the schema why not the 
effective oid of the actual table?

Lets stay in the inheritance example :

We query :

1) public.useroptions and the row belongs to the parent table 
(public.useroptions) : we use the  public.useroptions::regclass

2) public.useroptions and the row belongs to the child table 
(bdynacom.useroptions) : we still use the table queried oid : 
public.useroptions::regclass

3) bdynacom.useroptions (via search path or fully qualified) : we use 
bdynacom.useroptions::regclass

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp