[pgpool-general: 8325] Re: Timestamp cast not cached

Mon Jul 11 17:39:11 JST 2022

Thank you for the thought. However I have no clear idea what you
suggest here to enhance pgpool in terms of query cache. Can you
clarify more?

> Hi after a consultation with some pg expert that what they answer to me.
> Maybe it could help you with your fix.
> 
> Hi  Avi Raboah,
> 
> To expand on Andrew’s answer, we need to explore the difference between
> IMMUTABLE, STABLE and VOLATILE declared function types in PostgreSQL
> 
> IMMUTABLE functions must adhere to three requirements:
> # Not change the database during execution.
> # Not execute any database lookups. Can only use values passed as arguments.
> # Always return the same value when supplied the same arguments.
> 
> STABLE functions must adhere to two requirements:
> # Not change the database during execution.
> # Always return the same values when supplied the same arguments during a
> single table scan.  However, the return values can change during execution
> of later SQL statements within a transaction.
> 
> VOLATILE functions have no ridge requirements, and it is the default type
> for functions.
> 
> What does the above mean? PostgreSQL uses the three function types to
> decide on optimization paths to improve performance.
> 
> Immutable functions offer the most optimization options as PostgreSQL can
> inline and cache the results of a function call during execution. Example:
> 
> <pre><code class="sql">
> CREATE TEMPORARY TABLE test (id integer, tt text);
> 
> INSERT INTO test (SELECT (random() * 10)::integer, random()::text FROM
> GENERATE_SERIES(0,999));
> 
> EXPLAIN SELECT true  FROM test  WHERE id = 2+2 ;
> --returns
> --Seq Scan on test  (cost=0.00..19.50 rows=88 width=1)
> --  Filter: (id = 4)
> </code></pre>
> 
> We can see the FILTER condition was executed in-lining the filtering
> condition on the value 4, not on 2 + 2. PostgreSQL rewrote the query to use
> a static value of 4.
> 
> <pre><code class="sql">
> SELECT true  FROM test  WHERE id = 4
> </code></pre>
> 
> PostgreSQL can do this as the + operator is marked as immutable. This saved
> executing the + operator an additional 999 times when comparing id to 2+2.
> 
> Stable functions have similar properties to immutable functions, which
> allows PostgreSQL to cache and inline results while executing a table scan
> to avoid executing costly functions multiple times when scanning thousands
> to millions of rows. However, a stable function must be executed again
> later in a transaction. Example:
> 
> <pre><code class="sql">
> BEGIN;
> 
> SELECT statement_timestamp(), now();
> SELECT pg_sleep(2);
> SELECT statement_timestamp(), now();
> 
> ROLLBACK;
> </code></pre>
> 
> 
> With a volatile function, PostgreSQL cannot make optimization decisions and
> must execute the function every time it is called. This makes volatile
> functions very expensive if used in a table scan, for example.
> 
> 
> <pre><code class="sql">
> SELECT timeofday() as volatile_time, NOW() stable_time_during_transaction
> FROM generate_series(0,999)
> </code></pre>
> 
> This creates a thousand row result set where  TIMEOFDAY () is changing
> while NOW () is stable and does not change during a transaction.
> 
> PostgreSQL functions marked as stable and immutable will always return that
> same value when the same arguments are supplied. They’re not going to be
> volatile, returning an unexpected value.
> 
> The three function types Immutable, Stable, Volatile are used by the
> planner to decide which execution plan to create to improve performance.
> 
> When would an IMMUTABLE or STABLE function become unstable? An immutable
> function will not become unstable as long as the same arguments are
> supplied. A stable function can become unstable if the function does
> database lookups that changes its behavior. This is the reason stable
> functions have to be executed in later SQL commands in a transaction, while
> IMMUTABLE functions do not.
> 
> How do we know which function PostgreSQL chooses to execute/use? PostgreSQL
> will search through the catalog for matching function name, number of
> arguments, argument types, if arguments have default values and will order
> the functions by the session’s SEARCH_PATH. PostgreSQL will use the first
> function it finds that matches all the above requirements.
> 
> Leaving out the precision for casting operation, PostgreSQL will match to a
> different function as the arguments can change as TIMESTAMP can have
> variable precision, making the output unstable. This limits optimization
> paths and will cause PostgreSQL to execute the function more times during a
> table scan vs the immutable version of the function.
> 
> This leads us to Andrew’s statement about sanitizing the text input for
> security reasons to protect against SQL injection and prevent PostgreSQL
> from executing IMMUTABLE and STABLE functions multiple times as the input
> will be variable.
> 
> In summary, the immutable, stable, and volatile type functions are used by
> the planner to decide optimization paths; it does not guarantee/guard
> unexpected behavior. However, if the arguments passed to the function are
> consistent, the results will be immutable for immutable functions. Stable
> functions depend on the specific function. Volatile functions should be
> assumed to be unstable.
> 
> Thanks,
> 
> Avi.
> 
> 
> On Thu, 7 Jul 2022 at 15:06 Avi Raboah <avi.raboah at gmail.com> wrote:
> 
>> Great, thanks for the update and your efforts.
>>
>> really appreciate it
>>
>> Avi.
>>
>>
>> On Thu, 7 Jul 2022 at 4:53 Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>>> > I have posted a question to pgsql-hackers regarding to_timestamp.
>>> >
>>> https://www.postgresql.org/message-id/flat/20220705.172957.2068967435108479827.t-ishii%40sranhm.sra.co.jp
>>> >
>>> > It turned out that the provolatile value for the two forms of
>>> > to_timestamp are correct.  Surely to_timestamp(1 argument) returns
>>> > different result depending on the time zone but the actual internal
>>> > value of timestamptz is identical. The output difference is merely how
>>> > the function prints timestamptz according to the time zone.
>>> >
>>> > Here are examples provided by Tom Lane.
>>> >
>>> > regression=# show timezone;
>>> >      TimeZone
>>> > ------------------
>>> >  America/New_York
>>> > (1 row)
>>> >
>>> > regression=# select to_timestamp(0);
>>> >       to_timestamp
>>> > ------------------------
>>> >  1969-12-31 19:00:00-05
>>> > (1 row)
>>> >
>>> > regression=# set timezone = 'utc';
>>> > SET
>>> > regression=# select to_timestamp(0);
>>> >       to_timestamp
>>> > ------------------------
>>> >  1970-01-01 00:00:00+00
>>> > (1 row)
>>> >
>>> > "1969-12-31 19:00:00-05" and "1970-01-01 00:00:00+00" are actually
>>> > same value as timestamptz data type, which means labeling this form of
>>> > to_timestamp as immutable is correct.
>>> >
>>> > However this does not solve the problem of query cache in Pgpool-II.
>>> >
>>> > The disccusion above suggests that even if the function is labeled as
>>> > immutable, there are cases when pgpool should not create a cache for
>>> > the SELECT which uses the function (you already showed a good
>>> > example). I thinkk pgpool should not create a cache if a function is
>>> > labeled other than immutable (we already do it) and if its return type
>>> > is timestamptz or timetz (we have not done it yet).
>>> >
>>> > But there's more.
>>> >
>>> > PostgreSQL has similar config settings that change the output style of
>>> > functions/expressions: namely datestyle and client_encoding etc.
>>> >
>>> > test=# show datestyle;
>>> >  DateStyle
>>> > -----------
>>> >  ISO, MDY
>>> > (1 row)
>>> >
>>> > test=# select '2022-07-06'::date;
>>> >     date
>>> > ------------
>>> >  2022-07-06
>>> > (1 row)
>>> >
>>> > test=# set datestyle to 'Postgres, mdy';
>>> > SET
>>> > test=# select '2022-07-06'::date;
>>> >     date
>>> > ------------
>>> >  07-06-2022
>>> > (1 row)
>>> >
>>> > If we execute these using pgpool with query cache enabled, we get:
>>> >
>>> > test=# show datestyle;
>>> >  DateStyle
>>> > -----------
>>> >  ISO, MDY
>>> > (1 row)
>>> >
>>> > test=# select '2022-07-06'::date;
>>> >     date
>>> > ------------
>>> >  2022-07-06
>>> > (1 row)
>>> >
>>> > test=# set datestyle to 'Postgres, mdy';
>>> > SET
>>> > test=# select '2022-07-06'::date;
>>> >     date
>>> > ------------
>>> >  2022-07-06
>>> > (1 row)
>>> >
>>> > Notice the last result does reflect the datestyle setting change
>>> > because of query cache.
>>> >
>>> > I think there are more config parameters that induce the wrong
>>> > behavior of pgpool:
>>> >
>>> > IntervalStyle, extra_float_digits, lc_messages, lc_monetary,
>>> lc_numeric, lc_time
>>> >
>>> > For now I don't know how to deal with the problem. Maybe we should
>>> > just add this as a restriction to the doc?
>>>
>>> I have pushed commits to partially solve the problem.
>>>
>>> https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=47fd46cbcf31b46d93a8ade62fda43b82c39b4e6
>>>
>>> With the commit, SELECTs having functions with return type is
>>> timestamptz or timetz are not cached anymore. This will avoid the
>>> problem due to the time zone setting difference.  Note that the fix
>>> has been only applied to Pgpool-II 4.2 or later. Pgpool-II 4.1 an
>>> before are lacking necessary infrastructures to support the commit.
>>>
>>> Remaining problems are:
>>> > IntervalStyle, extra_float_digits, lc_messages, lc_monetary,
>>> lc_numeric, lc_time
>>>
>>> I think we have to add these to the restriction section of Pgpool-II
>>> doc. I am going work on this later.
>>>
>>> Best reagards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese:http://www.sraoss.co.jp
>>>
>>