[pgpool-general: 8526] Re: "delay_threshold_by_time" not detecting replication lag

zam bak zam6ak at gmail.com
Wed Dec 28 00:39:38 JST 2022


On Mon, Dec 26, 2022 at 9:23 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
> >  Hi
> >
> > I just upgraded to 4.4.1 and am noticing that "delay_threshold_by_time"
> > does not seem to be logging or even showing replication lag.
> >
> > I use pgbench with scale_factor=1000 to cause replication delay during the
> > initialization phase.
> > During this phase (which lasts about a minute) I am able to
> > SELECT * FROM pg_stat_replication
> > and see replication lag being 15-20 seconds for duration of initialization
> >
> > However, pgpool log does not show and thresholds reached in the log
> > (despite having log_standby_delay = 'always' ) and "pcp_node_info" and
> > "show pool_nodes" show no lag either
> >
> > if I switch back to "delay_threshold" param only, everything is working as
> > expected and I can see log entries such as:
> >
> > sr_check_worker CONTEXT:  while checking replication time lag
> > sr_check_worker LOG:  Replication of node: 1 is behind 820903040 bytes from
> > the primary server (node: 0)
> >
> > But none of this works when I put "delay_threshold_by_time=1"......
> >
> > 1. Is this a valid bug or could it be my settings?
> > 2. If in fact this is a bug, is it just affecting logging/displaying or the
> > lag and the switching of SELECT queries to primary would still occur if I
> > use this setting and the lag threshold is reached?
>
> delay_threshold_by_time just uses the query result of "int4" column of the following query:
>
> SELECT application_name, state, sync_state,(EXTRACT(EPOCH FROM replay_lag)*1000000)::integer FROM pg_stat_replication;
>  application_name |   state   | sync_state |  int4
> ------------------+-----------+------------+---------
>  server1          | streaming | async      | 1578050
>
> (the result above was taken in my local environment).
>
> If you manually issue the query above against the PostgreSQL primary
> node while executing pgbench -i, what do you see as a result?

I wrote a "capture" script that runs several commands/queries to
demonstrate this issue
Script uses the query you requested (except I added the entire
replay_lag columns also)
The script followed by the output is below and it shows:
* when replay lag was 56+ seconds (according to pg_stat_statements),
the output from SHOW POOL_NODES and from pcp_node_info showed no delay
* there was no entry in the log showing replication log threshold being exceeded

SCRIPT:

[postgres at localhost]$ cat check_lag.sh
#!/usr/bin/env bash

echo -e "\n-"
echo "$(date --iso-8601=seconds) | checking lag on primary directly
via pg_stat_replication..."
psql -w -U postgres -h 10.11.32.15 -p 5432 -c "SELECT
application_name, state, sync_state,(EXTRACT(EPOCH FROM
replay_lag)*1000000)::integer, replay_lag FROM pg_stat_replication;"

echo -e "\n"
echo "$(date --iso-8601=seconds) | checking lag via SHOW POOL_NODES"
psql -U postgres -h localhost -p 9999 -c "SHOW POOL_NODES"

echo -e "\n"
echo "$(date --iso-8601=seconds) | checking lag via pcp_node_info
(against replica)"
pcp_node_info 1 -w -v -h localhost -p 9898 -U postgres

echo -e  "\n"
echo "$(date --iso-8601=seconds) | current contents of
/var/log/pgpool-II/pgpool-01-T
ue.log"
tail -n 20 /var/log/pgpool-II/pgpool-01-Tue.log

echo -e "\n\n\n\n\n"


OUTPUT:

[postgres at localhost]$ ./check_lag.sh


2022-12-27T15:18:55+00:00 | checking lag on primary directly via
pg_stat_replication...
 application_name |   state   | sync_state |   int4   |   replay_lag
------------------+-----------+------------+----------+-----------------
 walreceiver      | streaming | async      | 56663201 | 00:00:56.663201
(1 row)



2022-12-27T15:18:55+00:00 | checking lag via SHOW POOL_NODES
 node_id |  hostname  | port | status | pg_status | lb_weight |  role
 | pg_role | select_cnt | load_balance_node | replication_delay |
replication_state | replication_sync_state | last_status_change
---------+------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.11.32.15 | 5432 | up     | up        | 0.500000  |
primary | primary | 0          | true              | 0
|                   |                        | 2022-12-27 14:44:16
 1       | 10.11.96.15 | 5432 | up     | up        | 0.500000  |
standby | standby | 0          | false             | 0.000000 second
|                   |                        | 2022-12-27 14:44:16
(2 rows)



2022-12-27T15:18:55+00:00 | checking lag via pcp_node_info (against replica)
Hostname               : 10.11.96.15
Port                   : 5432
Status                 : 2
Weight                 : 0.500000
Status Name            : up
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2022-12-27 14:44:16



2022-12-27T15:18:55+00:00 | current contents of
/var/log/pgpool-II/pgpool-01-Tue.log
2022-12-27 15:18:45.517 [117552]: [111-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118550 exit with
SUCCESS.
2022-12-27 15:18:45.517 [117552]: [112-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118550 exits with
status 0
2022-12-27 15:18:48.578 [117347]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG:  new connection received
2022-12-27 15:18:48.578 [117347]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL:  connecting host=::1 port=33866
2022-12-27 15:18:48.631 [117347]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG:  frontend disconnection: session time:
0:00:00.053 user=postgres database=postgres host=::1 port=33866
2022-12-27 15:18:48.636 [117552]: [113-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  forked new pcp worker, pid=118562
socket=7
2022-12-27 15:18:48.651 [117552]: [114-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118562 exit with
SUCCESS.
2022-12-27 15:18:48.651 [117552]: [115-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118562 exits with
status 0
2022-12-27 15:18:52.711 [117348]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG:  new connection received
2022-12-27 15:18:52.711 [117348]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL:  connecting host=::1 port=33876
2022-12-27 15:18:52.764 [117348]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG:  frontend disconnection: session time:
0:00:00.053 user=postgres database=postgres host=::1 port=33876
2022-12-27 15:18:52.770 [117552]: [116-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  forked new pcp worker, pid=118572
socket=7
2022-12-27 15:18:52.784 [117552]: [117-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118572 exit with
SUCCESS.
2022-12-27 15:18:52.784 [117552]: [118-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118572 exits with
status 0
2022-12-27 15:18:55.927 [117350]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG:  new connection received
2022-12-27 15:18:55.927 [117350]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL:  connecting host=::1 port=33884
2022-12-27 15:18:55.979 [117350]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG:  frontend disconnection: session time:
0:00:00.052 user=postgres database=postgres host=::1 port=33884
2022-12-27 15:18:55.984 [117552]: [119-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  forked new pcp worker, pid=118582
socket=7
2022-12-27 15:18:55.998 [117552]: [120-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118582 exit with
SUCCESS.
2022-12-27 15:18:55.999 [117552]: [121-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG:  PCP process with pid: 118582 exits with
status 0
2022-12-27 15:19:01.452 [117351]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG:  new connection received
2022-12-27 15:19:01.452 [117351]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL:  connecting host=::1 port=36606
2022-12-27 15:19:01.505 [117351]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG:  frontend disconnection: session time:
0:00:00.053 user=postgres database=postgres host=::1 port=36606


>
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp


More information about the pgpool-general mailing list