[pgpool-hackers: 3942] verify_backend_node_status does not recognise that the primary is connected to standbys

anirudh.goel at cern.ch anirudh.goel at cern.ch
Fri Jul 2 03:45:19 JST 2021


Hello

I have a setup with 1 primary and 2 async standby postgresql nodes behind 1 pgpool all running on a separate hosts, without watchdog.

I believe I have done all the config required for pgpool to recognise that the standbys are connected to a single primary (for enabling the detach_false_primary feature) but it still doesn’t recognize it.

Before getting into more details, I'll say that I have tried to pcp_detach_node and pcp_attach_node and also pcp_recovery_node in the hope that I was missing some step that would lead to pgpool not connecting to the standby properly but none of those helped.

THE ERROR:
I see this log in pgpool:

2021-07-01 17:07:44: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 0
2021-07-01 17:07:44: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 1
2021-07-01 17:07:44: pid 559: LOG: verify_backend_node_status: primary 2 owns only 0 standbys out of 2
2021-07-01 17:08:34: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 0
2021-07-01 17:08:34: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 1
2021-07-01 17:08:34: pid 559: LOG: verify_backend_node_status: primary 2 owns only 0 standbys out of 2
2021-07-01 17:09:25: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 0
2021-07-01 17:09:25: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 1
2021-07-01 17:09:25: pid 559: LOG: verify_backend_node_status: primary 2 owns only 0 standbys out of 2

Relevant config from pgpool.conf:
# - Backend Connection Settings -

backend_hostname0 = 'dbod-ag-pg01.ch'
backend_data_directory0 = '/ORA/dbs03/AG_PG01/data'
backend_port0 = 6601
backend_weight0 = 1
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'ag_pg01'

backend_hostname1 = 'dbod-ag-pg02.ch'
backend_data_directory1 = '/ORA/dbs03/AG_PG02/data'
backend_port1 = 6604
backend_weight1 = 1
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'ag_pg02'

backend_hostname2 = 'dbod-ag-pg03.ch'
backend_data_directory2 = '/ORA/dbs03/AG_PG03/data'
backend_port2 = 6600
backend_weight2 = 1
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name2 = 'ag_pg03'

Relevant config from postgresql.conf / myrecovery.conf (application name exists and is same in primary_conninfo as backend_application_name):
primary_conninfo = 'host=dbod-hac-c02.ch port=6600 user=pgrepl application_name=ag_pg01 passfile=''/home/postgres/.pgpass'''

"GRANT pg_monitor TO pgpool;” has been done:
postgres=# \dg
 List of roles
 Role name | Attributes | Member of
-----------------+------------------------------------------------------------+--------------
 admin | Create role, Create DB | {}
 dod_dbmon | | {}
 dod_pmm | Superuser | {}
 pgpool | | {pg_monitor}
 pgpool_recovery | Superuser | {}
 pgrepl | Replication | {}
 postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

Users for sr_check and health_check:
'sr_check_user': 'pgpool',
'sr_check_database': 'postgres',
'health_check_user': 'pgpool',
'health_check_database': 'postgres',

Versions:
PgPool - 4.2.2
PostgreSQL - 12.6

“show pool_nodes and show pool_health_check_stats” shows everything is fine:
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state
| replication_sync_state | last_status_change
---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------
+------------------------+---------------------
 0 | dbod-ag-pg01.ch | 6601 | up | 0.333333 | standby | 3 | true | 0 | streaming
| async | 2021-07-01 17:37:42
 1 | dbod-ag-pg02.ch | 6604 | up | 0.333333 | standby | 0 | false | 0 | streaming
| async | 2021-07-01 17:37:42
 2 | dbod-ag-pg03.ch | 6600 | up | 0.333333 | primary | 2 | false | 0 |
| | 2021-07-01 16:51:00
(3 rows)

postgres=# SHOW POOL_HEALTH_CHECK_STATS;
 node_id | hostname | port | status | role | last_status_change | total_count | success_count | fail_count | skip_count | retry
_count | average_retry_count | max_retry_count | max_duration | min_duration | average_duration | last_health_check | last_successful_health_
check | last_skip_health_check | last_failed_health_check
---------+----------------------+------+--------+---------+---------------------+-------------+---------------+------------+------------+------
-------+---------------------+-----------------+--------------+--------------+------------------+---------------------+------------------------
------+------------------------+--------------------------
 0 | dbod-ag-pg01.ch | 6601 | up | standby | 2021-07-01 17:37:42 | 2029 | 1998 | 0 | 31 | 0
 | 0.000000 | 0 | 1033 | 21 | 26.975475 | 2021-07-01 17:38:47 | 2021-07-01 17:38:47
 | 2021-07-01 16:52:00 |
 1 | dbod-ag-pg02.ch | 6604 | up | standby | 2021-07-01 17:37:42 | 2027 | 1850 | 1 | 176 | 5
 | 0.002701 | 5 | 20010 | 23 | 40.249595 | 2021-07-01 17:38:50 | 2021-07-01 17:38:50
 | 2021-07-01 16:55:53 | 2021-07-01 14:05:05
 2 | dbod-ag-pg03.ch | 6600 | up | primary | 2021-07-01 16:51:00 | 2024 | 2010 | 0 | 14 | 0
 | 0.000000 | 0 | 834 | 23 | 54.582090 | 2021-07-01 17:38:52 | 2021-07-01 17:38:52
 | 2021-07-01 16:38:07 |
(3 rows)

If I understand correctly, this is the query that pgpool runs on standbys to understand if they are talking to the current primary and the result of this query seems alright to me:
postgres=# select * from pg_stat_wal_receiver;
 pid | status | receive_start_lsn | receive_start_tli | received_lsn | received_tli | last_msg_send_time | last_msg_receipt_
time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port |
 conninfo

------+-----------+-------------------+-------------------+--------------+--------------+-------------------------------+----------------------
---------+----------------+-------------------------------+-----------+----------------------+-------------+-----------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 1651 | streaming | 1/0 | 5 | 1/10009ED8 | 5 | 2021-07-01 19:41:12.659496+02 | 2021-07-01 19:41:12.6
59872+02 | 1/10009ED8 | 2021-07-01 19:41:12.659496+02 | | dbod-hac-c02.ch | 6600 | user=pgrepl passfile=/home/postgre
s/.pgpass dbname=replication host=dbod-hac-c02.ch port=6600 application_name=ag_pg01 fallback_application_name=walreceiver sslmode=prefer
sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
(1 row)

Cheers!

Anirudh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20210702/61a178a5/attachment.htm>


More information about the pgpool-hackers mailing list