[pgpool-hackers: 3943] Re: verify_backend_node_status does not recognise that the primary is connected to standbys

Tatsuo Ishii ishii at sraoss.co.jp
Fri Jul 2 10:47:41 JST 2021


> Hello
> 
> I have a setup with 1 primary and 2 async standby postgresql nodes behind 1 pgpool all running on a separate hosts, without watchdog.
> 
> I believe I have done all the config required for pgpool to recognise that the standbys are connected to a single primary (for enabling the detach_false_primary feature) but it still doesn’t recognize it.
> 
> Before getting into more details, I'll say that I have tried to pcp_detach_node and pcp_attach_node and also pcp_recovery_node in the hope that I was missing some step that would lead to pgpool not connecting to the standby properly but none of those helped.
> 
> THE ERROR:
> I see this log in pgpool:
> 
> 2021-07-01 17:07:44: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 0
> 2021-07-01 17:07:44: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 1
> 2021-07-01 17:07:44: pid 559: LOG: verify_backend_node_status: primary 2 owns only 0 standbys out of 2
> 2021-07-01 17:08:34: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 0
> 2021-07-01 17:08:34: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 1
> 2021-07-01 17:08:34: pid 559: LOG: verify_backend_node_status: primary 2 owns only 0 standbys out of 2
> 2021-07-01 17:09:25: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 0
> 2021-07-01 17:09:25: pid 559: LOG: verify_backend_node_status: primary 2 does not connect to standby 1
> 2021-07-01 17:09:25: pid 559: LOG: verify_backend_node_status: primary 2 owns only 0 standbys out of 2

I think the reason you are seeing these errors is that pgpool tries to
compare backend_hostname2 (dbod-ag-pg03.ch) and backend_port2 (6600)
with host and port (dbod-hac-c02.ch, 6600) in primary_conninfo. In
this case port matches but hostname does not match and Pgpool-II
regards that the standbys do not connect to the primary node,.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Relevant config from pgpool.conf:
> # - Backend Connection Settings -
> 
> backend_hostname0 = 'dbod-ag-pg01.ch'
> backend_data_directory0 = '/ORA/dbs03/AG_PG01/data'
> backend_port0 = 6601
> backend_weight0 = 1
> backend_flag0 = 'ALLOW_TO_FAILOVER'
> backend_application_name0 = 'ag_pg01'
> 
> backend_hostname1 = 'dbod-ag-pg02.ch'
> backend_data_directory1 = '/ORA/dbs03/AG_PG02/data'
> backend_port1 = 6604
> backend_weight1 = 1
> backend_flag1 = 'ALLOW_TO_FAILOVER'
> backend_application_name1 = 'ag_pg02'
> 
> backend_hostname2 = 'dbod-ag-pg03.ch'
> backend_data_directory2 = '/ORA/dbs03/AG_PG03/data'
> backend_port2 = 6600
> backend_weight2 = 1
> backend_flag2 = 'ALLOW_TO_FAILOVER'
> backend_application_name2 = 'ag_pg03'
> 
> Relevant config from postgresql.conf / myrecovery.conf (application name exists and is same in primary_conninfo as backend_application_name):
> primary_conninfo = 'host=dbod-hac-c02.ch port=6600 user=pgrepl application_name=ag_pg01 passfile=''/home/postgres/.pgpass'''
> 
> "GRANT pg_monitor TO pgpool;” has been done:
> postgres=# \dg
>  List of roles
>  Role name | Attributes | Member of
> -----------------+------------------------------------------------------------+--------------
>  admin | Create role, Create DB | {}
>  dod_dbmon | | {}
>  dod_pmm | Superuser | {}
>  pgpool | | {pg_monitor}
>  pgpool_recovery | Superuser | {}
>  pgrepl | Replication | {}
>  postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
> 
> Users for sr_check and health_check:
> 'sr_check_user': 'pgpool',
> 'sr_check_database': 'postgres',
> 'health_check_user': 'pgpool',
> 'health_check_database': 'postgres',
> 
> Versions:
> PgPool - 4.2.2
> PostgreSQL - 12.6
> 
> “show pool_nodes and show pool_health_check_stats” shows everything is fine:
> postgres=# show pool_nodes;
>  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state
> | replication_sync_state | last_status_change
> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------
> +------------------------+---------------------
>  0 | dbod-ag-pg01.ch | 6601 | up | 0.333333 | standby | 3 | true | 0 | streaming
> | async | 2021-07-01 17:37:42
>  1 | dbod-ag-pg02.ch | 6604 | up | 0.333333 | standby | 0 | false | 0 | streaming
> | async | 2021-07-01 17:37:42
>  2 | dbod-ag-pg03.ch | 6600 | up | 0.333333 | primary | 2 | false | 0 |
> | | 2021-07-01 16:51:00
> (3 rows)
> 
> postgres=# SHOW POOL_HEALTH_CHECK_STATS;
>  node_id | hostname | port | status | role | last_status_change | total_count | success_count | fail_count | skip_count | retry
> _count | average_retry_count | max_retry_count | max_duration | min_duration | average_duration | last_health_check | last_successful_health_
> check | last_skip_health_check | last_failed_health_check
> ---------+----------------------+------+--------+---------+---------------------+-------------+---------------+------------+------------+------
> -------+---------------------+-----------------+--------------+--------------+------------------+---------------------+------------------------
> ------+------------------------+--------------------------
>  0 | dbod-ag-pg01.ch | 6601 | up | standby | 2021-07-01 17:37:42 | 2029 | 1998 | 0 | 31 | 0
>  | 0.000000 | 0 | 1033 | 21 | 26.975475 | 2021-07-01 17:38:47 | 2021-07-01 17:38:47
>  | 2021-07-01 16:52:00 |
>  1 | dbod-ag-pg02.ch | 6604 | up | standby | 2021-07-01 17:37:42 | 2027 | 1850 | 1 | 176 | 5
>  | 0.002701 | 5 | 20010 | 23 | 40.249595 | 2021-07-01 17:38:50 | 2021-07-01 17:38:50
>  | 2021-07-01 16:55:53 | 2021-07-01 14:05:05
>  2 | dbod-ag-pg03.ch | 6600 | up | primary | 2021-07-01 16:51:00 | 2024 | 2010 | 0 | 14 | 0
>  | 0.000000 | 0 | 834 | 23 | 54.582090 | 2021-07-01 17:38:52 | 2021-07-01 17:38:52
>  | 2021-07-01 16:38:07 |
> (3 rows)
> 
> If I understand correctly, this is the query that pgpool runs on standbys to understand if they are talking to the current primary and the result of this query seems alright to me:
> postgres=# select * from pg_stat_wal_receiver;
>  pid | status | receive_start_lsn | receive_start_tli | received_lsn | received_tli | last_msg_send_time | last_msg_receipt_
> time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port |
>  conninfo
> 
> ------+-----------+-------------------+-------------------+--------------+--------------+-------------------------------+----------------------
> ---------+----------------+-------------------------------+-----------+----------------------+-------------+-----------------------------------
> -----------------------------------------------------------------------------------------------------------------------------------------------
> --------------------------------------------------------------------------------
>  1651 | streaming | 1/0 | 5 | 1/10009ED8 | 5 | 2021-07-01 19:41:12.659496+02 | 2021-07-01 19:41:12.6
> 59872+02 | 1/10009ED8 | 2021-07-01 19:41:12.659496+02 | | dbod-hac-c02.ch | 6600 | user=pgrepl passfile=/home/postgre
> s/.pgpass dbname=replication host=dbod-hac-c02.ch port=6600 application_name=ag_pg01 fallback_application_name=walreceiver sslmode=prefer
> sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
> (1 row)
> 
> Cheers!
> 
> Anirudh


More information about the pgpool-hackers mailing list