View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000469 | Pgpool-II | General | public | 2019-02-23 07:05 | 2019-03-21 19:51 |
| Reporter | demisx | Assigned To | t-ishii | ||
| Priority | high | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Platform | Linux | OS | Ubuntu | OS Version | 18.04 LTS |
| Product Version | 4.0.2 | ||||
| Summary | 0000469: LOG ERROR: primary 0 owns only 0 standbys out of 1 | ||||
**Description**

I have pgpool-II running on an app server managing two database nodes (0, the primary, and 1, the standby). The `psql -c 'show pool_nodes'` command shows both nodes as up:

```
$ psql -c 'show pool_nodes'
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | 10.1.1.20 | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-02-22 21:37:55
 1       | 10.1.3.20 | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-02-22 21:44:49
```

However, after testing the online recovery and failback processes today, I started seeing this error written to the log every 5 seconds:

```
2019-02-22 22:04:04: pid 21998: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
```

I can't figure out what may be wrong. Any idea where I should look?

**Tags:** error, logs
**Note 0002398 (demisx, 2019-02-23 07:20)**
Here is the full log. The error is actually written every 10 seconds, and I believe the interval comes from the `sr_check_period = 10` setting. PostgreSQL streaming replication appears to work properly: when I create and populate a test table on the primary, I see the same changes appear on the standby.

```
$ sudo -u postgres pgpool -n
---
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for 0.0.0.0:5432
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for :::5432
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2019-02-22 22:15:47: pid 22204: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: primary node is 0
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: standby node is 1
2019-02-22 22:15:47: pid 22204: LOG: pgpool-II successfully started. version 4.0.2 (torokiboshi)
2019-02-22 22:15:47: pid 22204: LOG: node status[0]: 1
2019-02-22 22:15:47: pid 22204: LOG: node status[1]: 2
2019-02-22 22:15:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:57: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:07: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:17: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:27: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:37: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:16:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
```
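The cadence and the credentials of this check are both controlled by the `sr_check_*` parameters in `pgpool.conf`. As an illustrative sketch (the 10-second interval and the `pgpool` role name are taken from this report; the other values are assumptions, not from the thread), the relevant fragment looks like:

```
# pgpool.conf: streaming replication check settings (illustrative values)
sr_check_period = 10           # seconds between checks; matches the 10 s log cadence above
sr_check_user = 'pgpool'       # role pgpool connects as to run the check query
sr_check_password = ''         # empty here; assumed to be read from pool_passwd
sr_check_database = 'postgres' # database the check connects to
```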
**Note 0002399 (demisx, 2019-02-24 00:37)**
I've narrowed it down a little more. Although my backend node is running streaming replication, the following query returns NULL:

```
2019-02-23 15:32:32: pid 17383: DEBUG: verify_backend_node_status: pg_stat_wal_receiver status for standby 1 is NULL
2019-02-23 15:32:32: pid 17383: LOCATION: pgpool_main.c:3271
```

If I query the same standby node directly with psql, I do get the status back:

```
$ psql -x -h 10.1.3.20 -c "SELECT status, conninfo FROM pg_stat_wal_receiver";
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
status   | streaming
conninfo | user=repl passfile=/var/lib/postgresql/.pgpass dbname=replication host=10.1.1.20 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 krbsrvname=postgres target_session_attrs=any
```

Any idea what I may be missing here?
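pgpool's health check runs a query along these lines against each standby, connecting as `sr_check_user`. Running it by hand as that user is a quick way to reproduce the NULL from the debug log (a sketch, assuming the check role is named `pgpool` as in this report):

```sql
-- Run on the standby, connected as the sr_check_user (here: pgpool).
-- Without sufficient privileges, the privileged columns of
-- pg_stat_wal_receiver come back NULL, which pgpool interprets as
-- "this standby is not streaming from the primary".
SELECT status, conninfo FROM pg_stat_wal_receiver;
```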
**Note 0002400 (demisx, 2019-02-24 03:34)**
OK, I was finally able to fix this. The problem was that the `sr_check_user = 'pgpool'` role did not have permission to query the `pg_stat_wal_receiver` view. It became apparent after passing the pgpool username to the SELECT:

```
$ psql -x -h [standby.db.node.ip] -U pgpool -d postgres -c "SELECT status, conninfo FROM pg_stat_wal_receiver";
```

The issue can be fixed by making the pgpool user a member of the `pg_monitor` role:

```
GRANT pg_monitor TO pgpool;
```

This step seems to be missing from the http://www.pgpool.net/docs/latest/en/html/example-cluster.html docs; it would be nice to include it. Also, please consider moving your code to GitHub or GitLab, so that we can all help contribute such fixes with pull requests. This issue can be closed.
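After granting the role, the fix can be verified by re-running the check query as the `sr_check_user`; the output below is a sketch based on the streaming status shown earlier in this thread (host and role names taken from this report):

```
$ psql -x -h 10.1.3.20 -U pgpool -d postgres -c "SELECT status FROM pg_stat_wal_receiver"
-[ RECORD 1 ]----
status | streaming
```

Once this returns a non-NULL status, the `verify_backend_node_status` messages should stop appearing in the pgpool log.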
**Note 0002403 (t-ishii, 2019-02-24 21:07)**
Thanks for the report. Unfortunately, the solution is only available in PostgreSQL 10 or later (9.6 does not have the `pg_monitor` role). Your point is valid anyway; I will update the docs.

> consider moving your code to GitHub or GitLab. This way we can all help to contribute such fixes with pull requests.

Sorry, we have no plans to do that. You can still send us patches using our git repository (as in PostgreSQL development).
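For PostgreSQL 9.6, where `pg_monitor` does not exist, one possible workaround (an assumption on my part, not something stated in this thread) is to give the check role superuser rights, since 9.6 only exposes the `pg_stat_wal_receiver` columns to superusers:

```sql
-- Possible workaround on PostgreSQL 9.6 (no pg_monitor role):
-- let the sr_check_user read pg_stat_wal_receiver by making it a superuser.
-- This grants far broader access than pg_monitor would, so weigh the tradeoff.
ALTER ROLE pgpool SUPERUSER;
```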
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2019-02-23 07:05 | demisx | New Issue | |
| 2019-02-23 07:05 | demisx | Tag Attached: error | |
| 2019-02-23 07:05 | demisx | Tag Attached: logs | |
| 2019-02-23 07:20 | demisx | Note Added: 0002398 | |
| 2019-02-24 00:37 | demisx | Note Added: 0002399 | |
| 2019-02-24 03:34 | demisx | Note Added: 0002400 | |
| 2019-02-24 21:03 | t-ishii | Assigned To | => t-ishii |
| 2019-02-24 21:03 | t-ishii | Status | new => assigned |
| 2019-02-24 21:07 | t-ishii | Note Added: 0002403 | |
| 2019-03-21 19:51 | t-ishii | Status | assigned => resolved |